Title:

Unicode tip #4 - Using and Extending TCharacter with IsVowel

Author: Bob Swart
Posted: 11/25/2008 12:34:50 PM (GMT+1)
Content:

A helpful Unicode support class in Delphi 2009 is TCharacter, a sealed class that consists only of static class functions for checking whether a character is a digit, a letter, and so on.
The TCharacter class is the “solution” for eliminating the compiler warnings you get when you test a Char against a set of characters. Sets can only contain AnsiChar values, so the compiler narrows the expression, and that narrowing produces the warning. For example:

  var
    C: Char;
  begin
    // assign some Char value to C
    if C in ['a'..'z','A'..'Z'] then
The set expression should be replaced by a call to IsLetter from the TCharacter class, as follows:
  if TCharacter.IsLetter(C) then
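One thing to keep in mind with this replacement: IsLetter is broader than the original set, because it accepts any character that Unicode classifies as a letter, not just 'a'..'z' and 'A'..'Z'. A minimal console sketch that shows the difference (the program name is made up for illustration):

```delphi
program IsLetterDemo;

{$APPTYPE CONSOLE}

uses
  Character; // declares TCharacter

var
  C: Char;
begin
  C := 'q';
  Writeln(TCharacter.IsLetter(C));   // a Latin letter
  C := #$0416;                       // Cyrillic capital letter Zhe
  Writeln(TCharacter.IsLetter(C));   // also a letter, although outside 'a'..'z'
  C := '7';
  Writeln(TCharacter.IsLetter(C));   // a digit, not a letter
  Writeln(TCharacter.IsDigit(C));    // the matching digit test
end.
```

If the code really needs "ASCII letters only", the set-based test and IsLetter are therefore not interchangeable.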
While this works for this particular case, sometimes we need a test that doesn’t already exist in the TCharacter class, like the following:
  var
    C: Char;
  begin
    // assign some Char value to C
    if C in ['a','e','i','o','u'] then
There is no IsVowel function in TCharacter, but the compiler warning itself already suggests the CharInSet function from SysUtils, so in this case we can change the code as follows:
  var
    C: Char;
  begin
    // assign some Char value to C
    if CharInSet(C, ['a','e','i','o','u']) then
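To see the CharInSet version in a complete program, here is a small sketch. The CountVowels function and the program name are hypothetical, introduced only for this example:

```delphi
program CharInSetDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils; // declares CharInSet

// Hypothetical helper for this sketch: counts the lowercase
// vowels a, e, i, o, u in S.
function CountVowels(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if CharInSet(C, ['a', 'e', 'i', 'o', 'u']) then
      Inc(Result);
end;

begin
  Writeln(CountVowels('unicode')); // u, i, o, e: prints 4
end.
```

Because the set itself can still only hold AnsiChar values, this approach only suits tests against characters in that range.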
Having said that... Although the TCharacter class is a sealed class, we can still extend it with an IsVowel method by using class helpers, as follows:
  type
    TMyChar = class helper for TCharacter
      class function IsVowel(C: Char): Boolean;
    end;
The implementation uses the CharInSet function we just saw, as follows:
  class function TMyChar.IsVowel(C: Char): Boolean;
  begin
    Result := CharInSet(C, ['a', 'e', 'o', 'i', 'u'])
  end;
And now we can replace the CharInSet call with a simple call to TCharacter.IsVowel as follows:
  var
    C: Char;
  begin
    // assign some Char value to C
    if TCharacter.IsVowel(C) then
This compiles with no more compiler warnings. Note that the unit that defines the IsVowel method must be added to the uses clause of every unit in which you want to use it.
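Put together, the helper could live in a unit of its own. The sketch below assumes a unit named CharHelpers (a name made up for this example); the helper declaration and implementation are the ones shown above:

```delphi
unit CharHelpers; // hypothetical unit name, chosen for this sketch

interface

uses
  Character; // declares TCharacter

type
  // Class helper that adds IsVowel to the sealed TCharacter class.
  TMyChar = class helper for TCharacter
    class function IsVowel(C: Char): Boolean;
  end;

implementation

uses
  SysUtils; // declares CharInSet

class function TMyChar.IsVowel(C: Char): Boolean;
begin
  // Only the five lowercase vowels a, e, i, o, u count as vowels here.
  Result := CharInSet(C, ['a', 'e', 'i', 'o', 'u']);
end;

end.
```

Any unit that lists both Character and CharHelpers in its uses clause can then call TCharacter.IsVowel(C).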

This tip is the fourth in a series of Unicode tips taken from my Delphi 2009 Development Essentials book published earlier this week on Lulu.com.



10 Comments

Olaf Monien (08/11/25 15:04:47): The idea of your code is clear, but is not quite correct though. The problem is that you are working on Unicode strings/characters and you take the assumption that 'a', 'e', 'o', 'i', 'u' the only vowels in Unicode / and or that your source string is of some western language only (English, Dutch, German etc). If you have a user typing in Cyrillic strings for example, then your code will just fail. Cyrillic has many different vowels ...

Bob Swart (08/11/25 15:07:14): I never said I would find all Vowels, just used this as an example (to show how CharInSet and TCharacter can be used, and how TCharacter can be extended). A developer using Cyrillic strings would hopefully not have a hard time writing his own IsCurillicVowel based on this example.

Lars D (08/11/25 17:48:25): The definition of a vowel depends on the language, so any IsVowel() function should also have a locale parameter.

Keld R. Hansen (08/11/26 06:59:22): F.ex. the letter "Y" is not a vowel in English, but is in Danish, Norwegian and Swedish (and probably other languages as well).

Olaf Monien (08/11/26 13:32:36): The point is, that as long as you ignore all "localization" issues, you don't have to worry much about String being Unicode with D2009. As soon as you start thinking "unicodized" though, things may get a bit more complex (as Lars' and Keld's statements show). This example is certainly a good one for demonstrating the usefulness of Extension Methods. It does not really qualify as best practice though under a "Unicode tip" category - imo. I believe IsVowel has not been implemented by CodeGear for good reasons.

Russian_Developer (08/11/26 22:13:37): With cyrillic string CharInSet is helpless. It's absolutely not work. CodeGear implementation of this function is bad joke. Result := (C < #$0100) and (AnsiChar(C) in CharSet); - it's body of this function for Unicode. But russian chars code is BIGGER than #$0100. Oye, I can realize my own function. But Unicode is not two bytes for Latin. It's _international_ standard for all language.

Olaf Monien (08/11/27 09:51:44): The CharInSet "problem" is that Delphi/Pascal Sets may have a maximum of 256 elements (Cardinality) AND the base type's Cardinality must not exceed 256 either. In other words you cannot build a set like that: SomeSet = Set of 300..399; That set would have 100 members, but the base type would be Integer - which obviously has a Cardinality > 256. For the same reason you can not use Unicode Chars for set operations - which is what the compiler warning wants to tell you. CharInSet can not do "magic" either. It's only meant to be a "type safe" operator. If you are using Cyrrilic chars in Unicode then you probably have to implement IsCharSomething functions like those in unit "Character" to perform certain operations. This was all "so easy" in plain old ANSI, where you just used one of the Eastern Europe ISO encodings, which maps all your chars into the $10 - $FF area. Unicode offers much more, but it comes at a price ... You could certainly say, that the Set type is boring, but it has this limitation because it uses bits to represent each set member - and that is extremely fast ...

Russian_Developer (08/11/27 11:07:54): I now about this details. I think CodeGear in his docs and white papers must directly point on this problem or "problem". And propose best practice for it.

Maya Opperman (09/10/13 11:44:14): Any idea whether Embarcadero will ever increase the 256 element and cardinality limit? I now have at least 2 areas where I have been using sets for several years (not character or string related), and my app has reached a stage where I have now run out of values, and need to start recoding everything completely differently. Hmm, I suppose I'll also have to go the class route, and do something like TModule.IsSales(ModuleType)

incrediball (10/06/26 05:20:46): Testing for a character in a set of Unicode characters is easy. Sets are not suited to this so forget using CharInSet for this purpose. Just write the characters in a string and test if the character is in the string, something like the strchr function in C.





This webpage © 2005-2017 by Bob Swart (aka Dr.Bob - www.drbob42.com). All Rights Reserved.