Re: Unicode canonical decomposed form and text encoding

  • Subject: Re: Unicode canonical decomposed form and text encoding
  • From: Aki Inoue <email@hidden>
  • Date: Tue, 14 Jan 2003 11:39:38 -0800

Renaud,

I cooked up a simple example of using TEC to canonically decompose a string.

#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h> // Unicode Converter (part of TEC)

static UniChar characters[] = {0x00C0}; // LATIN CAPITAL LETTER A WITH GRAVE

#define MAX_BUFFER_LENGTH (100)

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    UnicodeToTextInfo textInfo;
    // Map default 16-bit Unicode to its canonically decomposed variant
    UnicodeMapping mapping = {
        CreateTextEncoding(kTextEncodingUnicodeDefault, kTextEncodingDefaultVariant, kUnicode16BitFormat),
        CreateTextEncoding(kTextEncodingUnicodeDefault, kUnicodeCanonicalDecompVariant, kUnicode16BitFormat),
        kUnicodeUseLatestMapping
    };
    UniChar buffer[MAX_BUFFER_LENGTH];
    ByteCount inputRead, outputLen; // both are byte counts, not UniChar counts
    OSStatus status;

    status = CreateUnicodeToTextInfo(&mapping, &textInfo);
    if (noErr != status) {
        NSLog(@"Failed to create UnicodeToTextInfo");
        exit(1);
    }

    // Input and output lengths are given in bytes
    status = ConvertFromUnicodeToText(textInfo, sizeof(characters), characters,
                                      kTECKeepInfoFixMask, 0, NULL, NULL, NULL,
                                      MAX_BUFFER_LENGTH * sizeof(UniChar),
                                      &inputRead, &outputLen, buffer);
    if (noErr != status) {
        NSLog(@"Failed to convert string");
        DisposeUnicodeToTextInfo(&textInfo);
        exit(1);
    }

    DisposeUnicodeToTextInfo(&textInfo);

    [pool release];
    return 0;
}
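
The example above stops once the conversion succeeds and never looks at the result. As a rough illustration of what canonical decomposition should leave in the output buffer, here is a minimal sketch in plain C. It does not call TEC at all: SampleDecomp, kSampleTable and sample_decompose are hypothetical names, and the three-entry table is only a tiny hand-picked subset of the Unicode decomposition data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one precomposed character and its
   two-unit canonical decomposition (base letter + combining mark). */
typedef struct {
    uint16_t composed;
    uint16_t base;
    uint16_t mark;
} SampleDecomp;

/* Tiny illustrative subset; real data comes from the Unicode Character Database. */
static const SampleDecomp kSampleTable[] = {
    {0x00C0, 0x0041, 0x0300}, /* A WITH GRAVE -> A + COMBINING GRAVE ACCENT */
    {0x00C9, 0x0045, 0x0301}, /* E WITH ACUTE -> E + COMBINING ACUTE ACCENT */
    {0x00E9, 0x0065, 0x0301}, /* e WITH ACUTE -> e + COMBINING ACUTE ACCENT */
};

/* Copies `in` to `out`, expanding any character found in kSampleTable.
   Returns the number of UTF-16 units written; stops early if `out` is full. */
size_t sample_decompose(const uint16_t *in, size_t inLen,
                        uint16_t *out, size_t outCap)
{
    size_t written = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < inLen; i++) {
        const SampleDecomp *hit = NULL;
        for (size_t j = 0; j < sizeof(kSampleTable) / sizeof(kSampleTable[0]); j++) {
            if (kSampleTable[j].composed == in[i]) {
                hit = &kSampleTable[j];
                break;
            }
        }
        if (hit != NULL) {
            if (written + 2 > outCap) break; /* no room for base + mark */
            out[written++] = hit->base;
            out[written++] = hit->mark;
        } else {
            if (written + 1 > outCap) break; /* no room for the unit */
            out[written++] = in[i];
        }
    }
    return written;
}
```

For the single U+00C0 in the example, the expected result is the two units 0x0041 0x0300, so outputLen (a byte count) should come back as 4, assuming TEC's decomposition matches the Unicode data.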

I tested this on Jaguar, but it is supposed to work on earlier versions as well. Pre-Jaguar TEC doesn't have precomposition capability, though.
Note that I'm passing kTECKeepInfoFixMask to ConvertFromUnicodeToText() above.

This bit keeps the state of the last conversion, so that repeated conversions don't cause surprises with decomposed or surrogate characters in the source buffer.
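
To illustrate why keeping state between calls matters, here is a minimal sketch in plain C of a decoder fed in chunks, where a surrogate pair is split across two calls. This is presumably the kind of situation the flag guards against, but it is not TEC's implementation: ChunkState and decode_chunk are hypothetical names, and substituting U+FFFD for unpaired surrogates is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state carried from one call to the next. */
typedef struct {
    uint16_t pending_high; /* high surrogate seen at the end of the last chunk, or 0 */
} ChunkState;

/* Decodes n UTF-16 units into code points. A high surrogate at the end of a
   chunk is remembered in `st` and paired with the first unit of the next chunk.
   `out` must have room for n + 1 code points. Returns the number written. */
size_t decode_chunk(ChunkState *st, const uint16_t *in, size_t n, uint32_t *out)
{
    size_t produced = 0;
    assert(st != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t u = in[i];
        if (st->pending_high != 0) {
            if (u >= 0xDC00 && u <= 0xDFFF) {
                /* Complete the pair started earlier, possibly in a previous call. */
                out[produced++] = 0x10000
                    + ((uint32_t)(st->pending_high - 0xD800) << 10)
                    + (uint32_t)(u - 0xDC00);
                st->pending_high = 0;
                continue;
            }
            /* Unpaired high surrogate: substitute, then handle `u` normally. */
            out[produced++] = 0xFFFD;
            st->pending_high = 0;
        }
        if (u >= 0xD800 && u <= 0xDBFF) {
            st->pending_high = u;      /* wait for the low half */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            out[produced++] = 0xFFFD;  /* low surrogate with no high half */
        } else {
            out[produced++] = u;       /* ordinary BMP character */
        }
    }
    return produced;
}
```

Without the saved state, the second call would see a lone low surrogate at the start of its chunk and could not reconstruct the character.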

Hope this helps,

Aki

On 2003.1.13, at 07:54 PM, Renaud Boisjoly wrote:

Hi all!

I'm trying to convert Unicode strings from the precomposed form to the decomposed form (or the other way around) under 10.1.

Under 10.2 I can use NSString's decomposedStringWithCanonicalMapping method, which works fine.

But this is not supported under 10.1, which breaks my app.

Has anyone ever done this using Text Encoding Converter or Unicode Converter? Or perhaps by adapting GPL routines like GNOME's libunicode? (http://cvs.gnome.org/lxr/source/libunicode/decomp.h and http://cvs.gnome.org/lxr/source/libunicode/decomp.c)

I've never adapted regular C routines like these to Objective-C, and my experience is quite limited in that area. The same goes for Carbon routines like TEC.

If anyone is willing to share their experience with this type of stuff, it would really help me out.

Here's something I got from this list which does encoding conversions, but I'm not sure how I'm supposed to call this function from my Cocoa code... I used kTextEncodingMacUnicode instead of the one in the original code, but I'm not sure if that is the right choice either...

+ (TECObjectRef) _unicodeID3v2ToMacTextConverter
{
    // Created on first use and cached for later calls
    static TECObjectRef id3v2ToMacTextConverter = NULL;
    if (id3v2ToMacTextConverter == NULL) {
        OSStatus status;
        status = TECCreateConverter(&id3v2ToMacTextConverter,
                                    kTextEncodingMacUnicode, kTextEncodingUnicode);
        if (status != noErr) {
            NSLog(@"TECCreateConverter() error %d", (int)status);
            return NULL;
        }
    }
    return id3v2ToMacTextConverter;
}

+ (CFStringRef /* implies caller retain */) _convertUnicodeText:
    (ConstTextPtr) textBuffer ofLength: (unsigned int) length
{
    ByteCount actualInputConsumed;
    ByteCount actualOutputProduced;
    UInt8 outputBuffer[1024];
    CFMutableStringRef returnString = CFStringCreateMutable(NULL, 0);
    TECObjectRef textConverter = [self _unicodeID3v2ToMacTextConverter];
    do {
        OSStatus status;
        status = TECConvertText(textConverter, textBuffer, length,
                                &actualInputConsumed, outputBuffer, sizeof(outputBuffer),
                                &actualOutputProduced);
        // A full output buffer is not an error: append what we got and loop again
        if (status != noErr && status != kTECOutputBufferFullStatus) {
            NSLog(@"TECConvertText() error %d", (int)status);
            CFRelease(returnString); // don't leak the partial result
            return NULL;
        }
        // actualOutputProduced is a byte count; CFStringAppendCharacters expects a UniChar count
        CFStringAppendCharacters(returnString, (const UniChar *)outputBuffer,
                                 actualOutputProduced / sizeof(UniChar));
        length = length - actualInputConsumed;
        textBuffer = textBuffer + actualInputConsumed;
    } while (length > 0);
    return returnString;
}
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.
