Re: How to get Unicode's "General Category" of a character?
- Subject: Re: How to get Unicode's "General Category" of a character?
- From: Dmitry Markman <email@hidden>
- Date: Tue, 07 Jul 2015 20:48:54 -0400
Hi Gerriet,
First of all, u_charType is declared in the unicode/uchar.h header (not utypes.h).
I think the best approach is to download the ICU distribution from
http://site.icu-project.org/download/55#TOC-ICU4C-Download
Get the sources and build them.
To build, do the following:
download and unarchive icu4c-55_1-src.tgz
cd icu
mkdir build
export CXXFLAGS='--std=c++11 --stdlib=libc++ -DUCHAR_TYPE=char16_t'
cd build
../source/configure --enable-shared --enable-static --prefix=<path_to_install_dir> (add --enable-debug here for a debug build)
make
make install
Then, in include/unicode/platform.h, immediately after the lines
# if (defined(__cplusplus) && __cplusplus >= 201103L) || (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
# define U_HAVE_CHAR16_T 1
add the following:
# define UCHAR_TYPE char16_t
Then try ICU. If you get the error U_MISSING_RESOURCE_ERROR,
rebuild the data: in the build/data directory, touch Makefile and just run make.
Note: I tried to use Homebrew, but I wasn't able to build C++11 libraries that use the char16_t type.
The instructions above will let you do just that.
To build your application, use the following switches:
LDFLAGS: -L<path_to_install_dir>/lib
CPPFLAGS: -I<path_to_install_dir>/include
Hope this helps.
Ask me off-list if you have any problems.
cheers
dm
> On Jul 7, 2015, at 9:10 AM, Gerriet M. Denkmann <email@hidden> wrote:
>
>
>> On 7 Jul 2015, at 19:33, Dmitry Markman <email@hidden> wrote:
>>
>> ICU’s
>>
>> u_charType
>
> Looks exactly like what I need.
> But: are the headers and the library on my Mac?
>
> There is /usr/lib/libicucore.A.dylib which might contain u_charType, but I cannot find any headers (e.g. utypes.h).
>
> Do I have to download the source from ICU?
>
>
> Kind regards,
>
> Gerriet.
>
>
>
>>
>>
>>> On Jul 7, 2015, at 8:03 AM, Gerriet M. Denkmann <email@hidden> wrote:
>>>
>>> Given a character (a Unicode code point, to be exact) like U+FF0B (FULLWIDTH PLUS SIGN), I want to know the General Category of this.
>>> For this example it would be "Sm" (a.k.a. Math_Symbol or Symbol, Math).
>>>
>>> I could download the current version of UnicodeData.txt and parse it.
>>> But this looks not very efficient.
>>>
>>> For punctuation one could use NSCharacterSet punctuationCharacterSet.
>>>
>>> But for Math Symbols?
>>>
>>> I did look at CFStringTransform, which can give the Character name via kCFStringTransformToUnicodeName.
>>>
>>> But I cannot find anything for "General Category".
>>>
>>> NSRegularExpression can match for [\p{General_Category = Math_Symbol}]; not quite what I want, but better than nothing.
>>>
>>>
>>> Any ideas?
>>>
>>> Gerriet.
>>>
>>>
>>> _______________________________________________
>>>
>>> Cocoa-dev mailing list (email@hidden)
>>>
>>> Please do not post admin requests or moderator comments to the list.
>>> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>>>
>>> Help/Unsubscribe/Update your Subscription:
>>>
>>> This email sent to email@hidden
>>
>> Dmitry Markman
>>
>
Dmitry Markman