Re: How to get Unicode's "General Category" of a character?
  • Subject: Re: How to get Unicode's "General Category" of a character?
  • From: Dmitry Markman <email@hidden>
  • Date: Tue, 07 Jul 2015 08:33:07 -0400

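ICU’s u_charType (declared in unicode/uchar.h) returns the General Category of a code point as one of the UCharCategory constants.

After that you can use the following to turn it into the two-letter code and a description: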

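// http://www.fileformat.info/info/unicode/category/index.htm
#include <string>
#include <tuple>
#include <unicode/uchar.h>  // ICU: UCharCategory, u_charType

// Maps an ICU general-category value to its two-letter code and long name.
std::tuple<std::string, std::string> u_charTypeName(UCharCategory c) {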
    switch (c) {
        // U_UNASSIGNED and U_GENERAL_OTHER_TYPES are the same enum value (0),
        // so only one of them can appear as a case label.
        case U_GENERAL_OTHER_TYPES:
            return std::make_tuple("Cn","Other, Not Assigned (no characters in the file have this property)");
        case U_UPPERCASE_LETTER:
            return std::make_tuple("Lu","Letter, Uppercase");
        case U_LOWERCASE_LETTER:
            return std::make_tuple("Ll","Letter, Lowercase");
        case U_TITLECASE_LETTER:
            return std::make_tuple("Lt","Letter, Titlecase");
        case U_MODIFIER_LETTER:
            return std::make_tuple("Lm","Letter, Modifier");
        case U_OTHER_LETTER:
            return std::make_tuple("Lo","Letter, Other");
        case U_NON_SPACING_MARK:
            return std::make_tuple("Mn","Mark, Nonspacing");
        case U_ENCLOSING_MARK:
            return std::make_tuple("Me","Mark, Enclosing");
        case U_COMBINING_SPACING_MARK:
            return std::make_tuple("Mc","Mark, Spacing Combining");
        case U_DECIMAL_DIGIT_NUMBER:
            return std::make_tuple("Nd","Number, Decimal Digit");
        case U_LETTER_NUMBER:
            return std::make_tuple("Nl","Number, Letter");
        case U_OTHER_NUMBER:
            return std::make_tuple("No","Number, Other");
        case U_SPACE_SEPARATOR:
            return std::make_tuple("Zs","Separator, Space");
        case U_LINE_SEPARATOR:
            return std::make_tuple("Zl","Separator, Line");
        case U_PARAGRAPH_SEPARATOR:
            return std::make_tuple("Zp","Separator, Paragraph");
        case U_CONTROL_CHAR:
            return std::make_tuple("Cc","Other, Control");
        case U_FORMAT_CHAR:
            return std::make_tuple("Cf","Other, Format");
        case U_PRIVATE_USE_CHAR:
            return std::make_tuple("Co","Other, Private Use");
        case U_SURROGATE:
            return std::make_tuple("Cs","Other, Surrogate");
        case U_DASH_PUNCTUATION:
            return std::make_tuple("Pd","Punctuation, Dash");
        case U_START_PUNCTUATION:
            return std::make_tuple("Ps","Punctuation, Open");
        case U_END_PUNCTUATION:
            return std::make_tuple("Pe","Punctuation, Close");
        case U_CONNECTOR_PUNCTUATION:
            return std::make_tuple("Pc","Punctuation, Connector");
        case U_OTHER_PUNCTUATION:
            return std::make_tuple("Po","Punctuation, Other");
        case U_MATH_SYMBOL:
            return std::make_tuple("Sm","Symbol, Math");
        case U_CURRENCY_SYMBOL:
            return std::make_tuple("Sc","Symbol, Currency");
        case U_MODIFIER_SYMBOL:
            return std::make_tuple("Sk","Symbol, Modifier");
        case U_OTHER_SYMBOL:
            return std::make_tuple("So","Symbol, Other");
        case U_INITIAL_PUNCTUATION:
            return std::make_tuple("Pi","Punctuation, Initial quote (may behave like Ps or Pe depending on usage)");
        case U_FINAL_PUNCTUATION:
            return std::make_tuple("Pf","Punctuation, Final quote (may behave like Ps or Pe depending on usage)");
        default:
            return std::make_tuple("","");
    }
}
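
If pulling in ICU is not an option, the UnicodeData.txt route from the question can also be made to work. Below is a minimal sketch, assuming the semicolon-separated layout in which the first field is the code point in hex and the third field is the General Category. CategoryEntry and parseUnicodeDataLine are hypothetical names made up for illustration; they are not part of ICU or of any Apple API.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// One parsed record: the code point and its two-letter General Category.
// (Hypothetical type, for illustration only.)
struct CategoryEntry {
    std::uint32_t codePoint;
    std::string category;
};

// Parses a single line in the UnicodeData.txt layout.
// Field 0 is the code point in hex, field 2 is the General Category.
// Returns false and leaves `out` untouched when the line is malformed.
bool parseUnicodeDataLine(const std::string& line, CategoryEntry& out) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ';')) {
        fields.push_back(field);
    }
    if (fields.size() < 3 || fields[0].empty() || fields[2].empty()) {
        return false;
    }
    try {
        std::size_t used = 0;
        unsigned long value = std::stoul(fields[0], &used, 16);
        // Reject trailing junk and values outside the Unicode code space.
        if (used != fields[0].size() || value > 0x10FFFF) {
            return false;
        }
        out = CategoryEntry{static_cast<std::uint32_t>(value), fields[2]};
        return true;
    } catch (const std::exception&) {
        // std::stoul throws when the field is not a number at all.
        return false;
    }
}
```

You would call this once per line and store the results in a map keyed by code point. Note that UnicodeData.txt writes some large blocks as a pair of "First"/"Last" lines instead of one line per character, and this sketch does not expand those ranges.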




> On Jul 7, 2015, at 8:03 AM, Gerriet M. Denkmann <email@hidden> wrote:
>
> Given a character (a Unicode code point, to be exact) like U+FF0B (FULLWIDTH PLUS SIGN), I want to know the General Category of this.
> For this example it would be “Sm” (a.k.a. Math_Symbol or Symbol, Math).
>
> I could download the current version of UnicodeData.txt and parse it.
> But this does not look very efficient.
>
> For punctuation one could use NSCharacterSet punctuationCharacterSet.
>
> But for Math Symbols?
>
> I did look at CFStringTransform, which can give the Character name via kCFStringTransformToUnicodeName.
>
> But I cannot find anything for “General Category”.
>
> NSRegularExpression can match for [\p{General_Category = Math_Symbol}]; not quite what I want, but better than nothing.
>
>
> Any ideas?
>
> Gerriet.
>
>
> _______________________________________________
>
> Cocoa-dev mailing list (email@hidden)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden

Dmitry Markman



