I'm nearly finished with catch-all case-conversion and identifying handlers for Unicode text in AS.
There are, however, still some mappings to be done.
See: http://en.wikipedia.org/wiki/List_of_Unicode_characters
I have some questions.
Can I, for instance, map the lowercase character Latin Small Letter C with Curl (code point U+0255) to "C"?
Will this be right for any cases you know of in particular?
I can't say I have seen this character in any of the Germanic languages that I speak, read and write.
And can I do this in all similar cases? And finally: are there any exceptions to this?
Hm. I'd say it depends on the purpose. Did you try the uppercase function in the Satimage OSAX? Would it do what you need?
As far as I can see, there are no capital letters with curl in Unicode, and Satimage returns "ɕ" unchanged, as opposed to "ß", which yields "SS".
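For comparison outside AppleScript, here is a small Python sketch (not the Satimage command itself) that shows the same behaviour. It assumes Python 3, whose built-in `str.upper()` applies the Unicode case mappings; the `samples` list is only an illustration.

```python
# Uppercase a few sample characters with Python's built-in Unicode case mapping.
samples = ["ɕ", "ß", "ç"]

# Map each character to its uppercase form.
upper = {ch: ch.upper() for ch in samples}

for ch, up in upper.items():
    # Show the code point next to the result so unchanged characters stand out.
    print(f"U+{ord(ch):04X} {ch!r} -> {up!r}")
```

"ɕ" comes back unchanged because Unicode defines no uppercase counterpart for it, while "ß" expands to two letters.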
So what do you need it for?
This question is for you users of fancy Roman-descended and other non-North-Germanic languages: Hungarian, Finnish, Polish, French, Italian, Spanish and so on.
Please do share your expertise with me. I would like to make as good case conversion routines as possible, and can't do without you.
As I said, it depends on the purpose. If you are trying to *reduce* these characters to some base form, I might be able to provide tables for the less exotic characters; if you need them for something particular, I can ask our specialist on Eastern European languages.
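To illustrate what such a reduction could look like, here is a minimal Python sketch, not an AppleScript handler. It assumes that accents can be stripped via Unicode canonical decomposition (NFD), and that letters without a decomposition, such as "ɕ", need a hand-made table. The `EXTRA_BASE` table and the name `reduce_to_base` are hypothetical and far from complete.

```python
import unicodedata

# Hypothetical hand-made table for letters that have no canonical decomposition.
EXTRA_BASE = {"ɕ": "c", "ø": "o", "ł": "l", "đ": "d"}

def reduce_to_base(text):
    """Reduce each character to a base letter where one can be determined."""
    out = []
    for ch in text:
        if ch in EXTRA_BASE:
            # The table wins for letters Unicode cannot decompose.
            out.append(EXTRA_BASE[ch])
            continue
        # Split into base letter plus combining marks, then drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(reduce_to_base("façade"))
print(reduce_to_base("żółć"))
```

Whether a given reduction is acceptable still depends on the language and the purpose, so the table would have to be reviewed per use case.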
By the way:
1. 0255 is the hexadecimal code point of "ɕ" (U+0255); its UTF-8 representation is the two bytes \xC9 \x95.
2. Some of the characters might be *combined* characters, which yield a list of numbers when you ask for their id, although they are counted (correctly) as single characters by AppleScript.
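Both points can be checked with a short Python sketch. This is an illustration in Python rather than AppleScript, and the variable names are made up for the example. Note one difference: Python counts code points, so the combined form below has length 2, whereas AppleScript counts it as one character.

```python
import unicodedata

# Point 1: code point versus UTF-8 bytes for "ɕ".
ch = "\u0255"
code_point = ord(ch)           # the number AppleScript reports as the id
utf8 = ch.encode("utf-8")      # the bytes stored in a UTF-8 file
print(hex(code_point), utf8)

# Point 2: "é" as one precomposed code point versus "e" plus a combining acute.
precomposed = "\u00e9"
combined = "e\u0301"
ids = [ord(c) for c in combined]   # a list of numbers, one per code point
print(ids, len(combined))

# NFC normalization folds the combined form into the precomposed one,
# which makes the two spellings comparable before any case mapping.
normalized = unicodedata.normalize("NFC", combined)
print(normalized == precomposed)
```

Normalizing text to one form before looking characters up in a mapping table avoids missing the combined spellings.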
All the best
Thomas Fischer