Re: file to uri to file madness



All,
IMHO, this is a bug in the VM, in that it forces you to decompose the Unicode characters yourself. It is useful to know that the filesystem stores file names as decomposed UTF-8, but why in the world did that detail creep up into the semantic contract of the File class? In the Unicode standard, the composed and decomposed forms of a character are intended to carry the same meaning. While there may be nothing that states that both should work equivalently within a Unicode application (maybe there is, I am not sure), it makes life difficult to have to know which form is required. If Apple chooses to require the decomposed form in the C APIs, fine, but have the Java VM do the decomposition so that the write-once-run-anywhere (WORA) contract of the core Java File class is maintained. Logically, if the non-UI File class becomes semantically platform-specific, the entire WORA model of the VM is compromised!
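
To make the composed/decomposed distinction concrete, here is a minimal sketch (the class and method names are only illustrative, and `getBytes(Charset)` assumes a Java 6 or later runtime). It shows that the two spellings of "é" are different Java strings with different UTF-8 encodings, which is why a file name in one form may fail to match the same name stored in the other.

```java
import java.nio.charset.Charset;

// Illustrative only: compares the composed and decomposed spellings of e-acute.
final class ComposedVersusDecomposed {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // one char: LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";  // two chars: 'e' + COMBINING ACUTE ACCENT

        System.out.println(composed.equals(decomposed)); // false
        System.out.println(utf8Length(composed));         // 2
        System.out.println(utf8Length(decomposed));       // 3
    }
}
```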


As a bit of history, non-decomposed characters used to work (better) in 1.3 and 1.4, but with 1.5 all *#$!% broke loose. Remote-mounted volumes with extended-character file names had their own restrictions under each of the previous VM versions. We filed bug report 4486240 against 1.5 about needing to decompose names on remote-mounted volumes, but the more I play with it (it could be related to update 4), the more this restriction keeps popping up in cases that have nothing to do with remote-mounted volumes. I hold almost no hope that this bug will ever be changed from "open".

Larry, the simple workaround is to do the decomposition yourself. If you download the Unicode data table (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), one of the columns (the sixth, I believe, which is field index 5 in the attached program) holds the equivalent decomposed form. If you write a simple program to parse the file (see the attached Java file), you can output a conversion map that looks roughly like this (see the attachment for the full list):

\u00C0=\u0041\u0300
\u00C1=\u0041\u0301
\u00C2=\u0041\u0302
\u00C3=\u0041\u0303
\u00C4=\u0041\u0308
.....

I have attached the full mapping file produced by the attached program. A separate class, which performs the actual mapping, loads this file as a resource. Hope this helps you.
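
The mapping class itself is not part of the attachment, so the following is only a sketch of what such a class might look like; `AccentDecomposer`, `parse`, and `decompose` are hypothetical names. It assumes the mapping file can be read as a standard Java properties file, since `java.util.Properties` already decodes backslash-u escapes in keys and values, and it uses `Properties.load(Reader)`, which needs Java 6 or later (on older runtimes `load(InputStream)` serves the same purpose). A per-character substitution like this does not reorder combining marks, so it may fall short of full canonical decomposition when the input already contains combining characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper; NOT the class referred to in the original message.
final class AccentDecomposer {
    private final Properties mapping = new Properties();

    // Reads "composed=decomposed" lines; Properties decodes the unicode escapes.
    AccentDecomposer(Reader source) throws IOException {
        mapping.load(source);
    }

    // Convenience factory for mapping text that is already in memory.
    static AccentDecomposer parse(String mappingText) {
        try {
            return new AccentDecomposer(new StringReader(mappingText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Replaces every mapped char with its decomposed sequence; others pass through.
    String decompose(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            String key = String.valueOf(input.charAt(i));
            out.append(mapping.getProperty(key, key));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two entries in the same format as the attached mapping file.
        String sample = "\\u00C9=\\u0045\\u0301\n\\u00E9=\\u0065\\u0301\n";
        String decomposed = AccentDecomposer.parse(sample).decompose("caf\u00E9");
        System.out.println(decomposed.length()); // 5: c, a, f, e, combining acute
    }
}
```

For the real file, the constructor could be handed a reader over the resource stream instead of the in-memory sample, and the resulting string passed to `new File(...)`.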
Regards,
Mike


Larry Nussbaum wrote:
I've had the same problem with Greek characters, inconsistent decoding... Does anyone know of a good decoding routine?
\u00C0=\u0041\u0300
\u00C1=\u0041\u0301
\u00C2=\u0041\u0302
\u00C3=\u0041\u0303
\u00C4=\u0041\u0308
\u00C5=\u0041\u030A
\u00C7=\u0043\u0327
\u00C8=\u0045\u0300
\u00C9=\u0045\u0301
\u00CA=\u0045\u0302
\u00CB=\u0045\u0308
\u00CC=\u0049\u0300
\u00CD=\u0049\u0301
\u00CE=\u0049\u0302
\u00CF=\u0049\u0308
\u00D1=\u004E\u0303
\u00D2=\u004F\u0300
\u00D3=\u004F\u0301
\u00D4=\u004F\u0302
\u00D5=\u004F\u0303
\u00D6=\u004F\u0308
\u00D9=\u0055\u0300
\u00DA=\u0055\u0301
\u00DB=\u0055\u0302
\u00DC=\u0055\u0308
\u00DD=\u0059\u0301
\u00E0=\u0061\u0300
\u00E1=\u0061\u0301
\u00E2=\u0061\u0302
\u00E3=\u0061\u0303
\u00E4=\u0061\u0308
\u00E5=\u0061\u030A
\u00E7=\u0063\u0327
\u00E8=\u0065\u0300
\u00E9=\u0065\u0301
\u00EA=\u0065\u0302
\u00EB=\u0065\u0308
\u00EC=\u0069\u0300
\u00ED=\u0069\u0301
\u00EE=\u0069\u0302
\u00EF=\u0069\u0308
\u00F1=\u006E\u0303
\u00F2=\u006F\u0300
\u00F3=\u006F\u0301
\u00F4=\u006F\u0302
\u00F5=\u006F\u0303
\u00F6=\u006F\u0308
\u00F9=\u0075\u0300
\u00FA=\u0075\u0301
\u00FB=\u0075\u0302
\u00FC=\u0075\u0308
\u00FD=\u0079\u0301
\u00FF=\u0079\u0308
\u0100=\u0041\u0304
\u0101=\u0061\u0304
\u0102=\u0041\u0306
\u0103=\u0061\u0306
\u0104=\u0041\u0328
\u0105=\u0061\u0328
\u0106=\u0043\u0301
\u0107=\u0063\u0301
\u0108=\u0043\u0302
\u0109=\u0063\u0302
\u010A=\u0043\u0307
\u010B=\u0063\u0307
\u010C=\u0043\u030C
\u010D=\u0063\u030C
\u010E=\u0044\u030C
\u010F=\u0064\u030C
\u0112=\u0045\u0304
\u0113=\u0065\u0304
\u0114=\u0045\u0306
\u0115=\u0065\u0306
\u0116=\u0045\u0307
\u0117=\u0065\u0307
\u0118=\u0045\u0328
\u0119=\u0065\u0328
\u011A=\u0045\u030C
\u011B=\u0065\u030C
\u011C=\u0047\u0302
\u011D=\u0067\u0302
\u011E=\u0047\u0306
\u011F=\u0067\u0306
\u0120=\u0047\u0307
\u0121=\u0067\u0307
\u0122=\u0047\u0327
\u0123=\u0067\u0327
\u0124=\u0048\u0302
\u0125=\u0068\u0302
\u0128=\u0049\u0303
\u0129=\u0069\u0303
\u012A=\u0049\u0304
\u012B=\u0069\u0304
\u012C=\u0049\u0306
\u012D=\u0069\u0306
\u012E=\u0049\u0328
\u012F=\u0069\u0328
\u0130=\u0049\u0307
\u0134=\u004A\u0302
\u0135=\u006A\u0302
\u0136=\u004B\u0327
\u0137=\u006B\u0327
\u0139=\u004C\u0301
\u013A=\u006C\u0301
\u013B=\u004C\u0327
\u013C=\u006C\u0327
\u013D=\u004C\u030C
\u013E=\u006C\u030C
\u0143=\u004E\u0301
\u0144=\u006E\u0301
\u0145=\u004E\u0327
\u0146=\u006E\u0327
\u0147=\u004E\u030C
\u0148=\u006E\u030C
\u014C=\u004F\u0304
\u014D=\u006F\u0304
\u014E=\u004F\u0306
\u014F=\u006F\u0306
\u0150=\u004F\u030B
\u0151=\u006F\u030B
\u0154=\u0052\u0301
\u0155=\u0072\u0301
\u0156=\u0052\u0327
\u0157=\u0072\u0327
\u0158=\u0052\u030C
\u0159=\u0072\u030C
\u015A=\u0053\u0301
\u015B=\u0073\u0301
\u015C=\u0053\u0302
\u015D=\u0073\u0302
\u015E=\u0053\u0327
\u015F=\u0073\u0327
\u0160=\u0053\u030C
\u0161=\u0073\u030C
\u0162=\u0054\u0327
\u0163=\u0074\u0327
\u0164=\u0054\u030C
\u0165=\u0074\u030C
\u0168=\u0055\u0303
\u0169=\u0075\u0303
\u016A=\u0055\u0304
\u016B=\u0075\u0304
\u016C=\u0055\u0306
\u016D=\u0075\u0306
\u016E=\u0055\u030A
\u016F=\u0075\u030A
\u0170=\u0055\u030B
\u0171=\u0075\u030B
\u0172=\u0055\u0328
\u0173=\u0075\u0328
\u0174=\u0057\u0302
\u0175=\u0077\u0302
\u0176=\u0059\u0302
\u0177=\u0079\u0302
\u0178=\u0059\u0308
\u0179=\u005A\u0301
\u017A=\u007A\u0301
\u017B=\u005A\u0307
\u017C=\u007A\u0307
\u017D=\u005A\u030C
\u017E=\u007A\u030C
\u01A0=\u004F\u031B
\u01A1=\u006F\u031B
\u01AF=\u0055\u031B
\u01B0=\u0075\u031B
\u01CD=\u0041\u030C
\u01CE=\u0061\u030C
\u01CF=\u0049\u030C
\u01D0=\u0069\u030C
\u01D1=\u004F\u030C
\u01D2=\u006F\u030C
\u01D3=\u0055\u030C
\u01D4=\u0075\u030C
\u01D5=\u0055\u0308\u0304
\u01D6=\u0075\u0308\u0304
\u01D7=\u0055\u0308\u0301
\u01D8=\u0075\u0308\u0301
\u01D9=\u0055\u0308\u030C
\u01DA=\u0075\u0308\u030C
\u01DB=\u0055\u0308\u0300
\u01DC=\u0075\u0308\u0300
\u01DE=\u0041\u0308\u0304
\u01DF=\u0061\u0308\u0304
\u01E0=\u0041\u0307\u0304
\u01E1=\u0061\u0307\u0304
\u01E2=\u00C6\u0304
\u01E3=\u00E6\u0304
\u01E6=\u0047\u030C
\u01E7=\u0067\u030C
\u01E8=\u004B\u030C
\u01E9=\u006B\u030C
\u01EA=\u004F\u0328
\u01EB=\u006F\u0328
\u01EC=\u004F\u0328\u0304
\u01ED=\u006F\u0328\u0304
\u01EE=\u01B7\u030C
\u01EF=\u0292\u030C
\u01F0=\u006A\u030C
\u01F4=\u0047\u0301
\u01F5=\u0067\u0301
\u01F8=\u004E\u0300
\u01F9=\u006E\u0300
\u01FA=\u0041\u030A\u0301
\u01FB=\u0061\u030A\u0301
\u01FC=\u00C6\u0301
\u01FD=\u00E6\u0301
\u01FE=\u00D8\u0301
\u01FF=\u00F8\u0301
\u0200=\u0041\u030F
\u0201=\u0061\u030F
\u0202=\u0041\u0311
\u0203=\u0061\u0311
\u0204=\u0045\u030F
\u0205=\u0065\u030F
\u0206=\u0045\u0311
\u0207=\u0065\u0311
\u0208=\u0049\u030F
\u0209=\u0069\u030F
\u020A=\u0049\u0311
\u020B=\u0069\u0311
\u020C=\u004F\u030F
\u020D=\u006F\u030F
\u020E=\u004F\u0311
\u020F=\u006F\u0311
\u0210=\u0052\u030F
\u0211=\u0072\u030F
\u0212=\u0052\u0311
\u0213=\u0072\u0311
\u0214=\u0055\u030F
\u0215=\u0075\u030F
\u0216=\u0055\u0311
\u0217=\u0075\u0311
\u0218=\u0053\u0326
\u0219=\u0073\u0326
\u021A=\u0054\u0326
\u021B=\u0074\u0326
\u021E=\u0048\u030C
\u021F=\u0068\u030C
\u0226=\u0041\u0307
\u0227=\u0061\u0307
\u0228=\u0045\u0327
\u0229=\u0065\u0327
\u022A=\u004F\u0308\u0304
\u022B=\u006F\u0308\u0304
\u022C=\u004F\u0303\u0304
\u022D=\u006F\u0303\u0304
\u022E=\u004F\u0307
\u022F=\u006F\u0307
\u0230=\u004F\u0307\u0304
\u0231=\u006F\u0307\u0304
\u0232=\u0059\u0304
\u0233=\u0079\u0304
\u0340=\u0300
\u0341=\u0301
\u0343=\u0313
\u0344=\u0308\u0301
\u0374=\u02B9
\u037E=\u003B
\u0385=\u00A8\u0301
\u0386=\u0391\u0301
\u0387=\u00B7
\u0388=\u0395\u0301
\u0389=\u0397\u0301
\u038A=\u0399\u0301
\u038C=\u039F\u0301
\u038E=\u03A5\u0301
\u038F=\u03A9\u0301
\u0390=\u03B9\u0308\u0301
\u03AA=\u0399\u0308
\u03AB=\u03A5\u0308
\u03AC=\u03B1\u0301
\u03AD=\u03B5\u0301
\u03AE=\u03B7\u0301
\u03AF=\u03B9\u0301
\u03B0=\u03C5\u0308\u0301
\u03CA=\u03B9\u0308
\u03CB=\u03C5\u0308
\u03CC=\u03BF\u0301
\u03CD=\u03C5\u0301
\u03CE=\u03C9\u0301
\u03D3=\u03D2\u0301
\u03D4=\u03D2\u0308
\u0400=\u0415\u0300
\u0401=\u0415\u0308
\u0403=\u0413\u0301
\u0407=\u0406\u0308
\u040C=\u041A\u0301
\u040D=\u0418\u0300
\u040E=\u0423\u0306
\u0419=\u0418\u0306
\u0439=\u0438\u0306
\u0450=\u0435\u0300
\u0451=\u0435\u0308
\u0453=\u0433\u0301
\u0457=\u0456\u0308
\u045C=\u043A\u0301
\u045D=\u0438\u0300
\u045E=\u0443\u0306
\u0476=\u0474\u030F
\u0477=\u0475\u030F
\u04C1=\u0416\u0306
\u04C2=\u0436\u0306
\u04D0=\u0410\u0306
\u04D1=\u0430\u0306
\u04D2=\u0410\u0308
\u04D3=\u0430\u0308
\u04D6=\u0415\u0306
\u04D7=\u0435\u0306
\u04DA=\u04D8\u0308
\u04DB=\u04D9\u0308
\u04DC=\u0416\u0308
\u04DD=\u0436\u0308
\u04DE=\u0417\u0308
\u04DF=\u0437\u0308
\u04E2=\u0418\u0304
\u04E3=\u0438\u0304
\u04E4=\u0418\u0308
\u04E5=\u0438\u0308
\u04E6=\u041E\u0308
\u04E7=\u043E\u0308
\u04EA=\u04E8\u0308
\u04EB=\u04E9\u0308
\u04EC=\u042D\u0308
\u04ED=\u044D\u0308
\u04EE=\u0423\u0304
\u04EF=\u0443\u0304
\u04F0=\u0423\u0308
\u04F1=\u0443\u0308
\u04F2=\u0423\u030B
\u04F3=\u0443\u030B
\u04F4=\u0427\u0308
\u04F5=\u0447\u0308
\u04F8=\u042B\u0308
\u04F9=\u044B\u0308
\u0622=\u0627\u0653
\u0623=\u0627\u0654
\u0624=\u0648\u0654
\u0625=\u0627\u0655
\u0626=\u064A\u0654
\u06C0=\u06D5\u0654
\u06C2=\u06C1\u0654
\u06D3=\u06D2\u0654
\u0929=\u0928\u093C
\u0931=\u0930\u093C
\u0934=\u0933\u093C
\u0958=\u0915\u093C
\u0959=\u0916\u093C
\u095A=\u0917\u093C
\u095B=\u091C\u093C
\u095C=\u0921\u093C
\u095D=\u0922\u093C
\u095E=\u092B\u093C
\u095F=\u092F\u093C
\u09CB=\u09C7\u09BE
\u09CC=\u09C7\u09D7
\u09DC=\u09A1\u09BC
\u09DD=\u09A2\u09BC
\u09DF=\u09AF\u09BC
\u0A33=\u0A32\u0A3C
\u0A36=\u0A38\u0A3C
\u0A59=\u0A16\u0A3C
\u0A5A=\u0A17\u0A3C
\u0A5B=\u0A1C\u0A3C
\u0A5E=\u0A2B\u0A3C
\u0B48=\u0B47\u0B56
\u0B4B=\u0B47\u0B3E
\u0B4C=\u0B47\u0B57
\u0B5C=\u0B21\u0B3C
\u0B5D=\u0B22\u0B3C
\u0B94=\u0B92\u0BD7
\u0BCA=\u0BC6\u0BBE
\u0BCB=\u0BC7\u0BBE
\u0BCC=\u0BC6\u0BD7
\u0C48=\u0C46\u0C56
\u0CC0=\u0CBF\u0CD5
\u0CC7=\u0CC6\u0CD5
\u0CC8=\u0CC6\u0CD6
\u0CCA=\u0CC6\u0CC2
\u0CCB=\u0CC6\u0CC2\u0CD5
\u0D4A=\u0D46\u0D3E
\u0D4B=\u0D47\u0D3E
\u0D4C=\u0D46\u0D57
\u0DDA=\u0DD9\u0DCA
\u0DDC=\u0DD9\u0DCF
\u0DDD=\u0DD9\u0DCF\u0DCA
\u0DDE=\u0DD9\u0DDF
\u0F43=\u0F42\u0FB7
\u0F4D=\u0F4C\u0FB7
\u0F52=\u0F51\u0FB7
\u0F57=\u0F56\u0FB7
\u0F5C=\u0F5B\u0FB7
\u0F69=\u0F40\u0FB5
\u0F73=\u0F71\u0F72
\u0F75=\u0F71\u0F74
\u0F76=\u0FB2\u0F80
\u0F78=\u0FB3\u0F80
\u0F81=\u0F71\u0F80
\u0F93=\u0F92\u0FB7
\u0F9D=\u0F9C\u0FB7
\u0FA2=\u0FA1\u0FB7
\u0FA7=\u0FA6\u0FB7
\u0FAC=\u0FAB\u0FB7
\u0FB9=\u0F90\u0FB5
\u1026=\u1025\u102E
\u1E00=\u0041\u0325
\u1E01=\u0061\u0325
\u1E02=\u0042\u0307
\u1E03=\u0062\u0307
\u1E04=\u0042\u0323
\u1E05=\u0062\u0323
\u1E06=\u0042\u0331
\u1E07=\u0062\u0331
\u1E08=\u0043\u0327\u0301
\u1E09=\u0063\u0327\u0301
\u1E0A=\u0044\u0307
\u1E0B=\u0064\u0307
\u1E0C=\u0044\u0323
\u1E0D=\u0064\u0323
\u1E0E=\u0044\u0331
\u1E0F=\u0064\u0331
\u1E10=\u0044\u0327
\u1E11=\u0064\u0327
\u1E12=\u0044\u032D
\u1E13=\u0064\u032D
\u1E14=\u0045\u0304\u0300
\u1E15=\u0065\u0304\u0300
\u1E16=\u0045\u0304\u0301
\u1E17=\u0065\u0304\u0301
\u1E18=\u0045\u032D
\u1E19=\u0065\u032D
\u1E1A=\u0045\u0330
\u1E1B=\u0065\u0330
\u1E1C=\u0045\u0327\u0306
\u1E1D=\u0065\u0327\u0306
\u1E1E=\u0046\u0307
\u1E1F=\u0066\u0307
\u1E20=\u0047\u0304
\u1E21=\u0067\u0304
\u1E22=\u0048\u0307
\u1E23=\u0068\u0307
\u1E24=\u0048\u0323
\u1E25=\u0068\u0323
\u1E26=\u0048\u0308
\u1E27=\u0068\u0308
\u1E28=\u0048\u0327
\u1E29=\u0068\u0327
\u1E2A=\u0048\u032E
\u1E2B=\u0068\u032E
\u1E2C=\u0049\u0330
\u1E2D=\u0069\u0330
\u1E2E=\u0049\u0308\u0301
\u1E2F=\u0069\u0308\u0301
\u1E30=\u004B\u0301
\u1E31=\u006B\u0301
\u1E32=\u004B\u0323
\u1E33=\u006B\u0323
\u1E34=\u004B\u0331
\u1E35=\u006B\u0331
\u1E36=\u004C\u0323
\u1E37=\u006C\u0323
\u1E38=\u004C\u0323\u0304
\u1E39=\u006C\u0323\u0304
\u1E3A=\u004C\u0331
\u1E3B=\u006C\u0331
\u1E3C=\u004C\u032D
\u1E3D=\u006C\u032D
\u1E3E=\u004D\u0301
\u1E3F=\u006D\u0301
\u1E40=\u004D\u0307
\u1E41=\u006D\u0307
\u1E42=\u004D\u0323
\u1E43=\u006D\u0323
\u1E44=\u004E\u0307
\u1E45=\u006E\u0307
\u1E46=\u004E\u0323
\u1E47=\u006E\u0323
\u1E48=\u004E\u0331
\u1E49=\u006E\u0331
\u1E4A=\u004E\u032D
\u1E4B=\u006E\u032D
\u1E4C=\u004F\u0303\u0301
\u1E4D=\u006F\u0303\u0301
\u1E4E=\u004F\u0303\u0308
\u1E4F=\u006F\u0303\u0308
\u1E50=\u004F\u0304\u0300
\u1E51=\u006F\u0304\u0300
\u1E52=\u004F\u0304\u0301
\u1E53=\u006F\u0304\u0301
\u1E54=\u0050\u0301
\u1E55=\u0070\u0301
\u1E56=\u0050\u0307
\u1E57=\u0070\u0307
\u1E58=\u0052\u0307
\u1E59=\u0072\u0307
\u1E5A=\u0052\u0323
\u1E5B=\u0072\u0323
\u1E5C=\u0052\u0323\u0304
\u1E5D=\u0072\u0323\u0304
\u1E5E=\u0052\u0331
\u1E5F=\u0072\u0331
\u1E60=\u0053\u0307
\u1E61=\u0073\u0307
\u1E62=\u0053\u0323
\u1E63=\u0073\u0323
\u1E64=\u0053\u0301\u0307
\u1E65=\u0073\u0301\u0307
\u1E66=\u0053\u030C\u0307
\u1E67=\u0073\u030C\u0307
\u1E68=\u0053\u0323\u0307
\u1E69=\u0073\u0323\u0307
\u1E6A=\u0054\u0307
\u1E6B=\u0074\u0307
\u1E6C=\u0054\u0323
\u1E6D=\u0074\u0323
\u1E6E=\u0054\u0331
\u1E6F=\u0074\u0331
\u1E70=\u0054\u032D
\u1E71=\u0074\u032D
\u1E72=\u0055\u0324
\u1E73=\u0075\u0324
\u1E74=\u0055\u0330
\u1E75=\u0075\u0330
\u1E76=\u0055\u032D
\u1E77=\u0075\u032D
\u1E78=\u0055\u0303\u0301
\u1E79=\u0075\u0303\u0301
\u1E7A=\u0055\u0304\u0308
\u1E7B=\u0075\u0304\u0308
\u1E7C=\u0056\u0303
\u1E7D=\u0076\u0303
\u1E7E=\u0056\u0323
\u1E7F=\u0076\u0323
\u1E80=\u0057\u0300
\u1E81=\u0077\u0300
\u1E82=\u0057\u0301
\u1E83=\u0077\u0301
\u1E84=\u0057\u0308
\u1E85=\u0077\u0308
\u1E86=\u0057\u0307
\u1E87=\u0077\u0307
\u1E88=\u0057\u0323
\u1E89=\u0077\u0323
\u1E8A=\u0058\u0307
\u1E8B=\u0078\u0307
\u1E8C=\u0058\u0308
\u1E8D=\u0078\u0308
\u1E8E=\u0059\u0307
\u1E8F=\u0079\u0307
\u1E90=\u005A\u0302
\u1E91=\u007A\u0302
\u1E92=\u005A\u0323
\u1E93=\u007A\u0323
\u1E94=\u005A\u0331
\u1E95=\u007A\u0331
\u1E96=\u0068\u0331
\u1E97=\u0074\u0308
\u1E98=\u0077\u030A
\u1E99=\u0079\u030A
\u1E9B=\u017F\u0307
\u1EA0=\u0041\u0323
\u1EA1=\u0061\u0323
\u1EA2=\u0041\u0309
\u1EA3=\u0061\u0309
\u1EA4=\u0041\u0302\u0301
\u1EA5=\u0061\u0302\u0301
\u1EA6=\u0041\u0302\u0300
\u1EA7=\u0061\u0302\u0300
\u1EA8=\u0041\u0302\u0309
\u1EA9=\u0061\u0302\u0309
\u1EAA=\u0041\u0302\u0303
\u1EAB=\u0061\u0302\u0303
\u1EAC=\u0041\u0323\u0302
\u1EAD=\u0061\u0323\u0302
\u1EAE=\u0041\u0306\u0301
\u1EAF=\u0061\u0306\u0301
\u1EB0=\u0041\u0306\u0300
\u1EB1=\u0061\u0306\u0300
\u1EB2=\u0041\u0306\u0309
\u1EB3=\u0061\u0306\u0309
\u1EB4=\u0041\u0306\u0303
\u1EB5=\u0061\u0306\u0303
\u1EB6=\u0041\u0323\u0306
\u1EB7=\u0061\u0323\u0306
\u1EB8=\u0045\u0323
\u1EB9=\u0065\u0323
\u1EBA=\u0045\u0309
\u1EBB=\u0065\u0309
\u1EBC=\u0045\u0303
\u1EBD=\u0065\u0303
\u1EBE=\u0045\u0302\u0301
\u1EBF=\u0065\u0302\u0301
\u1EC0=\u0045\u0302\u0300
\u1EC1=\u0065\u0302\u0300
\u1EC2=\u0045\u0302\u0309
\u1EC3=\u0065\u0302\u0309
\u1EC4=\u0045\u0302\u0303
\u1EC5=\u0065\u0302\u0303
\u1EC6=\u0045\u0323\u0302
\u1EC7=\u0065\u0323\u0302
\u1EC8=\u0049\u0309
\u1EC9=\u0069\u0309
\u1ECA=\u0049\u0323
\u1ECB=\u0069\u0323
\u1ECC=\u004F\u0323
\u1ECD=\u006F\u0323
\u1ECE=\u004F\u0309
\u1ECF=\u006F\u0309
\u1ED0=\u004F\u0302\u0301
\u1ED1=\u006F\u0302\u0301
\u1ED2=\u004F\u0302\u0300
\u1ED3=\u006F\u0302\u0300
\u1ED4=\u004F\u0302\u0309
\u1ED5=\u006F\u0302\u0309
\u1ED6=\u004F\u0302\u0303
\u1ED7=\u006F\u0302\u0303
\u1ED8=\u004F\u0323\u0302
\u1ED9=\u006F\u0323\u0302
\u1EDA=\u004F\u031B\u0301
\u1EDB=\u006F\u031B\u0301
\u1EDC=\u004F\u031B\u0300
\u1EDD=\u006F\u031B\u0300
\u1EDE=\u004F\u031B\u0309
\u1EDF=\u006F\u031B\u0309
\u1EE0=\u004F\u031B\u0303
\u1EE1=\u006F\u031B\u0303
\u1EE2=\u004F\u031B\u0323
\u1EE3=\u006F\u031B\u0323
\u1EE4=\u0055\u0323
\u1EE5=\u0075\u0323
\u1EE6=\u0055\u0309
\u1EE7=\u0075\u0309
\u1EE8=\u0055\u031B\u0301
\u1EE9=\u0075\u031B\u0301
\u1EEA=\u0055\u031B\u0300
\u1EEB=\u0075\u031B\u0300
\u1EEC=\u0055\u031B\u0309
\u1EED=\u0075\u031B\u0309
\u1EEE=\u0055\u031B\u0303
\u1EEF=\u0075\u031B\u0303
\u1EF0=\u0055\u031B\u0323
\u1EF1=\u0075\u031B\u0323
\u1EF2=\u0059\u0300
\u1EF3=\u0079\u0300
\u1EF4=\u0059\u0323
\u1EF5=\u0079\u0323
\u1EF6=\u0059\u0309
\u1EF7=\u0079\u0309
\u1EF8=\u0059\u0303
\u1EF9=\u0079\u0303
\u1F00=\u03B1\u0313
\u1F01=\u03B1\u0314
\u1F02=\u03B1\u0313\u0300
\u1F03=\u03B1\u0314\u0300
\u1F04=\u03B1\u0313\u0301
\u1F05=\u03B1\u0314\u0301
\u1F06=\u03B1\u0313\u0342
\u1F07=\u03B1\u0314\u0342
\u1F08=\u0391\u0313
\u1F09=\u0391\u0314
\u1F0A=\u0391\u0313\u0300
\u1F0B=\u0391\u0314\u0300
\u1F0C=\u0391\u0313\u0301
\u1F0D=\u0391\u0314\u0301
\u1F0E=\u0391\u0313\u0342
\u1F0F=\u0391\u0314\u0342
\u1F10=\u03B5\u0313
\u1F11=\u03B5\u0314
\u1F12=\u03B5\u0313\u0300
\u1F13=\u03B5\u0314\u0300
\u1F14=\u03B5\u0313\u0301
\u1F15=\u03B5\u0314\u0301
\u1F18=\u0395\u0313
\u1F19=\u0395\u0314
\u1F1A=\u0395\u0313\u0300
\u1F1B=\u0395\u0314\u0300
\u1F1C=\u0395\u0313\u0301
\u1F1D=\u0395\u0314\u0301
\u1F20=\u03B7\u0313
\u1F21=\u03B7\u0314
\u1F22=\u03B7\u0313\u0300
\u1F23=\u03B7\u0314\u0300
\u1F24=\u03B7\u0313\u0301
\u1F25=\u03B7\u0314\u0301
\u1F26=\u03B7\u0313\u0342
\u1F27=\u03B7\u0314\u0342
\u1F28=\u0397\u0313
\u1F29=\u0397\u0314
\u1F2A=\u0397\u0313\u0300
\u1F2B=\u0397\u0314\u0300
\u1F2C=\u0397\u0313\u0301
\u1F2D=\u0397\u0314\u0301
\u1F2E=\u0397\u0313\u0342
\u1F2F=\u0397\u0314\u0342
\u1F30=\u03B9\u0313
\u1F31=\u03B9\u0314
\u1F32=\u03B9\u0313\u0300
\u1F33=\u03B9\u0314\u0300
\u1F34=\u03B9\u0313\u0301
\u1F35=\u03B9\u0314\u0301
\u1F36=\u03B9\u0313\u0342
\u1F37=\u03B9\u0314\u0342
\u1F38=\u0399\u0313
\u1F39=\u0399\u0314
\u1F3A=\u0399\u0313\u0300
\u1F3B=\u0399\u0314\u0300
\u1F3C=\u0399\u0313\u0301
\u1F3D=\u0399\u0314\u0301
\u1F3E=\u0399\u0313\u0342
\u1F3F=\u0399\u0314\u0342
\u1F40=\u03BF\u0313
\u1F41=\u03BF\u0314
\u1F42=\u03BF\u0313\u0300
\u1F43=\u03BF\u0314\u0300
\u1F44=\u03BF\u0313\u0301
\u1F45=\u03BF\u0314\u0301
\u1F48=\u039F\u0313
\u1F49=\u039F\u0314
\u1F4A=\u039F\u0313\u0300
\u1F4B=\u039F\u0314\u0300
\u1F4C=\u039F\u0313\u0301
\u1F4D=\u039F\u0314\u0301
\u1F50=\u03C5\u0313
\u1F51=\u03C5\u0314
\u1F52=\u03C5\u0313\u0300
\u1F53=\u03C5\u0314\u0300
\u1F54=\u03C5\u0313\u0301
\u1F55=\u03C5\u0314\u0301
\u1F56=\u03C5\u0313\u0342
\u1F57=\u03C5\u0314\u0342
\u1F59=\u03A5\u0314
\u1F5B=\u03A5\u0314\u0300
\u1F5D=\u03A5\u0314\u0301
\u1F5F=\u03A5\u0314\u0342
\u1F60=\u03C9\u0313
\u1F61=\u03C9\u0314
\u1F62=\u03C9\u0313\u0300
\u1F63=\u03C9\u0314\u0300
\u1F64=\u03C9\u0313\u0301
\u1F65=\u03C9\u0314\u0301
\u1F66=\u03C9\u0313\u0342
\u1F67=\u03C9\u0314\u0342
\u1F68=\u03A9\u0313
\u1F69=\u03A9\u0314
\u1F6A=\u03A9\u0313\u0300
\u1F6B=\u03A9\u0314\u0300
\u1F6C=\u03A9\u0313\u0301
\u1F6D=\u03A9\u0314\u0301
\u1F6E=\u03A9\u0313\u0342
\u1F6F=\u03A9\u0314\u0342
\u1F70=\u03B1\u0300
\u1F71=\u03B1\u0301
\u1F72=\u03B5\u0300
\u1F73=\u03B5\u0301
\u1F74=\u03B7\u0300
\u1F75=\u03B7\u0301
\u1F76=\u03B9\u0300
\u1F77=\u03B9\u0301
\u1F78=\u03BF\u0300
\u1F79=\u03BF\u0301
\u1F7A=\u03C5\u0300
\u1F7B=\u03C5\u0301
\u1F7C=\u03C9\u0300
\u1F7D=\u03C9\u0301
\u1F80=\u03B1\u0313\u0345
\u1F81=\u03B1\u0314\u0345
\u1F82=\u03B1\u0313\u0300\u0345
\u1F83=\u03B1\u0314\u0300\u0345
\u1F84=\u03B1\u0313\u0301\u0345
\u1F85=\u03B1\u0314\u0301\u0345
\u1F86=\u03B1\u0313\u0342\u0345
\u1F87=\u03B1\u0314\u0342\u0345
\u1F88=\u0391\u0313\u0345
\u1F89=\u0391\u0314\u0345
\u1F8A=\u0391\u0313\u0300\u0345
\u1F8B=\u0391\u0314\u0300\u0345
\u1F8C=\u0391\u0313\u0301\u0345
\u1F8D=\u0391\u0314\u0301\u0345
\u1F8E=\u0391\u0313\u0342\u0345
\u1F8F=\u0391\u0314\u0342\u0345
\u1F90=\u03B7\u0313\u0345
\u1F91=\u03B7\u0314\u0345
\u1F92=\u03B7\u0313\u0300\u0345
\u1F93=\u03B7\u0314\u0300\u0345
\u1F94=\u03B7\u0313\u0301\u0345
\u1F95=\u03B7\u0314\u0301\u0345
\u1F96=\u03B7\u0313\u0342\u0345
\u1F97=\u03B7\u0314\u0342\u0345
\u1F98=\u0397\u0313\u0345
\u1F99=\u0397\u0314\u0345
\u1F9A=\u0397\u0313\u0300\u0345
\u1F9B=\u0397\u0314\u0300\u0345
\u1F9C=\u0397\u0313\u0301\u0345
\u1F9D=\u0397\u0314\u0301\u0345
\u1F9E=\u0397\u0313\u0342\u0345
\u1F9F=\u0397\u0314\u0342\u0345
\u1FA0=\u03C9\u0313\u0345
\u1FA1=\u03C9\u0314\u0345
\u1FA2=\u03C9\u0313\u0300\u0345
\u1FA3=\u03C9\u0314\u0300\u0345
\u1FA4=\u03C9\u0313\u0301\u0345
\u1FA5=\u03C9\u0314\u0301\u0345
\u1FA6=\u03C9\u0313\u0342\u0345
\u1FA7=\u03C9\u0314\u0342\u0345
\u1FA8=\u03A9\u0313\u0345
\u1FA9=\u03A9\u0314\u0345
\u1FAA=\u03A9\u0313\u0300\u0345
\u1FAB=\u03A9\u0314\u0300\u0345
\u1FAC=\u03A9\u0313\u0301\u0345
\u1FAD=\u03A9\u0314\u0301\u0345
\u1FAE=\u03A9\u0313\u0342\u0345
\u1FAF=\u03A9\u0314\u0342\u0345
\u1FB0=\u03B1\u0306
\u1FB1=\u03B1\u0304
\u1FB2=\u03B1\u0300\u0345
\u1FB3=\u03B1\u0345
\u1FB4=\u03B1\u0301\u0345
\u1FB6=\u03B1\u0342
\u1FB7=\u03B1\u0342\u0345
\u1FB8=\u0391\u0306
\u1FB9=\u0391\u0304
\u1FBA=\u0391\u0300
\u1FBB=\u0391\u0301
\u1FBC=\u0391\u0345
\u1FBE=\u03B9
\u1FC1=\u00A8\u0342
\u1FC2=\u03B7\u0300\u0345
\u1FC3=\u03B7\u0345
\u1FC4=\u03B7\u0301\u0345
\u1FC6=\u03B7\u0342
\u1FC7=\u03B7\u0342\u0345
\u1FC8=\u0395\u0300
\u1FC9=\u0395\u0301
\u1FCA=\u0397\u0300
\u1FCB=\u0397\u0301
\u1FCC=\u0397\u0345
\u1FCD=\u1FBF\u0300
\u1FCE=\u1FBF\u0301
\u1FCF=\u1FBF\u0342
\u1FD0=\u03B9\u0306
\u1FD1=\u03B9\u0304
\u1FD2=\u03B9\u0308\u0300
\u1FD3=\u03B9\u0308\u0301
\u1FD6=\u03B9\u0342
\u1FD7=\u03B9\u0308\u0342
\u1FD8=\u0399\u0306
\u1FD9=\u0399\u0304
\u1FDA=\u0399\u0300
\u1FDB=\u0399\u0301
\u1FDD=\u1FFE\u0300
\u1FDE=\u1FFE\u0301
\u1FDF=\u1FFE\u0342
\u1FE0=\u03C5\u0306
\u1FE1=\u03C5\u0304
\u1FE2=\u03C5\u0308\u0300
\u1FE3=\u03C5\u0308\u0301
\u1FE4=\u03C1\u0313
\u1FE5=\u03C1\u0314
\u1FE6=\u03C5\u0342
\u1FE7=\u03C5\u0308\u0342
\u1FE8=\u03A5\u0306
\u1FE9=\u03A5\u0304
\u1FEA=\u03A5\u0300
\u1FEB=\u03A5\u0301
\u1FEC=\u03A1\u0314
\u1FED=\u00A8\u0300
\u1FEE=\u00A8\u0301
\u1FEF=\u0060
\u1FF2=\u03C9\u0300\u0345
\u1FF3=\u03C9\u0345
\u1FF4=\u03C9\u0301\u0345
\u1FF6=\u03C9\u0342
\u1FF7=\u03C9\u0342\u0345
\u1FF8=\u039F\u0300
\u1FF9=\u039F\u0301
\u1FFA=\u03A9\u0300
\u1FFB=\u03A9\u0301
\u1FFC=\u03A9\u0345
\u1FFD=\u00B4
\u2000=\u2002
\u2001=\u2003
\u2126=\u03A9
\u212A=\u004B
\u212B=\u0041\u030A
\u219A=\u2190\u0338
\u219B=\u2192\u0338
\u21AE=\u2194\u0338
\u21CD=\u21D0\u0338
\u21CE=\u21D4\u0338
\u21CF=\u21D2\u0338
\u2204=\u2203\u0338
\u2209=\u2208\u0338
\u220C=\u220B\u0338
\u2224=\u2223\u0338
\u2226=\u2225\u0338
\u2241=\u223C\u0338
\u2244=\u2243\u0338
\u2247=\u2245\u0338
\u2249=\u2248\u0338
\u2260=\u003D\u0338
\u2262=\u2261\u0338
\u226D=\u224D\u0338
\u226E=\u003C\u0338
\u226F=\u003E\u0338
\u2270=\u2264\u0338
\u2271=\u2265\u0338
\u2274=\u2272\u0338
\u2275=\u2273\u0338
\u2278=\u2276\u0338
\u2279=\u2277\u0338
\u2280=\u227A\u0338
\u2281=\u227B\u0338
\u2284=\u2282\u0338
\u2285=\u2283\u0338
\u2288=\u2286\u0338
\u2289=\u2287\u0338
\u22AC=\u22A2\u0338
\u22AD=\u22A8\u0338
\u22AE=\u22A9\u0338
\u22AF=\u22AB\u0338
\u22E0=\u227C\u0338
\u22E1=\u227D\u0338
\u22E2=\u2291\u0338
\u22E3=\u2292\u0338
\u22EA=\u22B2\u0338
\u22EB=\u22B3\u0338
\u22EC=\u22B4\u0338
\u22ED=\u22B5\u0338
\u2329=\u3008
\u232A=\u3009
\u2ADC=\u2ADD\u0338
\u304C=\u304B\u3099
\u304E=\u304D\u3099
\u3050=\u304F\u3099
\u3052=\u3051\u3099
\u3054=\u3053\u3099
\u3056=\u3055\u3099
\u3058=\u3057\u3099
\u305A=\u3059\u3099
\u305C=\u305B\u3099
\u305E=\u305D\u3099
\u3060=\u305F\u3099
\u3062=\u3061\u3099
\u3065=\u3064\u3099
\u3067=\u3066\u3099
\u3069=\u3068\u3099
\u3070=\u306F\u3099
\u3071=\u306F\u309A
\u3073=\u3072\u3099
\u3074=\u3072\u309A
\u3076=\u3075\u3099
\u3077=\u3075\u309A
\u3079=\u3078\u3099
\u307A=\u3078\u309A
\u307C=\u307B\u3099
\u307D=\u307B\u309A
\u3094=\u3046\u3099
\u309E=\u309D\u3099
\u30AC=\u30AB\u3099
\u30AE=\u30AD\u3099
\u30B0=\u30AF\u3099
\u30B2=\u30B1\u3099
\u30B4=\u30B3\u3099
\u30B6=\u30B5\u3099
\u30B8=\u30B7\u3099
\u30BA=\u30B9\u3099
\u30BC=\u30BB\u3099
\u30BE=\u30BD\u3099
\u30C0=\u30BF\u3099
\u30C2=\u30C1\u3099
\u30C5=\u30C4\u3099
\u30C7=\u30C6\u3099
\u30C9=\u30C8\u3099
\u30D0=\u30CF\u3099
\u30D1=\u30CF\u309A
\u30D3=\u30D2\u3099
\u30D4=\u30D2\u309A
\u30D6=\u30D5\u3099
\u30D7=\u30D5\u309A
\u30D9=\u30D8\u3099
\u30DA=\u30D8\u309A
\u30DC=\u30DB\u3099
\u30DD=\u30DB\u309A
\u30F4=\u30A6\u3099
\u30F7=\u30EF\u3099
\u30F8=\u30F0\u3099
\u30F9=\u30F1\u3099
\u30FA=\u30F2\u3099
\u30FE=\u30FD\u3099
\uFB1D=\u05D9\u05B4
\uFB1F=\u05F2\u05B7
\uFB2A=\u05E9\u05C1
\uFB2B=\u05E9\u05C2
\uFB2C=\u05E9\u05BC\u05C1
\uFB2D=\u05E9\u05BC\u05C2
\uFB2E=\u05D0\u05B7
\uFB2F=\u05D0\u05B8
\uFB30=\u05D0\u05BC
\uFB31=\u05D1\u05BC
\uFB32=\u05D2\u05BC
\uFB33=\u05D3\u05BC
\uFB34=\u05D4\u05BC
\uFB35=\u05D5\u05BC
\uFB36=\u05D6\u05BC
\uFB38=\u05D8\u05BC
\uFB39=\u05D9\u05BC
\uFB3A=\u05DA\u05BC
\uFB3B=\u05DB\u05BC
\uFB3C=\u05DC\u05BC
\uFB3E=\u05DE\u05BC
\uFB40=\u05E0\u05BC
\uFB41=\u05E1\u05BC
\uFB43=\u05E3\u05BC
\uFB44=\u05E4\u05BC
\uFB46=\u05E6\u05BC
\uFB47=\u05E7\u05BC
\uFB48=\u05E8\u05BC
\uFB49=\u05E9\u05BC
\uFB4A=\u05EA\u05BC
\uFB4B=\u05D5\u05B9
\uFB4C=\u05D1\u05BF
\uFB4D=\u05DB\u05BF
\uFB4E=\u05E4\u05BF



/**
 * Copyright 2001-2006 WAVE Corporation
 * All Rights Reserved.
 */

package com.wavecorp.generation;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * @author Michael Macaluso
 */
public class GenerateUnicodeAccentDecomposerData
{
	public static final String UNICODE_PREPEND_SEQUENCE[];
	static
	{
		UNICODE_PREPEND_SEQUENCE = new String[]
		{
			"\\u0000",
			"\\u000",
			"\\u00",
			"\\u0",
			"\\u",
		};
	}

	public static String GetCharacterAsUnicodeSequence(int aChar)
	{
		String anIntString = Integer.toHexString(aChar).toUpperCase();
		return UNICODE_PREPEND_SEQUENCE[anIntString.length()] + anIntString;
	}

	public static class UnicodeData
	{
		public String m_CharacterAsHexString; // 0
		public Integer m_Character; // 0
		public String m_Name; // 1
		public String m_GeneralCategory; // 2
		public Integer m_CanonicalCombiningClass; // 3
		public String m_BidiClass; // 4
		public String m_DecompositionType; // 5
		public Integer[] m_DecompositionMapping; // 5
		public String m_NumericValueDecimalDigit; // 6
		public String m_NumericValueDigit; // 7
		public String m_NumericValueNumeric; // 8
		public Boolean m_BidiMirrored; // 9
		public String m_Unicode1Name; // 10
		public String m_ISOComment; // 11
		public Integer m_SimpleUppercaseMapping; // 12
		public Integer m_SimpleLowercaseMapping; // 13
		public Integer m_SimpleTitlecaseMapping; // 14
	}

	public GenerateUnicodeAccentDecomposerData()
	{
	}
	
	public static Integer GetCharacterFromHexString(String aHexString)
	{
		if (null == aHexString || aHexString.length() == 0)
		{
			return null;
		}

		try
		{
			return Integer.valueOf(aHexString, 16);
		}
		catch (Exception e)
		{
			e.printStackTrace();
			return null;
		}
	}
	
	public static void AddCharacterFromHexStringToList(String aHexString, List aList)
	{
		Integer aCharacter = GetCharacterFromHexString(aHexString);
		if (null != aCharacter)
		{
			aList.add(aCharacter);
		}
	}
	
	public static String GetSubString(String aString, int begin, int end)
	{
		if (-1 == end)
		{
			if (begin == aString.length())
			{
				return null;
			}

			return aString.substring(begin);
		}

		if (begin == end + 1)
		{
			return null;
		}

		return aString.substring(begin, end);
	}

	public static Map ReadUnicodeData(String aUnicodeFileName)
	{
		Map aReturnMap = new TreeMap();

		FileReader aFileReader = null;
		try
		{
			aFileReader = new FileReader(aUnicodeFileName);
			BufferedReader in = new BufferedReader(aFileReader);

			String aLine;
			while (null != (aLine = in.readLine()))
			{
				UnicodeData aUnicodeData = new UnicodeData();

				int begin = 0;
				int end = -1;

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_CharacterAsHexString = GetSubString(aLine, begin, end);
				aUnicodeData.m_Character = GetCharacterFromHexString(aUnicodeData.m_CharacterAsHexString);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_Name = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_GeneralCategory = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				String aCanonicalCombiningClassString = GetSubString(aLine, begin, end);
				try
				{
					aUnicodeData.m_CanonicalCombiningClass = Integer.valueOf(aCanonicalCombiningClassString);
				}
				catch (Exception e)
				{
					e.printStackTrace();
				}

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_BidiClass = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				String aDecomposition = GetSubString(aLine, begin, end);
				if (aDecomposition.length() > 0)
				{
					int startOfDecomposition = 0;
					if (aDecomposition.charAt(0) == '<')
					{
						int endOfType = aDecomposition.indexOf('>');
						startOfDecomposition = 1;
						aUnicodeData.m_DecompositionType = GetSubString(aDecomposition, startOfDecomposition, endOfType);
						startOfDecomposition = endOfType + 2;
					}

					List aDecompositionList = new ArrayList();
					int indexOfSpace = aDecomposition.indexOf(' ', startOfDecomposition);
					while (true)
					{
						try
						{
							String aDecompositionCharacterInHex = GetSubString(aDecomposition, startOfDecomposition, indexOfSpace);
							AddCharacterFromHexStringToList(aDecompositionCharacterInHex, aDecompositionList);
						}
						catch (IndexOutOfBoundsException e)
						{
							e.printStackTrace();
							break;
						}

						if (-1 == indexOfSpace)
						{
							break;
						}

						startOfDecomposition = indexOfSpace + 1;
						indexOfSpace = aDecomposition.indexOf(' ', startOfDecomposition);
					}

					aUnicodeData.m_DecompositionMapping = (Integer[]) aDecompositionList.toArray(new Integer[aDecompositionList.size()]);
				}

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_NumericValueDecimalDigit = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_NumericValueDigit = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_NumericValueNumeric = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				String aBidiMirroredFlag = GetSubString(aLine, begin, end);
				aUnicodeData.m_BidiMirrored = aBidiMirroredFlag.equals("Y") ? Boolean.TRUE : Boolean.FALSE;

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_Unicode1Name = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_ISOComment = GetSubString(aLine, begin, end);

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_SimpleUppercaseMapping = GetCharacterFromHexString(GetSubString(aLine, begin, end));

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_SimpleLowercaseMapping = GetCharacterFromHexString(GetSubString(aLine, begin, end));

				end = aLine.indexOf(';', (begin = end + 1));
				aUnicodeData.m_SimpleTitlecaseMapping = GetCharacterFromHexString(GetSubString(aLine, begin, end));
				
				aReturnMap.put(aUnicodeData.m_Character, aUnicodeData);
			}
		}
		catch (Exception e)
		{
			e.printStackTrace();
		}
		finally
		{
			if (null != aFileReader) { try { aFileReader.close(); } catch (Exception e) {} }
		}
		
		return aReturnMap;
	}
	
	public static void RecursivelyAddCanonicallyDecomposedCharacters(UnicodeData aUnicodeData, Map aLookupMap, List aCanonicalDecompositionCharacterList) throws Exception
	{
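		Integer[] aDecompositionMapping = aUnicodeData.m_DecompositionMapping;
		// Stop at characters that have no mapping, and also at characters whose mapping
		// carries a type tag such as <compat>: a tagged mapping is a compatibility
		// decomposition and must not be expanded as part of a canonical one
		if (null == aDecompositionMapping || null != aUnicodeData.m_DecompositionType)
		{
			aCanonicalDecompositionCharacterList.add(aUnicodeData.m_Character);
		}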
		else
		{
			for (int i = 0; i < aDecompositionMapping.length; i++)
			{
				Integer aDecomposedCharacter = aDecompositionMapping[i];
				UnicodeData aDecomposedUnicodeData = (UnicodeData) aLookupMap.get(aDecomposedCharacter);
				if (null == aDecomposedUnicodeData)
				{
					throw new Exception("Error resolving character: " + GetCharacterAsUnicodeSequence(aDecomposedCharacter.intValue()));
				}
				if (aDecomposedUnicodeData.m_CharacterAsHexString.length() > 4)
				{
					throw new Exception("Resolve character is too long: " + aDecomposedUnicodeData.m_CharacterAsHexString);
				}
				RecursivelyAddCanonicallyDecomposedCharacters(aDecomposedUnicodeData, aLookupMap, aCanonicalDecompositionCharacterList);
			}
		}
	}

	/**
	 * @param args the command line arguments
	 */
	public static void main(String[] args)
	{
		if (args.length == 0)
		{
			System.err.println("Please pass in the location of a Unicode 4.0 or greater text file as the first parameter");
			// Nothing to parse; without this return the args[0] access below would throw
			return;
		}

		Map aMap = ReadUnicodeData(args[0]);
		Map aLookupMap = (Map)((TreeMap) aMap).clone();
		Iterator anIterator = aMap.entrySet().iterator();
		while (anIterator.hasNext())
		{
			Map.Entry aMapEntry = (Map.Entry) anIterator.next();
			Integer aCharacter = (Integer) aMapEntry.getKey();
			UnicodeData aUnicodeData = (UnicodeData) aMapEntry.getValue();

			if	(
					null != aUnicodeData.m_DecompositionType			// We only want Canonical Decompositions
				||	null == aUnicodeData.m_DecompositionMapping			// That are not blank
				|| aUnicodeData.m_CharacterAsHexString.length() > 4		// That we can represent as a JAVA character
				)
			{
				continue;
			}

			List aCanonicalDecompositionCharacterList = new LinkedList();
			try
			{
				RecursivelyAddCanonicallyDecomposedCharacters(aUnicodeData, aLookupMap, aCanonicalDecompositionCharacterList);
			}
			catch (Exception e)
			{
				// This error is raised when a character in a decomposition list is unable to be looked-up
				// Eat this now and ignore this mapping
				continue;
			}

			System.out.print(GetCharacterAsUnicodeSequence(aCharacter.intValue()));
			System.out.print('=');
			Iterator aCanonicalDecompositionCharacterListIterator = aCanonicalDecompositionCharacterList.iterator();
			while (aCanonicalDecompositionCharacterListIterator.hasNext())
			{
				Integer aDecomposedCharacter = (Integer) aCanonicalDecompositionCharacterListIterator.next();
				System.out.print(GetCharacterAsUnicodeSequence(aDecomposedCharacter.intValue()));
			}
			System.out.println();
		}
	}
}
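
The parser above depends on the semicolon-separated layout of UnicodeData.txt, with the decomposition in field index 5 and an optional tag such as `<compat>` in front of compatibility mappings. As a compact illustration of just that step, here is a sketch using `String.split`; the class and method names are hypothetical, and the two sample records are quoted as illustrative input rather than read from the real file.

```java
// Illustrative only: extracts the canonical decomposition field from one record.
final class UnicodeDataLine {
    // Returns the space-separated hex code points of the canonical decomposition,
    // or null when the field is empty or tagged (a compatibility decomposition).
    static String canonicalDecomposition(String line) {
        String[] fields = line.split(";", -1); // -1 keeps trailing empty fields
        String decomposition = fields[5];
        if (decomposition.length() == 0 || decomposition.charAt(0) == '<') {
            return null;
        }
        return decomposition;
    }

    public static void main(String[] args) {
        String grave = "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;";
        String enSpace = "2002;EN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;";

        System.out.println(canonicalDecomposition(grave));   // 0041 0300
        System.out.println(canonicalDecomposition(enSpace)); // null
    }
}
```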


References: 
 >Re: file to uri to file madness (From: François-Paul Servant <email@hidden>)
 >Re: file to uri to file madness (From: Jerry <email@hidden>)
 >Re: file to uri to file madness (From: Larry Nussbaum <email@hidden>)


