Re: Getting Unicode Number
- Subject: Re: Getting Unicode Number
- From: has <email@hidden>
- Date: Sun, 6 Feb 2005 18:57:37 +0000
Joseph Weaks wrote:
> I'm working on a routine to encode Unicode characters into XHTML entities.
This is one of those tasks that tends to be speed-critical, so the
faster the language you can use the better. AppleScript is just too
slow for converting more than very small amounts of text. An osax
would be best if you don't mind getting down-n-dirty with C. Or
here's a simple Perl script that should be reasonably nippy:
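#!/usr/bin/perl
use strict;
use warnings;

die "Usage: $0 <srcfile> <destfile>\n" unless @ARGV == 2;
my ($srcfile, $destfile) = @ARGV;
writeFile($destfile, utf16ToHTML($srcfile));

# Read a UTF-16 file two bytes at a time and return HTML-encoded ASCII.
sub utf16ToHTML {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    binmode $in;
    my @chars;
    while (read $in, my $c, 2) {
        last if length($c) < 2;    # ignore a stray trailing byte
        # 'S' unpacks a 16-bit value in the machine's native byte order;
        # use 'n' for a big-endian file or 'v' for a little-endian one.
        my $charnum = unpack 'S', $c;
        if ($charnum < 128) {
            push @chars, chr $charnum;
        } else {
            push @chars, '&#' . $charnum . ';';
        }
    }
    close $in;
    return join '', @chars;
}

# Write the encoded text to the destination file.
sub writeFile {
    my ($f, $text) = @_;
    open my $out, '>', $f or die "Can't write $f: $!";
    print $out $text;
    close $out;
}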
This takes a UTF-16 file (written using 'write <text> to <file> as
Unicode text') and outputs HTML-encoded ASCII to a second file. Call
it using (all on one line):

do shell script "perl /path/to/script /path/to/inputfile /path/to/outputfile"
HTH
has
p.s. Python also has very good text processing libraries, so I could
wrap a bunch of its text conversion routines in a scriptable FBA if
folks want to provide me a list of requests and a bit of free hosting
for it.
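
For anyone curious what the Python route might look like, here is a minimal sketch of the same UTF-16 to numeric-entity conversion using only the standard library. The function names are made up for illustration, and the default of big-endian UTF-16 is an assumption about the input file; pass a different codec name if your file is little-endian. Because the text is decoded before it is encoded, a character outside the 16-bit range should come out as one reference rather than two.

```python
import sys


def utf16_bytes_to_html(data, encoding="utf-16-be"):
    """Decode UTF-16 bytes and return ASCII bytes with &#NNNN; references.

    The 'utf-16-be' default is an assumption about the input, not a
    guarantee; use 'utf-16-le' for little-endian data.
    """
    text = data.decode(encoding)
    # Drop a leading byte order mark if the file has one.
    if text.startswith("\ufeff"):
        text = text[1:]
    # 'xmlcharrefreplace' replaces each non-ASCII character with &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace")


def convert_file(src_path, dest_path, encoding="utf-16-be"):
    """Hypothetical file wrapper: read src_path, write the encoded result."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dest_path, "wb") as dest:
        dest.write(utf16_bytes_to_html(data, encoding))


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

It could be called from AppleScript the same way as the Perl script, with "python" in place of "perl" in the do shell script command.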
--
http://freespace.virgin.net/hamish.sanderson/