Re: specifying "text" language
- Subject: Re: specifying "text" language
- From: Takaaki Naganoya <email@hidden>
- Date: Tue, 9 Dec 2008 11:10:36 +0900
On 2008/12/09, at 10:40, Jean-Christophe Helary wrote:
Thank you Takaaki for the idea.
I am assuming that the language is known.
My problem is that AppleScript's tokenizer seems to depend on the
International preferences.
Yes. But AppleScript's tokenizer does *not* work well for Japanese.
It is far from real parsing.
So, if I want to parse the following sentence:
"事前に必ずこの取扱説明書を熟読の上、正しい操作
に基づき最良の状態でご使用下さい。"
Oh, is that from a EULA?
the word count/structure will differ depending on whether my
International Preference is Japanese or, say, French if the user has
set a French locale.
In the AppleScript world, relying on AppleScript's tokenizer is a bad
idea. We can detect the language setting in the International
preferences, but AppleScript's native functions do not parse Japanese
well.
This code tells you the primary language.
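<AppleScript>
-- Read the ordered list of preferred languages from the global defaults domain.
-- The output is typically one entry per line between parentheses: "(", "    ja,", "    en,", ..., ")"
set aRes to paragraphs of (do shell script "defaults read -g AppleLanguages")
-- Item 1 is the opening parenthesis, so item 2 holds the primary language.
set defaultLanguage to item 2 of aRes
-- Strip the trailing comma, then the indentation.
set bRes to repChar(defaultLanguage, ",", "") of me
set cRes to repChar(bRes, " ", "") of me
--> "ja"

-- Replace every occurrence of targStr in origText with repStr.
on repChar(origText, targStr, repStr)
	-- Save the current delimiters and switch to the search string in one step.
	set {txdl, AppleScript's text item delimiters} to {AppleScript's text item delimiters, targStr}
	set temp to text items of origText
	set AppleScript's text item delimiters to repStr
	set res to temp as text
	-- Restore the saved delimiters so other code is not affected.
	set AppleScript's text item delimiters to txdl
	return res
end repChar
</AppleScript>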
I need Javascript to be told which language the sentence will be so
that it can provide me with the proper tokenization.
Does it run in a Web browser? If so, there may be a way to get the
browser's character encoding setting.
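
As a rough sketch of what "telling JavaScript the language" could look
like: current JavaScript engines provide Intl.Segmenter, which takes an
explicit locale tag instead of consulting the system preferences. This
API did not exist when this thread was written, and the helper name
wordsOf is hypothetical. Word boundaries for Japanese come from the
engine's dictionary data, so the exact segments may differ between
runtimes.

```javascript
// Sketch only: assumes a runtime that ships Intl.Segmenter
// (a recent browser or Node.js). `wordsOf` is a hypothetical helper name.
function wordsOf(text, locale) {
  // The locale is passed explicitly, so system language settings are not consulted.
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for punctuation and whitespace segments.
    if (isWordLike) words.push(segment);
  }
  return words;
}

const sentence =
  "事前に必ずこの取扱説明書を熟読の上、正しい操作に基づき最良の状態でご使用下さい。";
console.log(wordsOf(sentence, "ja"));
```

Because the locale is an argument, the same sentence can be segmented
as "ja" on a machine whose preferences are set to French.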
Jean-Christophe Helary
On Tuesday 09 Dec. 08, at 10:28, Takaaki Naganoya wrote:
How about picking one line from the unidentified text object (UTO)
and searching for its words on Google?
The result pages list URLs, and those URLs contain country-code
top-level domains. :-)
Another, more serious approach is to calculate the character code
distribution for each language.
The character code distribution, together with characteristic
characters (e.g. umlauts), may help you identify which language the
text is in.
<distGraph_s.jpg>
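
The character-distribution idea above could be sketched roughly as
follows. This is a crude heuristic, not a real language detector: the
script table, the hint characters, and the names scriptDistribution and
guessLanguage are all illustrative assumptions. Umlauts and accented
letters occur in many languages, so the Latin branch can easily guess
wrong. "und" is the standard language tag for "undetermined".

```javascript
// Sketch only: all names and tables here are illustrative assumptions.
// Unicode script property escapes classify each character by script.
const SCRIPTS = {
  latin: /\p{Script=Latin}/u,
  cyrillic: /\p{Script=Cyrillic}/u,
  hiragana: /\p{Script=Hiragana}/u,
  katakana: /\p{Script=Katakana}/u,
  han: /\p{Script=Han}/u,
  hangul: /\p{Script=Hangul}/u,
};

// Characteristic characters for two Latin-script languages (very incomplete).
const HINTS = {
  de: "äöüß",
  fr: "éèêàçùœîô",
};

// Count how many characters of the text belong to each script.
function scriptDistribution(text) {
  const counts = {};
  for (const name of Object.keys(SCRIPTS)) counts[name] = 0;
  for (const ch of text) { // for...of iterates by code point
    for (const [name, re] of Object.entries(SCRIPTS)) {
      if (re.test(ch)) {
        counts[name] += 1;
        break;
      }
    }
  }
  return counts;
}

// Guess a language tag from the distribution plus the hint characters.
function guessLanguage(text) {
  const counts = scriptDistribution(text);
  // Kana in practice signals Japanese, even when Han characters dominate.
  if (counts.hiragana + counts.katakana > 0) return "ja";
  if (counts.hangul > 0) return "ko";
  if (counts.han > 0) return "zh";

  // Latin text: count hint characters per language and pick a clear winner.
  const lower = text.normalize("NFC").toLowerCase();
  let best = "und";
  let bestHits = 0;
  for (const [lang, chars] of Object.entries(HINTS)) {
    let hits = 0;
    for (const ch of lower) {
      if (chars.includes(ch)) hits += 1;
    }
    if (hits > bestHits) {
      best = lang;
      bestHits = hits;
    } else if (hits === bestHits) {
      best = "und"; // a tie (including zero hits) stays undetermined
    }
  }
  return best;
}

console.log(scriptDistribution("日本語のテキスト"));
console.log(guessLanguage("事前に必ずこの取扱説明書を熟読の上"));
```

A guess like this could then be used to choose the locale handed to a
tokenizer, instead of relying on the user's preferences.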
On 2008/12/02, at 16:13, Jean-Christophe Helary wrote:
Is there a way to specify a text language without having to rely
on the International preferences?
The reference for "word" indicates:
A continuous series of characters, with word elements parsed
according to the word-break rules set in the International
preference pane.
Because the rules for parsing words are thus under user control,
your scripts should not count on a deterministic text parsing of
words.
But what if I need to parse a multilingual text, or a foreign text
in a different environment, e.g. Japanese under a French
"International" setting?
Are there programmatic ways to accomplish that?
Jean-Christophe Helary
_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users
This email sent to email@hidden
--
Takaaki Naganoya
Piyomaru Software
http://piyo.piyocast.com
email@hidden
PiyoCast Web (Podcasting with Music!)
http://www.piyocast.com
Free AppleScript Library "AS Hole"
http://www.piyocast.com/as/