Thanks Nigel.
This time the script behaves well.
Just two comments:
(1) the asker can't use it because, as I learnt off list, he is running Lion.
(2) also off list, I received what the original script returned: {"Language: English Words Count: 18816 Page Count: 82 Price (Eu): 246
", "Language: Lithuanian Words Count: 2590 Page Count: 11 Price (Eu): 33
", "Language: Russian Words Count: 30586 Page Count: 133 Price (Eu): 0"}
Your script returns: {"Language: English Words count: 14737 Pages count: 64 Price (Eu): 192
", "Language: Lithuanian Words count: 2590 Pages count: 11 Price (Eu): 33
", "Language: Russian Words count: 29168 Pages count: 127 Price (Eu): 254
"}
which shows that the regex splitting the documents into words doesn't behave like TextEdit on the asker's machine.
I guess that's because the locale settings matter in the splitting process.
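To illustrate how much the splitting rule alone can move a word count, here is a minimal sketch in Python (not the AppleScript from this thread). The function names and the sample sentence are made up for the example, and neither rule is claimed to be the one TextEdit or the regex in the script actually uses.

```python
import re


def count_whitespace_words(text: str) -> int:
    """Count runs of non-whitespace characters."""
    return len(text.split())


def count_regex_words(text: str) -> int:
    """Count runs of Unicode word characters (letters, digits, underscore)."""
    return len(re.findall(r"\w+", text))


# Hypothetical sample: apostrophes and hyphens are where the two rules diverge.
sample = "L'asker can't re-use the script - it's well-known."

print("whitespace rule:", count_whitespace_words(sample))
print("regex rule:     ", count_regex_words(sample))
```

The same text gives two different totals, so a few hundred words of difference on a long document is plausible as soon as two tools disagree on apostrophes, hyphens, or lone punctuation.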
When I apply the main script (under Sierra 10.12.5 in French) to documents exported as UTF-8 by Pages, I get: {"Language: English Words count: 14890 Pages count: 65 Price (Eu): 195", "
", "Language: Lithuanian Words count: 2590 Pages count: 11 Price (Eu): 33", "
", "Language: Russian Words count: 29180 Pages count: 127 Price (Eu): 254"}
As you can see, only the Lithuanian values are identical across the three attempts. At this time I really don't understand what causes TextEdit to fail to extract the word count from the original UTF-16 files (even after removing the $FFFC characters, which are OBJECT REPLACEMENT CHARACTERs).
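For what it's worth, the way the $FFFC characters are removed can itself shift the count. The sketch below is Python with made-up helper names and sample data, not the code used in this thread: it decodes UTF-16 bytes and compares deleting U+FFFC with replacing it by a space.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def decode_and_clean(raw: bytes, replacement: str = "") -> str:
    """Decode UTF-16 bytes (BOM expected) and substitute every U+FFFC."""
    text = raw.decode("utf-16")
    return text.replace(OBJECT_REPLACEMENT, replacement)


# Hypothetical data: one placeholder between spaces, one glued between two words.
raw = "one \ufffc two\ufffcthree".encode("utf-16")

deleted = decode_and_clean(raw)        # U+FFFC simply dropped
spaced = decode_and_clean(raw, " ")    # U+FFFC turned into a space

print(len(deleted.split()), "words when U+FFFC is deleted")
print(len(spaced.split()), "words when U+FFFC becomes a space")
```

When a placeholder sits directly between two words, deleting it merges them, while replacing it with a space keeps them apart. Whether that accounts for any of the differences reported here is only a guess.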
Yvan KOENIG running Sierra 10.12.5 in French (VALLAURIS, France) Sunday 28 May 2017 21:02:45