On 08-05-04, at 11:35, Axel Luttgens wrote:
On 3 May 08, at 02:05, Philip Aker wrote:
[...]
I think I can understand the current behavior. Seems like a bug's been fixed, but I must admit, it does seem odd when first looking at it.
The problem with the "text items of" operator is that it has never been explicitly defined, so the notion of a bug is rather volatile...
Previously, it seems that the following descriptive definition [1] could be given:
If delimiter D is the empty string, then "text items of S" is equivalent to "characters of S". Else, "text items of S" is the minimal set {S1, ..., Sn} of substrings of S such that "{S1, ..., Sn} as text" is S, with n >= 0.
Currently, it is as if one needed to write:
If S is the empty string, then "text items of S" is {""}. Else if delimiter D is the empty string, then "text items of S" is equivalent to "characters of S". Else "text items of S" is the set {S1, ..., Sn} of substrings of S such that "{S1, ..., Sn} as text" is S.
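A minimal sketch for probing both definitions on whatever system is at hand. The variable names and the "," delimiter are illustrative choices, and the comments only restate what each definition would predict; the actual results depend on the OS version.

```applescript
-- Sketch only: probes the clauses of the two definitions above.
set savedDelimiters to AppleScript's text item delimiters

-- Clause "D is the empty string": should match "characters of S".
set AppleScript's text item delimiters to ""
set sameAsCharacters to ((text items of "a,b") = (characters of "a,b"))

-- Boundary case "S is the empty string", with an arbitrary non-empty D.
set AppleScript's text item delimiters to ","
set emptyItems to text items of ""
set itemCount to count of emptyItems -- 0 under the earlier definition, 1 under the current one
set roundTrip to emptyItems as text -- joining the items back should give the original text

-- Restore the delimiters so the rest of the script is unaffected.
set AppleScript's text item delimiters to savedDelimiters
{sameAsCharacters, itemCount, roundTrip}
```

Only itemCount distinguishes the two definitions; the round trip is expected to hold under either one.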
The earlier behavior tends to be more intuitive, and even more consistent, in the boundary cases: an empty string doesn't have chunks (either characters or non-empty substrings), hence an empty set of chunks: {}.
Moreover, it clearly makes "text items of" appear as a generalization of "characters of": instead of splitting S at character boundaries, one splits it at the occurrence(s) of D.
On the other hand, the current behavior seems more consistent with the definition of the coercion of a list to a text object: taking that definition [2] literally, "{} as text" makes no sense and should raise an error (even though it has instead returned the empty string since the dawn of AppleScript).
Unless the definition of list coercion needs to be amended so as to explicitly allow for "{} as text": after all, if 'characters of ""' returns {} then, by symmetry, one needs to have a way to build the empty string from {}... [3]
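To make the coercion in question concrete, here is a small sketch (the "-" delimiter and the variable names are arbitrary choices for illustration). The first result follows directly from definition [2]; the other two are the boundary cases under discussion, so no result is asserted for them beyond what is stated above.

```applescript
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "-"

-- Ordinary case: items concatenated, separated by the current delimiter.
set joined to {"a", "b", "c"} as text -- "a-b-c"

-- Boundary cases: an empty list, and the characters of an empty string.
set fromEmptyList to {} as text
set fromCharacters to (characters of "") as text

set AppleScript's text item delimiters to savedDelimiters
{joined, fromEmptyList, fromCharacters}
```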
As a result, I would tend to believe that the change was an unneeded one.
But...
Re-reading the release notes for 10.4 and earlier:
Getting the count of the paragraphs or text items of an empty Unicode string now returns 1 instead of 0. [3978194]
So, I woke up my Tiger box and tried:
text items of "" --> {}
paragraphs of "" --> {""}
text items of ("" as Unicode text) --> {""}
paragraphs of ("" as Unicode text) --> {""}
??? The "new" behavior is thus related to Leopard only because of the generalized move to Unicode strings. And Tiger already allowed an empty text to have one text item; but also to have none, depending on the encoding...
I'm not sure, but it looks as though a correction to "paragraphs of" was too enthusiastically ported to "text items of" as well. [4] Does anyone have more info about that bug [3978194]?
Moreover, as Matt already noticed, the release notes for 10.5 list:
Counting the paragraphs of an empty string gives a result of zero. [4588706]
And, on Leopard, one indeed has:
text items of "" --> {""}
paragraphs of "" --> {}
As if correction [4588706] had hit the wrong target. Or as if I just don't understand those matters...
Axel

[1] As opposed to, for example, an "algorithmic definition".
[2] "AppleScript also supports coercion of an entire list to a text (page 100) object if each of the items in the list can be coerced to a text object [...] The resulting text object concatenates all the items, separated by the current value of the AppleScript property text item delimiters."
[3] Because single characters are handled as strings of length 1, "{} as text" could be interpreted as "get the string that has no characters", or as "get the string which has no non-empty substrings".
[4] I can understand that "" may be viewed as having one empty paragraph, because of an old Macintosh convention (the last line of a text doesn't need to end with a line ending to count as a line).
Furthermore (herewith increasing the size of my previous synopsis by a considerable percentage), note what happens when the following is run in Script Editor on 10.5:
1. -- a sidebar really (binding dilemma)
2. So, I could previously understand the behavior only if 'text items of ""' is a valid coercion to a 1 item list consisting of the empty string. But in fact that's not consistent. And, it seems to me, being Unicode text has nothing to do with it at all. The special case is the "last paragraph" issue where EOF constitutes an implicit entry in paragraph separator sets.
set AppleScript's paragraph delimiters to {"\n", "\r", character id 8232, character id 8233, …}
(à la Unicode: set AppleScript's paragraph separators to {"\n", "\r", character id 8232, character id 8233, …})
§
set AppleScript's list item delimiters to …
Thanks,
Philip Aker
Democracy: Two wolves and a sheep voting on lunch.