styled text
- Subject: styled text
- From: Arthur J Knapp <email@hidden>
- Date: Thu, 17 Jan 2002 17:16:40 -0500
Hello, I've been trying to reverse engineer how styled text data
works, and I'd like to share what I've found.
I typed this literal text into ClarisWorks (or whatever it's called
nowadays):
plain, black, Helvetica, 10
bold, blue, Monaco, 15
italic, red, Geneva, 12
bold & italic, green, Courier, 24
And then I styled each line exactly as the line's contents would suggest.
After copying the styled text to the clipboard, I popped into a script
editor:
set styled_text to the clipboard as styled text
set style_info to «class ksty» of (styled_text as record)
--
--> «data styl0004...»
-- Force an error: the error message contains the data's hex digits as text.
try
	err of style_info
on error error_string
end try
-- Strip the «data styl ... » wrapper so only the hex digits remain.
set text item delimiters to {"«data styl"}
set error_string to error_string's text item 2
set text item delimiters to {"»"}
set error_string to error_string's text item 1
set text item delimiters to {""}
set style_hex_data to error_string
--
--> "0004..."
After a lot of playing around, I'd like to show you what I've
discovered.
The first four hex characters are a 2-byte integer, indicating
how many style-runs there are:
set count_of_styles to style_hex_data's text 1 thru 4
set count_of_styles to HexToInteger( count_of_styles ) --> 4
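The HexToInteger handler called above isn't shown in this post. A minimal sketch of what it might look like follows; the body is an assumption, and it expects nothing but the characters 0-9 and A-F:

```applescript
-- Hypothetical HexToInteger: the post calls it but doesn't show it.
on HexToInteger(hex_string)
	set hex_digits to "0123456789ABCDEF"
	set n to 0
	repeat with i from 1 to (count hex_string)
		-- offset is 1-based, and 0 means "not found"
		set digit_value to (offset of (character i of hex_string) in hex_digits) - 1
		if digit_value < 0 then error ("Not a hex digit: " & character i of hex_string)
		set n to n * 16 + digit_value
	end repeat
	return n
end HexToInteger

HexToInteger("0004") --> 4
```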
Every group of 40 hex characters (20 bytes) after the style-count then
represents a single style-run.
I've formatted the rest of the hex characters into a table,
each line having 40 hex characters:
  1  |  2   |  3   |  4   |  5   | 6  | 7  |  8   |  9   |  10  |  11
-----|------|------|------|------|----|----|------|------|------|------
0000 | 0000 | 000D | 0009 | 0015 | 00 | 02 | 000A | 0000 | 0000 | 0000
0000 | 001C | 0012 | 000D | 0004 | 01 | 02 | 000F | 0000 | 9999 | FFFF
0000 | 0033 | 000F | 000A | 0003 | 02 | 02 | 000C | DDDD | 0000 | 0000
0000 | 004B | 0015 | 000F | 0016 | 03 | 02 | 0012 | 0000 | 8888 | 0000
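Slicing the data into those 40-character lines can be sketched as follows. The handler name SplitStyleRuns is made up for illustration, and it assumes the data holds the 4-character count followed by whole 40-character runs. The sample below is shortened to the first two rows of the table, with the count adjusted to match:

```applescript
-- Hypothetical helper: returns a list of 40-character strings, one per style-run.
on SplitStyleRuns(style_hex_data)
	set run_list to {}
	-- Characters 1-4 are the style count, so the first run starts at character 5.
	repeat with first_char from 5 to ((count style_hex_data) - 39) by 40
		set end of run_list to text first_char thru (first_char + 39) of style_hex_data
	end repeat
	return run_list
end SplitStyleRuns

-- Shortened sample: a count of 2, then the first two rows of the table.
set sample_data to "0002" & "00000000000D000900150002000A000000000000" & "0000001C0012000D00040102000F00009999FFFF"
count of SplitStyleRuns(sample_data) --> 2
```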
Each pair of hex characters represents one byte. Here is the same
table with every byte written in decimal:
  1  |  2   |  3   |  4   |  5   | 6 | 7 |  8   |    9    |   10    |   11
-----|------|------|------|------|---|---|------|---------|---------|---------
 0 0 | 0  0 | 0 13 | 0  9 | 0 21 | 0 | 2 | 0 10 |   0   0 |   0   0 |   0   0
 0 0 | 0 28 | 0 18 | 0 13 | 0  4 | 1 | 2 | 0 15 |   0   0 | 153 153 | 255 255
 0 0 | 0 51 | 0 15 | 0 10 | 0  3 | 2 | 2 | 0 12 | 221 221 |   0   0 |   0   0
 0 0 | 0 75 | 0 21 | 0 15 | 0 22 | 3 | 2 | 0 18 |   0   0 | 136 136 |   0   0
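The decimal values can be produced by converting each pair of hex characters to a number. A rough sketch, with HexToBytes being an illustrative name rather than anything from the original script, and assuming an even number of valid hex digits:

```applescript
-- Hypothetical helper: turns a hex string into a list of byte values (0-255).
on HexToBytes(hex_string)
	set hex_digits to "0123456789ABCDEF"
	set byte_list to {}
	repeat with i from 1 to ((count hex_string) - 1) by 2
		-- offset is 1-based, so subtract 1 to get the digit's value
		set high_nibble to (offset of (character i of hex_string) in hex_digits) - 1
		set low_nibble to (offset of (character (i + 1) of hex_string) in hex_digits) - 1
		set end of byte_list to high_nibble * 16 + low_nibble
	end repeat
	return byte_list
end HexToBytes

HexToBytes("000D") --> {0, 13}
HexToBytes("9999FFFF") --> {153, 153, 255, 255}
```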
This is what I have (and have not) figured out:
1: Always seems to be 2 zero bytes. Perhaps this should actually be
grouped with field 2, making it a long, rather than a short,
integer???
2: A 2-byte (or 4-byte?) 0-based index into the string, where this
style-run begins, i.e. the second style-run is 28, so its style
starts to be applied to the string at character 29.
3: This is either a width, or perhaps a total character height. It is
usually 2 or 3 points higher than the point size, (field 8).
4: This appears to be the baseline, the imaginary line that the
characters "sit" on. It is usually a few points smaller than
the point size, (field 8).
5: This appears to be a font id number, with id 21 being Helvetica,
id 4 being Monaco, id 3 being Geneva, and id 22 being Courier.
I don't know if this is "hard-coded", i.e. whether these ids are
the same on every Mac, or if they are somehow dynamically
generated based on what fonts you have on your system.
6: This is what I call the *style flag*. It is a single byte that
tells you which of the basic text styles are applied. It is actually
a bit mask, where each bit indicates a style. I've worked out the
following script snippets for determining styles:
set x to style_flag -- the style flag byte, as an integer
set is_plain to ( x = 0 )
set is_bold to ( x mod 2 is not 0 )
set is_italic to ( x div 2 mod 2 is not 0 )
set is_underline to ( x div 4 mod 2 is not 0 )
set is_outline to ( x div 8 mod 2 is not 0 )
set is_shadow to ( x div 16 mod 2 is not 0 )
set is_condensed to ( x div 32 mod 2 is not 0 )
set is_expanded to ( x div 64 mod 2 is not 0 )
I'm sure Nigel can show me a better math technique to do this...
7: I don't know what this is. I can tell you that it seems to
change from day to day???
8: This is the actual point size.
9:
10:
11: These are RGB color specifications (I think it's RGB?).
With two bytes per color (red, green, and blue), they
can represent a whole lot of colors. One thing I don't
understand is why, if you treat each color as two separate bytes,
the two are usually the same byte, i.e. the blue that I
chose is hex "00009999FFFF", giving us the bytes
0, 0, 153, 153, 255, 255.
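Putting the field notes together, one 40-character run could be unpacked into a record along these lines. This is only a sketch: the handler names and record labels are invented for illustration, the labels follow the guesses above rather than any documented layout, field 7 is skipped because its meaning is unknown, and HexToInteger is a hypothetical stand-in for the handler of that name used earlier.

```applescript
-- Hypothetical stand-in for the HexToInteger handler; assumes valid hex digits.
on HexToInteger(hex_string)
	set n to 0
	repeat with i from 1 to (count hex_string)
		set n to n * 16 + (offset of (character i of hex_string) in "0123456789ABCDEF") - 1
	end repeat
	return n
end HexToInteger

-- Lists the style names set in field 6, using the bit order worked out above.
on StyleNames(style_flag)
	set all_names to {"bold", "italic", "underline", "outline", "shadow", "condensed", "expanded"}
	set found_names to {}
	set remaining to style_flag
	repeat with i from 1 to 7
		if remaining mod 2 is 1 then set end of found_names to item i of all_names
		set remaining to remaining div 2
	end repeat
	return found_names
end StyleNames

-- Splits one 40-character run into the fields numbered 1-11 above.
on ParseStyleRun(run_hex)
	set r to {start_offset:HexToInteger(text 5 thru 8 of run_hex)} -- field 2
	set r to r & {height_or_width:HexToInteger(text 9 thru 12 of run_hex)} -- field 3
	set r to r & {baseline_pos:HexToInteger(text 13 thru 16 of run_hex)} -- field 4
	set r to r & {font_id:HexToInteger(text 17 thru 20 of run_hex)} -- field 5
	set r to r & {style_names:StyleNames(HexToInteger(text 21 thru 22 of run_hex))} -- field 6
	set r to r & {point_size:HexToInteger(text 25 thru 28 of run_hex)} -- field 8
	set r to r & {red_16:HexToInteger(text 29 thru 32 of run_hex)} -- field 9
	set r to r & {green_16:HexToInteger(text 33 thru 36 of run_hex)} -- field 10
	set r to r & {blue_16:HexToInteger(text 37 thru 40 of run_hex)} -- field 11
	return r
end ParseStyleRun

-- Second row of the table (bold, blue, Monaco, 15)
set run_info to ParseStyleRun("0000001C0012000D00040102000F00009999FFFF")
{start_offset of run_info, font_id of run_info, point_size of run_info, style_names of run_info}
--> {28, 4, 15, {"bold"}}
```

Halving the flag bit by bit is just another way of writing the div/mod tests listed under field 6.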
I have a plan for a vanilla script that will perform simple
HTML formatting of styled text. I'll keep you posted.
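For the HTML side, each 16-bit color component would need to be reduced to 8 bits. The repeated bytes noted under fields 9-11 are what you would get if an 8-bit value were scaled up to 16 bits by multiplying by 257 (153 * 257 = 39321 = hex 9999), though that is only a guess at what ClarisWorks does. A sketch under that assumption, with ColorToHTML as an illustrative name:

```applescript
-- Hypothetical helper: turns three 16-bit components (0-65535) into "#RRGGBB".
on ColorToHTML(red_16, green_16, blue_16)
	set hex_digits to "0123456789ABCDEF"
	set components to {red_16, green_16, blue_16}
	set html_color to "#"
	repeat with i from 1 to 3
		-- Keep the high byte only; for doubled bytes like 9999 this recovers 99.
		set byte_value to (item i of components) div 256
		set high_digit to character (byte_value div 16 + 1) of hex_digits
		set low_digit to character (byte_value mod 16 + 1) of hex_digits
		set html_color to html_color & high_digit & low_digit
	end repeat
	return html_color
end ColorToHTML

-- The blue from the second style-run: hex 0000, 9999, FFFF
ColorToHTML(0, 39321, 65535) --> "#0099FF"
```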
{ Arthur J. Knapp, of <http://www.STELLARViSIONs.com>
<mailto:email@hidden>
try
<http://www.seanet.com/~jonpugh/>
on error number -128
end try
}