Re: NSLinguisticTagger
- Subject: Re: NSLinguisticTagger
- From: Roland King <email@hidden>
- Date: Wed, 24 Sep 2014 13:23:36 +0800
> On 24 Sep 2014, at 1:02 pm, Gerriet M. Denkmann <email@hidden> wrote:
>
>
> On 24 Sep 2014, at 11:46, Roland King <email@hidden> wrote:
>
>>
>>> On 24 Sep 2014, at 12:31 pm, Gerriet M. Denkmann <email@hidden> wrote:
>>>
>>> I have a problem with NSLinguisticTagger / CFStringTokenizer on iOS 8.0
>>>
>>> OS X 10.9.5 (and iOS 7 and earlier) parses "สีเหลือง" quite rightly as two words: "สี" = colour and "เหลือง" = yellow.
>>>
>>> No dictionary will ever contain "yellow colour". Every dictionary will contain "yellow" and "colour".
>>> There are hundreds, if not thousands of these expressions, which are wrongly classified as one word.
>>> Might have something to do with the new predictive keyboard.
>>>
>>> But I am not writing this to complain, but to ask for a favour: could anybody on 10.10 just click anywhere in: "สีเหลือง" and tell me whether all gets highlighted, or just a part (as in 10.9.5)?
>>
>>
>> If I double click anywhere on the right of that I get the second part (all bar the first character) highlighted. Clicking on the first character I get just that character. So 10.10 (beta 8) splits that sequence into two ‘words’.
> This is a big relief. Thanks a lot.
>
>>
>> Why do you suspect the predictive keyboard? Certainly wouldn’t be the first thing I thought of seeing that issue. I would probably instead assume I’d written myself a bug.
>
> Well, here is the code; maybe you can find a bug:
>
> let text = "สีเหลือง"
> let opts: Int = 0
> let schemes = [ NSLinguisticTagSchemeTokenType, NSLinguisticTagSchemeNameTypeOrLexicalClass ]
> let tagger = NSLinguisticTagger(tagSchemes: schemes, options: opts )
>
> let nsText = text as NSString
> let length = nsText.length
> tagger.string = nsText
> let range = NSMakeRange(0,length)
> let theScheme = NSLinguisticTagSchemeTokenType
> let ops = NSLinguisticTaggerOptions(0)
> tagger.enumerateTagsInRange (
> range,
> scheme: theScheme,
> options: ops,
> usingBlock:
> { ( tag: String!,
> tokenRange: NSRange,
> sentenceRange: NSRange,
> stop: UnsafeMutablePointer<ObjCBool>
> ) -> Void in
>
> let word = nsText.substringWithRange(tokenRange)
> println("\(tag) = \(word) " )
> }
> )
>
> Gerriet.
>
Here’s the version I was just writing. I ran it in an iOS playground and in an OS X playground and got the same ‘single word’ result both times, so I’m not entirely sure that the click test on OS X proved anything. If you comment out the Thai string and uncomment the Chinese one, it does better and splits the text up, although the last two words are wrong there as well: they should be “去” and “健身房”. The result is the same in an OS X playground and an iOS one, but then again iOS playgrounds are emulated, so ..
I also compiled it as an OS X command-line tool and it does the same thing for my phrase and for yours. So whatever does the highlighting when you ‘click’ isn’t doing the same thing as NSLinguisticTagger.
The click test works on my Chinese phrase too: it gets the last two words correct. Something sure ain’t right.
I should write the Objective-C version to eliminate any possibility that it’s Swift.
let str = "สีเหลือง"
//let str = "我今天还没有去健身房"
let str2 = str as NSString
let tagger = NSLinguisticTagger(tagSchemes: [NSLinguisticTagSchemeTokenType], options: 0 )
let range = NSMakeRange( 0, str2.length )
tagger.string = str2
var ranges : NSArray?
let things = tagger.tagsInRange( range, scheme: NSLinguisticTagSchemeTokenType, options: NSLinguisticTaggerOptions.allZeros, tokenRanges: &ranges )
things.count
ranges
for ( index, type ) in enumerate( things )
{
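// each entry in ranges is an NSValue wrapping the token's NSRange
let type_range : NSValue? = ranges?[ index ] as NSValue?
print( "Type: '\(type)' at \(type_range!) ")
// unwrap the NSRange before asking for the substring
println( str2.substringWithRange( type_range!.rangeValue ) )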
}
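
One more cross-check that might be worth a try, since the click test and the tagger disagree: ask Foundation's word enumeration for the words and compare those with the tagger output. This is only a sketch. It is written in current Swift syntax rather than the syntax used above, wordsByEnumeration is a made-up helper name, and I am not claiming that double-click selection goes through this API; it is just another word-boundary path to compare against.

```swift
import Foundation

// Hypothetical helper (not from the thread): collect the substrings that
// Foundation's word enumeration reports for a string.
func wordsByEnumeration(_ text: String) -> [String] {
    var words: [String] = []
    let wholeString = text.startIndex..<text.endIndex
    text.enumerateSubstrings(in: wholeString, options: .byWords) { substring, _, _, _ in
        // substring is nil only when .substringNotRequired is passed
        if let word = substring {
            words.append(word)
        }
    }
    return words
}

// The two test phrases from this thread.
let samples = ["สีเหลือง", "我今天还没有去健身房"]
for sample in samples {
    let words = wordsByEnumeration(sample)
    print("\(sample) -> \(words.count) word(s): \(words)")
}
```

If this splits the Thai phrase in two while the tagger returns one token, that would point at the tagger rather than at the underlying word-break data.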
_______________________________________________
Cocoa-dev mailing list (email@hidden)