
Re: NSLinguisticTagger


  • Subject: Re: NSLinguisticTagger
  • From: Roland King <email@hidden>
  • Date: Wed, 24 Sep 2014 13:23:36 +0800

> On 24 Sep 2014, at 1:02 pm, Gerriet M. Denkmann <email@hidden> wrote:
>
>
> On 24 Sep 2014, at 11:46, Roland King <email@hidden> wrote:
>
>>
>>> On 24 Sep 2014, at 12:31 pm, Gerriet M. Denkmann <email@hidden> wrote:
>>>
>>> I have a problem with NSLinguisticTagger / CFStringTokenizer on iOS 8.0
>>>
>>> OS X 10.9.5 (and iOS 7 and earlier) parses "สีเหลือง" quite rightly as two words: "สี" = colour and "เหลือง" = yellow.
>>>
>>> No dictionary will ever contain "yellow colour". Every dictionary will contain "yellow" and "colour".
>>> There are hundreds, if not thousands of these expressions, which are wrongly classified as one word.
>>> Might have something to do with the new predictive keyboard.
>>>
>>> But I am not writing this to complain, but to ask for a favour: could anybody on 10.10 just click anywhere in: "สีเหลือง" and tell me whether all gets highlighted, or just a part (as in 10.9.5)?
>>
>>
>> If I double click anywhere on the right of that I get the second part (all bar the first character) highlighted. Clicking on the first character I get just that character. So 10.10 (beta 8) splits that sequence into two ‘words’.
> This is a big relief. Thanks a lot.
>
>>
>> Why do you suspect the predictive keyboard? Certainly wouldn’t be the first thing I thought of seeing that issue. I would probably instead assume I’d written myself a bug.
>
> Well, here is the code; maybe you can find a bug:
>
> let text = "สีเหลือง"
> let opts: Int = 0
> let schemes = [ NSLinguisticTagSchemeTokenType, NSLinguisticTagSchemeNameTypeOrLexicalClass ]
> let tagger = NSLinguisticTagger(tagSchemes: schemes, options: opts )
>
> let nsText = text as NSString
> let length = nsText.length
> tagger.string = nsText
> let range = NSMakeRange(0,length)
> let theScheme = NSLinguisticTagSchemeTokenType
> let ops = NSLinguisticTaggerOptions(0)
> tagger.enumerateTagsInRange (
> 	range,
> 	scheme: 	theScheme,
> 	options: 	ops,
>   	usingBlock:
>   	{ 	(	tag: 			String!,
> 			tokenRange: 	NSRange,
> 			sentenceRange:	NSRange,
> 			stop: 			UnsafeMutablePointer<ObjCBool>
> 		) -> Void in
>
> 		let word = nsText.substringWithRange(tokenRange)
> 		println("\(tag) = \(word) " )
> 	}
> )
>
> Gerriet.
>



Here’s my version, which I was just writing. I ran it in both an iOS playground and an OS X playground and got the same ‘single word’ result each time, so I’m not entirely sure that the click test on OS X proved anything. If you comment out the Thai string and uncomment the Chinese one, it works better and splits the text up, although the last two words are wrong there as well: they should be “去” and “健身房”. It’s the same in an OS X playground and an iOS one, but then again iOS playgrounds are emulated, so ..

I also compiled it as an OS X command line tool and it does the same thing for my phrase AND yours. So whatever does the highlighting when you ‘click’ isn’t doing the same thing as NSLinguisticTagger.

The click test works on my Chinese phrase too: it gets the last two words correct. Something sure ain’t right.

I should write the Objective-C version to eliminate any possibility that it’s Swift.



import Foundation

let str = "สีเหลือง"
//let str = "我今天还没有去健身房"
let str2 = str as NSString

let tagger = NSLinguisticTagger(tagSchemes:  [NSLinguisticTagSchemeTokenType], options: 0 )


let range = NSMakeRange( 0, str2.length )

tagger.string = str2

// Collect every tag in one call; 'ranges' receives one NSValue-wrapped NSRange per tag.
var ranges : NSArray?
let things = tagger.tagsInRange( range, scheme: NSLinguisticTagSchemeTokenType, options: NSLinguisticTaggerOptions.allZeros, tokenRanges: &ranges )
things.count	// bare expression, shown in the playground sidebar

ranges		// bare expression, shown in the playground sidebar

for ( index, type ) in enumerate( things )
{
	let type_range : NSValue? = ranges?[ index ] as NSValue?
	print( "Type: '\(type)' at \(type_range!) ")
	// substringWithRange takes an NSRange, so unwrap it from the NSValue
	println( str2.substringWithRange( type_range!.rangeValue ) )

}
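
One way to probe the mismatch between the click test and the tagger is to run the same string through two different word-boundary mechanisms and compare the results. The sketch below is written in current Swift syntax rather than the Swift 1.0 syntax used above, and assumes an Apple platform with the unit-based tagger API (macOS 10.13 / iOS 11 or later). The helper names taggerWords and substringWords are made up for illustration. It is only an assumption that double-click selection follows the same boundaries as the .byWords enumeration; nothing in this thread confirms that.

```swift
import Foundation

// Words as reported by NSLinguisticTagger (token-type scheme, word unit).
// Hypothetical helper name, not part of any API.
func taggerWords(_ text: String) -> [String] {
    let ns = text as NSString
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    var words: [String] = []
    tagger.enumerateTags(in: NSRange(location: 0, length: ns.length),
                         unit: .word,
                         scheme: .tokenType,
                         options: [.omitWhitespace, .omitPunctuation]) { _, tokenRange, _ in
        words.append(ns.substring(with: tokenRange))
    }
    return words
}

// Words as reported by Foundation's generic word enumeration.
// Assumption: this may be closer to what text selection uses.
func substringWords(_ text: String) -> [String] {
    var words: [String] = []
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .byWords) { word, _, _, _ in
        if let word = word { words.append(word) }
    }
    return words
}

for sample in ["สีเหลือง", "我今天还没有去健身房"] {
    print("tagger:    \(taggerWords(sample))")
    print("substring: \(substringWords(sample))")
}
```

If the two lines differ for a given string, that would at least show that the two mechanisms segment the text differently on that OS version; it would not by itself say which one is right.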


