Re: Data, enumerateBytes: separate blocks?

  • Subject: Re: Data, enumerateBytes: separate blocks?
  • From: Daryle Walker <email@hidden>
  • Date: Wed, 27 Dec 2017 14:59:52 -0500

> On Dec 25, 2017, at 2:09 PM, Quincey Morris
> <email@hidden> wrote:
>
> So, if your goal is to minimize searching, you have to search for CR and LF
> simultaneously. There are two easy ways to do this:
>
> 1. Use “index(where:)” and test for both values in the closure.
>
> 2. Use a manual loop that indexes into a buffer pointer (C-style).
>
> #1 is the obvious choice unless invoking the closure is too slow when a lot
> of bytes need to be examined. #2 would use “enumerateBytes” to get a series
> of buffer pointers efficiently, but there is no boundary code to be tested,
> since you’re only examining 1 byte at a time.
>
> Once you have the optional indices to the first CR or LF, and you find you
> need to check for a potential CR-LF or CR-CR-LF, you can do that by
> subscripting into the original Data object directly, outside of the search
> loop.
>
> This approach would eliminate the problematic test case, and (unless I’m
> missing something obvious) have the initial search as its only O(n)
> computation, everything else being O(1), i.e. constant and trivial.
>

Right now I have:

>         guard let firstBreak = index(where: {
>             [MyConstants.cr, MyConstants.lf].contains($0)
>         }) else { return nil }
>
>         let which: Terminator
>         switch self[firstBreak] {
>         case MyConstants.cr:
>             let nextBreak = index(after: firstBreak)
>             if nextBreak < endIndex {
>                 switch self[nextBreak] {
>                 case MyConstants.cr:
>                     let nextBreak2 = index(after: nextBreak)
>                     if nextBreak2 < endIndex {
>                         if self[nextBreak2] == MyConstants.lf {
>                             which = .crcrlf
>                         } else {
>                             which = .cr
>                         }
>                     } else {
>                         which = .cr
>                     }
>                 case MyConstants.lf:
>                     which = .crlf
>                 default:
>                     which = .cr
>                 }
>             } else {
>                 which = .cr
>             }
>         case MyConstants.lf:
>             which = .lf
>         default:
>             preconditionFailure("The search from 'index' should never find anything outside {CR, LF}.")
>         }
>         return (which, firstBreak)


In my basic test suite, the property is called 37 times. The guard’s return is
hit 4 times, and the outer switch 33 times. For that outer switch, the CR case
is hit 25 times, the LF case 8 times, and that default I had to put in 0 times.
Within the CR case, the individual results are hit 4, 6, 3, 5, 5, and 2 times
respectively.
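
For comparison, the nested CR branch can be flattened by looking at no more
than the two bytes that follow the first break. This is only a sketch, not the
code the counts above were measured on: `crByte`, `lfByte`, and
`firstTerminator` are hypothetical names, `Terminator` is redeclared as a
stand-in so the block is self-contained, and `firstIndex(where:)` is the
current spelling of `index(where:)`.

```swift
import Foundation

// Hypothetical stand-ins for the thread's `Terminator` and `MyConstants`.
enum Terminator { case cr, lf, crlf, crcrlf }
let crByte: UInt8 = 0x0D
let lfByte: UInt8 = 0x0A

extension Data {
    /// Classifies the first line break and reports where it starts.
    /// Returns nil when the data contains neither CR nor LF.
    var firstTerminator: (kind: Terminator, index: Data.Index)? {
        guard let first = firstIndex(where: { $0 == crByte || $0 == lfByte }) else {
            return nil
        }
        if self[first] == lfByte { return (.lf, first) }

        // The first break is a CR; at most the next two bytes decide the result.
        let tail = self[first...].dropFirst().prefix(2)
        if tail.first == lfByte { return (.crlf, first) }
        if tail.elementsEqual([crByte, lfByte]) { return (.crcrlf, first) }
        return (.cr, first)
    }
}
```

Every case that fell through to `.cr` in the nested version (end of data, a
second CR without a following LF, or any other byte) ends at the final
`return` here.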

However, the guard’s `contains` test is covered 192 times! I’m guessing that’s
once for each byte the search goes past, right? Between that and wondering how
efficient the test is, I wonder if using something like your #2 would be
better, but examining a megabyte at a time or so. For that I have to figure out
how to divide a range into a set of subranges (of a set size, except possibly
the last). And how would I test which way is faster?
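
One possible shape for the subrange split and the manual loop, as a minimal
sketch assuming Swift 5's `Data.withUnsafeBytes` (the closure form that takes
an `UnsafeRawBufferPointer`). The names `chunks(of:size:)` and
`firstLineBreakIndex(chunkSize:)` are hypothetical, and the sketch slices the
`Data` per chunk rather than going through `enumerateBytes`.

```swift
import Foundation

/// Splits `range` into consecutive subranges of `size` elements.
/// The last subrange is shorter when `size` does not divide the count evenly.
func chunks(of range: Range<Int>, size: Int) -> [Range<Int>] {
    precondition(size > 0, "Chunk size must be positive.")
    return stride(from: range.lowerBound, to: range.upperBound, by: size).map {
        $0 ..< min($0 + size, range.upperBound)
    }
}

extension Data {
    /// Index of the first CR or LF, found by a C-style loop over one chunk
    /// at a time. Returns nil when neither byte occurs.
    func firstLineBreakIndex(chunkSize: Int = 1 << 20) -> Data.Index? {
        for chunk in chunks(of: startIndex..<endIndex, size: chunkSize) {
            // The buffer covers only this chunk, so offsets are chunk-relative.
            let hit: Data.Index? = self[chunk].withUnsafeBytes {
                (buffer: UnsafeRawBufferPointer) -> Data.Index? in
                for offset in 0..<buffer.count
                where buffer[offset] == 0x0D || buffer[offset] == 0x0A {
                    return chunk.lowerBound + offset
                }
                return nil
            }
            if let hit = hit { return hit }
        }
        return nil
    }
}
```

Because each byte is examined on its own, a break sitting on a chunk boundary
needs no special handling. As for which way is faster, one option is to run
both searches over the same large input inside XCTest's `measure { }` block,
in an optimized build, and compare the reported averages.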

—
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

Cocoa-dev mailing list (email@hidden)
