Re: Data, enumerateBytes: separate blocks?

Subject: Re: Data, enumerateBytes: separate blocks?
From: Quincey Morris <email@hidden>
Date: Mon, 25 Dec 2017 11:09:21 -0800

On Dec 25, 2017, at 10:23, Daryle Walker <email@hidden> wrote:
>
> What happens if whichever byte value is second is gigabytes away from the
> first?

Your Data extension code doesn’t solve that problem anyway:

>         var firstCr, firstLf: Index?
>         enumerateBytes { buffer, start, stop in
>             if let localLf = buffer.index(of: ParsingQueue.Constants.lf) {
>                 firstLf = start.advanced(by: buffer.startIndex.distance(to: localLf))
>                 stop = true
>             }
>
>             if let firstCrIndex = firstCr, firstCrIndex.distance(to: start.advanced(by: buffer.count)) > 2 {
>                 // No block after this current one could find a LF close enough to form CR-LF or CR-CR-LF.
>                 stop = true
>             } else if let localCr = buffer.index(of: ParsingQueue.Constants.cr) {
>                 firstCr = start.advanced(by: buffer.startIndex.distance(to: localCr))
>                 stop = true
>             }
>         }

In the case where the Data object is *one* multi-GB buffer, if it doesn’t
contain a LF, you will search gigabytes for the non-existent LF before searching
them again for the CR. Even if you’re lucky and the Data object consists of
multiple smallish buffers, you will still search every buffer that doesn’t
contain a CR for a LF before you reach the one that does contain a CR.

So, if your goal is to minimize searching, you have to search for CR and LF
simultaneously. There are two easy ways to do this:

1. Use “index(where:)” and test for both values in the closure.

2. Use a manual loop that indexes into a buffer pointer (C-style).
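A minimal sketch of option #1, assuming the two byte constants stand in for `ParsingQueue.Constants.cr` and `.lf` from the quoted code (the names `crByte`, `lfByte` and `firstLineBreakByte` are hypothetical). In current Swift, `index(where:)` is spelled `firstIndex(where:)`:

```swift
import Foundation

// Hypothetical stand-ins for ParsingQueue.Constants.cr and .lf in the quoted code.
let crByte: UInt8 = 0x0D  // carriage return
let lfByte: UInt8 = 0x0A  // line feed

/// Option #1: a single pass that tests each byte for *either* value,
/// so no byte is ever examined twice. Returns nil if neither occurs.
func firstLineBreakByte(in data: Data) -> Data.Index? {
    return data.firstIndex(where: { $0 == crByte || $0 == lfByte })
}

let sample = Data("GET /\r\nHost".utf8)
let found = firstLineBreakByte(in: sample)  // 5, the index of the CR
```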

#1 is the obvious choice unless invoking the closure turns out to be too slow
when a lot of bytes need to be examined. #2 would use “enumerateBytes” to get a
series of buffer pointers efficiently, and there would be no block-boundary code
to test, since you’re only examining one byte at a time.
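Option #2 might look roughly like this. It is a sketch with hypothetical names; depending on the Swift version, the compiler may warn that `enumerateBytes` is deprecated in favor of `regions`. The position in the Data is computed from a running byte count, so the sketch does not depend on how the closure's index argument is reported for slices:

```swift
import Foundation

// Hypothetical stand-ins for ParsingQueue.Constants.cr and .lf in the quoted code.
let crByte: UInt8 = 0x0D  // carriage return
let lfByte: UInt8 = 0x0A  // line feed

/// Option #2: walk each contiguous region handed out by enumerateBytes,
/// testing one byte at a time. Because each byte is judged on its own,
/// nothing special happens where one region ends and the next begins.
func firstLineBreakByteManual(in data: Data) -> Data.Index? {
    var result: Data.Index?
    var consumed = 0  // bytes examined in earlier regions
    data.enumerateBytes { buffer, _, stop in
        for i in 0..<buffer.count where buffer[i] == crByte || buffer[i] == lfByte {
            // Translate "offset within this region" into an index of `data`.
            result = data.startIndex + consumed + i
            stop = true
            return
        }
        consumed += buffer.count
    }
    return result
}
```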

Once you have the optional indices to the first CR or LF, and you find you need
to check for a potential CR-LF or CR-CR-LF, you can do that by subscripting
into the original Data object directly, outside of the search loop.

This approach would eliminate the problematic test case, and (unless I’m
missing something obvious) have the initial search as its only O(n)
computation, everything else being O(1), i.e. constant and trivial.
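Putting the pieces together, a rough sketch of the single-search approach might look like this. The enum, the function name, and the choice to treat a lone CR as a terminator are assumptions for illustration, not part of Daryle's code:

```swift
import Foundation

// Hypothetical stand-ins for ParsingQueue.Constants.cr and .lf in the quoted code.
let crByte: UInt8 = 0x0D  // carriage return
let lfByte: UInt8 = 0x0A  // line feed

/// Hypothetical names for the terminators discussed in this thread.
/// Counting a lone CR as a terminator is an assumption of this sketch.
enum LineTerminator: Equatable {
    case lf, cr, crlf, crcrlf
}

/// One O(n) search for the first CR or LF, followed by at most two O(1)
/// subscripts into `data` to tell CR, CR-LF and CR-CR-LF apart.
func firstLineTerminator(in data: Data) -> (index: Data.Index, kind: LineTerminator)? {
    guard let i = data.firstIndex(where: { $0 == crByte || $0 == lfByte }) else {
        return nil
    }
    if data[i] == lfByte { return (index: i, kind: .lf) }

    // data[i] is a CR: peek at the next byte or two, no second search.
    let end = data.endIndex
    if i + 1 < end, data[i + 1] == lfByte { return (index: i, kind: .crlf) }
    if i + 2 < end, data[i + 1] == crByte, data[i + 2] == lfByte {
        return (index: i, kind: .crcrlf)
    }

    // A CR in the last byte or two is reported as a lone CR; a streaming
    // parser may prefer to wait for more data before deciding.
    return (index: i, kind: .cr)
}
```

Because the classification only reads `data[i + 1]` and `data[i + 2]`, the work beyond the initial search stays constant no matter how far apart any later CR or LF bytes are.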

_______________________________________________

Cocoa-dev mailing list (email@hidden)
