Sorry, I meant to cut out the DART reference in the email...
We've turned off the DMA completely. We are just seeing a huge hit when
doing a System "BlockMoveData" function to or from the wired buffer. This
hit is not occurring if we allocate the same size (or larger) buffers in
user space (using malloc, or clearPtr), and copy between them.
- Mike
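
For anyone who wants to reproduce the user-space baseline described above, here is a minimal sketch in plain C. It is not our driver code: it uses memcpy as a stand-in for BlockMoveData, and the names FRAME_BYTES, PAGE_BYTES, pages_for and copy_throughput_mb_per_s are made up for illustration. The 4096-byte page size is an assumption, and the 800 KB frame size is the approximate figure from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define FRAME_BYTES (800u * 1024u) /* roughly one SD video frame (from the post) */
#define PAGE_BYTES  4096u          /* ASSUMPTION: 4 KB VM pages */

/* Number of VM pages needed to hold `bytes`, rounded up. */
static size_t pages_for(size_t bytes, size_t page_bytes)
{
    return (bytes + page_bytes - 1) / page_bytes;
}

/*
 * Copy `bytes` between two malloc'd user-space buffers `iterations` times.
 * Returns throughput in MB/s (1 MB = 1024 * 1024 bytes), 0.0 if the run was
 * too short for clock() to measure, or -1.0 on bad arguments or allocation
 * failure.
 */
static double copy_throughput_mb_per_s(size_t bytes, unsigned iterations)
{
    unsigned char *src;
    unsigned char *dst;
    volatile unsigned char sink;
    clock_t start, end;
    double seconds;
    unsigned i;

    if (bytes == 0 || iterations == 0)
        return -1.0;

    src = malloc(bytes);
    dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 0xA5, bytes);
    memset(dst, 0, bytes);

    start = clock();
    for (i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i; /* vary the input between passes */
        memcpy(dst, src, bytes);   /* stand-in for BlockMoveData */
    }
    end = clock();

    sink = dst[bytes - 1];         /* read the result so the copies are used */
    (void)sink;
    free(src);
    free(dst);

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)bytes * iterations) / (1024.0 * 1024.0) / seconds;
}

#ifdef COPY_BENCH_MAIN /* define this to build a standalone program */
int main(void)
{
    double rate = copy_throughput_mb_per_s(FRAME_BYTES, 1000);

    assert(rate >= 0.0);
    printf("pages per buffer: %lu\n",
           (unsigned long)pages_for(FRAME_BYTES, PAGE_BYTES));
    printf("user-space copy:  %.1f MB/s\n", rate);
    return 0;
}
#endif
```

Note that clock() measures CPU time, which is adequate for a single-threaded copy loop like this one. To compare against the wired buffer, the same loop would be run with one of the two pointers replaced by the mapping of the kernel-allocated buffer; that part depends on the driver's user client and is not shown here.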
> As I've said on this list before, it is very unlikely to be a DART
> issue. You are only hitting the DART once, when you prepare your
> memory descriptor. After that it is a non-issue.
>
> Are you seeing the issue with the G5 doing the copy or your DMA doing
> the copy?
>
> Godfrey
>
> On 04/27/2005, at 13:22 , email@hidden wrote:
>
>> Hello all,
>>
>> I'm trying to overcome a performance problem we're having with our
>> video capture card drivers on the G5.
>>
>> The nature of our problem is that we're seeing drastic differences in
>> performance between G4 and G5 systems, and not the differences we'd
>> expect. In a G4 system, when we read from or write to a wired,
>> contiguous buffer allocated by an IOBufferMemoryDescriptor call, we
>> see roughly the advertised memory performance of the system. However,
>> on the G5, we are incurring huge overhead, resulting in significantly
>> slower performance where we would have expected about a 3X increase.
>>
>> We have tried literally dozens of things to get around this problem.
>> In our testing, we have discovered that in user space we can allocate
>> two buffers and copy between them at the expected G5 performance
>> levels. However, if one of the buffers is our kernel-allocated buffer,
>> it appears we are taking a huge hit. We've set the buffer up as
>> "user/kernel shared", and we've tried setting the directions to match
>> the direction of the copy. We've even tried setting the direction as
>> in/out. As an experiment, we've allocated two buffers in the kernel
>> and done the copy at interrupt level. In every case, the performance
>> is an order of magnitude less than the same code on the G4.
>>
>> Some details:
>> We allocate wired buffers in the kext, of around 800 KBytes (the size
>> of a standard-definition video frame) each. We allocate them as
>> page-aligned and contiguous so that we can DMA to/from them from our
>> hardware card without setting up scatter-gather. This results in
>> almost 200 VM pages per buffer. We have a QuickTime component which
>> fills these buffers from a given QT buffer.
>>
>> At this point, we are assuming that we are incurring a hit brought on
>> by the DART reloading its address tables, or perhaps issues with VM on
>> the G5. The same code on G4 machines seems fine.
>>
>> We have recently come across an Apple soft-VDIG example which includes
>> a kext, and it looks like our implementation is exactly as in the
>> example, and as you might expect, the example code performs just as
>> poorly on the G5.
>>
>> - Mike Stroven
>>
>>
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> Darwin-drivers mailing list (email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>>
>> This email sent to email@hidden
>>