Re: Re(2): [OT] Quartz blitter/hardware support?

Subject: Re: Re(2): [OT] Quartz blitter/hardware support?
From: Allan Odgaard <email@hidden>
Date: Thu, 27 Jun 2002 05:27:27 +0200

On Thursday, June 27, 2002, at 04:11, Jens Bauer wrote:

If the window was transparent, then that would of course affect the
speed, because the blitter in the Amiga doesn't feature any
alpha-blending modes, but I'm sure my GeForce II does -- but again
remember, the window in question (on Mac OS X) is *not* transparent.

Still Mac OS X has to check if there's a translucent window covering your
window.
This could be done for every pixel that is changed, or it could be done
once if we're lucky.

Mac OS X keeps my window in an off screen buffer, so all it would have to do would be (assuming *my* window is not transparent):

Scroll contents of the off screen buffer (preferably using the blitter).

solidParts = { "rectangle with dimension of my window" }
alphaParts = { nil }

For all windows with a higher depth level than my window (in ascending order)
    if other window has no alpha
        for each rectangle in solidParts and alphaParts
            rect = rect - intersection(rect, other window) // this operation can result in 0-4 new rectangles
    else if other window has alpha
        alphaParts += intersection(rect, other window) // also store a pointer to the other window's bitmap and alpha mask/level

All rectangles ending up in solidParts can simply be moved to the display buffer (again, preferably using a blitter).
For the rectangles in alphaParts: pop the first rectangle and do the blend (which, at least for bitmaps with only an alpha level and no mask, should be doable with the blitter; I think even a mask can be handled by today's blitters). Then, for all remaining rectangles in alphaParts, calculate the intersection and, if there is one, blend with that as well and subtract the intersection from the rectangle intersected. Continue until the set is empty.

I think the above code should be rather quick to perform, especially as it would terminate almost instantly when no windows cover the window in which we scroll.
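A minimal sketch of the clipping pass described above, written in Python purely for illustration; it is not how Quartz does it. The names Rect, subtract and classify are made up here, rectangles are half-open (x0, y0, x1, y1), and one detail is an assumption of the sketch rather than something stated above: an area covered by a translucent window is moved out of the solid set, and each alpha rectangle carries the list of translucent windows lying over it, bottom to top.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x0: int  # left, inclusive
    y0: int  # top, inclusive
    x1: int  # right, exclusive
    y1: int  # bottom, exclusive

def intersection(a, b):
    """Return the overlap of a and b, or None if they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    return Rect(x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def subtract(a, b):
    """Return the parts of a not covered by b, as 0-4 rectangles."""
    hit = intersection(a, b)
    if hit is None:
        return [a]
    parts = []
    if a.y0 < hit.y0:   # full-width strip above the overlap
        parts.append(Rect(a.x0, a.y0, a.x1, hit.y0))
    if hit.y1 < a.y1:   # full-width strip below the overlap
        parts.append(Rect(a.x0, hit.y1, a.x1, a.y1))
    if a.x0 < hit.x0:   # piece to the left of the overlap
        parts.append(Rect(a.x0, hit.y0, hit.x0, hit.y1))
    if hit.x1 < a.x1:   # piece to the right of the overlap
        parts.append(Rect(hit.x1, hit.y0, a.x1, hit.y1))
    return parts

def classify(window, windows_above):
    """Split window into solid rectangles and alpha rectangles.

    windows_above is a list of (name, rect, has_alpha) tuples, ordered
    from the window just above ours up to the topmost one.
    """
    solid = [window]
    alpha = []  # list of (rect, [names of translucent windows over it])
    for name, other, has_alpha in windows_above:
        if not has_alpha:
            # An opaque window hides whatever it covers.
            solid = [p for r in solid for p in subtract(r, other)]
            alpha = [(p, layers) for r, layers in alpha
                     for p in subtract(r, other)]
            continue
        new_solid, new_alpha = [], []
        for r in solid:
            hit = intersection(r, other)
            if hit is not None:
                new_alpha.append((hit, [name]))
            new_solid.extend(subtract(r, other))
        for r, layers in alpha:
            hit = intersection(r, other)
            if hit is not None:
                new_alpha.append((hit, layers + [name]))
            new_alpha.extend((p, layers) for p in subtract(r, other))
        solid, alpha = new_solid, new_alpha
    return solid, alpha
```

With no windows above, classify returns the whole window as one solid rectangle and does no further work, which matches the early-out case mentioned above.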

But I also think it's clear that the layer calculations are not the problem with Quartz; the fact that there is absolutely no use of a blitter is (because off screen bitmaps are allocated in main memory and not on the graphics card, and they are also moved to the graphics card with a simple for()-loop or similar)...

What? The 680x0 is a CISC processor that commits an instruction every
fifth cycle or so (not including memory stalls -- and remember that it
has much smaller caches, no second level cache at all and also slower
memory interface).

Take a 60MHz PowerMac 6100 and run a hand-optimized (601) assembly
program on it.
Run the same program, which is hand-optimized on a 68040, and watch the
difference.

Hmm... I'm afraid I don't have any 60 MHz PowerMacs, so I'll have to take your word for that...

You may be able to save some clock-cycles in the RISC version, but
remember that the CISC often performs more than one action per
instruction; eg...
bchg d0,(a0)+
would...
read bit d0 from {a0}, flip it, write it back to {a0} and increment a0.
-This is still a bit complicated when using RISC code. (oh, now I'm
really going off-topic!)

Modern RISC processors do have auto increment instructions and similar.
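To make the "more than one action per instruction" point concrete, here is a rough Python model of what bchg d0,(a0)+ does, one step per line, roughly the way a load/store machine would have to spell it out. The function name and the bytearray standing in for memory are made up for the illustration; the modulo 8 reflects my understanding that 68000 bit instructions on memory operands work on a single byte.

```python
def bchg_postinc(memory, a0, d0):
    """Model `bchg d0,(a0)+` on a bytearray; returns the new value of a0."""
    bit = d0 % 8                  # memory operand is a byte, so only bits 0-7 exist
    value = memory[a0]            # 1. load the byte that a0 points at
    value ^= 1 << bit             # 2. flip the selected bit
    memory[a0] = value            # 3. store the byte back
    return a0 + 1                 # 4. post-increment a0 by one byte

mem = bytearray([0b00000001, 0b00000000])
a0 = bchg_postinc(mem, 0, 0)      # flips bit 0 of mem[0]
a0 = bchg_postinc(mem, a0, 3)     # flips bit 3 of mem[1]
print(list(mem), a0)
```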

Furthermore, the "move.l (a0)+,d0" does take several cycles to complete, and no instructions can be executed meanwhile, whereas RISC processors pipeline things and thus start a new instruction every cycle; in fact they start several instructions per cycle if there are no resource conflicts.

So with RISC you generally complete 1.5-3 instructions per cycle, and on the CISC you complete maybe 0.2.
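As a back-of-the-envelope check of those figures, the sketch below turns instructions per cycle into instructions per second. The 60 MHz clock for both processors is only an assumption, borrowed from the 6100 mentioned above, and the result compares instruction counts, not work done: a CISC instruction may do more per instruction, so the ratio is an upper bound on any speedup rather than a prediction.

```python
def instructions_per_second(clock_hz, instructions_per_cycle):
    """Instruction throughput for a given clock and completion rate."""
    return clock_hz * instructions_per_cycle

CLOCK_HZ = 60e6  # assumed: same 60 MHz clock for both processors

cisc = instructions_per_second(CLOCK_HZ, 0.2)
risc_low = instructions_per_second(CLOCK_HZ, 1.5)
risc_high = instructions_per_second(CLOCK_HZ, 3.0)

print(f"CISC: {cisc / 1e6:.0f} million instructions/s")
print(f"RISC: {risc_low / 1e6:.0f}-{risc_high / 1e6:.0f} million instructions/s")
print(f"ratio: {risc_low / cisc:.1f}x to {risc_high / cisc:.1f}x")
```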

Now the PCI bus alone can transfer more than that -- but what about AGP?
AFAIK we have AGP support on the Mac, and this can certainly also do
much much more per second.
Yep, but I don't know what kind of connection to the graphics card you
have on the iMac.

I don't have an iMac, but a PowerMac G4/SilverLine.

Could you try making a test and let me know how much data you can move
per second on your iMac using PPC assembly ?

Well, I can test memmove() which I assume is optimized assembly (???).

memmove() actually gives me 290 MB/s.
bzero() gives me 496 MB/s.
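For anyone who wants to repeat this kind of measurement without writing C or assembly, a rough sketch in Python follows. It is not the test used for the numbers above: the function name, buffer size and repeat count are arbitrary choices, MB is taken as 10^6 bytes, and the same-length slice assignment is (as far as I know) carried out as one bulk copy inside CPython, so it only approximates a memmove() call.

```python
import time

def copy_throughput_mb_per_s(size_bytes=16 * 1024 * 1024, repeats=10):
    """Best observed main-memory copy rate in MB/s (MB = 10**6 bytes)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                      # bulk copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)                # guard against a zero timer reading
    return size_bytes / best / 1e6

if __name__ == "__main__":
    print(f"copy: {copy_throughput_mb_per_s():.0f} MB/s")
```

Taking the best of several runs, rather than the average, reduces the influence of whatever else the machine happens to be doing.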

My approx. calculations would be you can write between 133MB and 185MB
data per second (based upon the results from my old 6200). If you need to
copy data, the number of MB you can move would then drop (not to 50%, as
reading is faster than writing, but I don't remember the exact ratio).
Uhm, maybe I'll mail you a speed tester off-list later, so you don't have
to write it yourself. ;)

Feel very free to do so! As I have no experience with PPC assembler myself...

Say we can only copy 133MB per second, and we're using an offscreen, we
need to first write to RAM to update the offscreen, *then* copy the
scrolled offscreen to the graphics card. This means we'd most likely be
able to copy (very, very approximately) 66MB per second. This is
excluding the time for reading the data.

Yes -- but this is also assuming that there is no DMA to help us copy to the graphics card.
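The arithmetic behind the 66 MB/s estimate, and what DMA would change, can be sketched as follows. The helper name is made up, and the DMA case assumes the transfer to the card overlaps fully with the CPU's work, which is an idealization.

```python
def effective_rate(*pass_rates_mb_s):
    """Rate for data that must go through several passes one after another."""
    return 1.0 / sum(1.0 / rate for rate in pass_rates_mb_s)

# The CPU updates the off screen buffer, then the CPU copies it to the card.
cpu_both_passes = effective_rate(133, 133)

# The CPU only updates the off screen buffer; the copy to the card is
# assumed to be done by DMA in parallel, so it costs no CPU time.
cpu_one_pass = effective_rate(133)

# Same two-pass scheme, but at the measured memmove() rate of 290 MB/s.
memmove_both_passes = effective_rate(290, 290)

print(f"{cpu_both_passes:.1f} {cpu_one_pass:.1f} {memmove_both_passes:.1f}")
```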

Though it would seem my C copy function of 75 MB/s is rather slow compared to assembler. Hopefully Apple does use memmove() rather than roll their own implementation in C...

Also, it would actually be possible to split the window into an off screen part (the parts covered by other windows) and an on screen part. One would have to copy to and from the display buffer when moving windows around if the clipped rectangles change (but off screen parts should be allocated from graphics card memory, so the blitter would be used here, and you wouldn't notice it). The benefit would be that all window rendering would require only half the work... but of course there would then be no automatic double buffer, which seems to be a priority for Apple (though one doesn't need to double buffer a window to get flicker free content scrolling, even when the area scrolled is partly covered by other windows).

If this is all Apple can squeeze out of my machine then they should
really hire me to work on Quartz for them!!! I don't even need hardware
support to beat what they currently have to offer...

Well, it won't hurt asking them, would it ? ;)

Yeah, they probably do a lot of R&D in Apple Denmark ;-)

I *do* believe it's possible to "go faster", but you shouldn't expect the
full 20x, not even 10x, because it's better being surprised than
disappointed. ;)

I guess I should look into what the OpenGL (QuickTime?) APIs have to offer in regard to direct graphics memory access and blitter support, and then write some low level benchmarks that replicate the functionality of the Aqua window server (i.e. windows with transparency, pattern-backgrounds, double buffer etc.) and see what I can squeeze out of the system, and if I can scroll the contents of a window faster ;-)
