Re: Re(2): [OT] Quartz blitter/hardware support?
- Subject: Re: Re(2): [OT] Quartz blitter/hardware support?
- From: Allan Odgaard <email@hidden>
- Date: Thu, 27 Jun 2002 05:27:27 +0200
On Thursday, June 27, 2002, at 04:11, Jens Bauer wrote:
If the window were transparent, that would of course affect the speed,
because the Amiga's blitter doesn't offer any alpha-blending modes,
though I'm sure my GeForce II does. But again, remember that the window
in question (on Mac OS X) is *not* transparent.
Still, Mac OS X has to check whether a translucent window is covering
your window.
This could be done for every pixel that changes, or, if we're lucky,
just once.
Mac OS X keeps my window in an off-screen buffer, so all it would have
to do is (assuming *my* window is not transparent):

  Scroll the contents of the off-screen buffer (preferably using the blitter).
  solidParts = { rectangle with the dimensions of my window }
  alphaParts = { }
  for each window with a higher depth level than my window (in ascending order)
    if other window has no alpha
      for each rect in solidParts and alphaParts
        rect = rect - intersection(rect, other window)  // can yield 0-4 new rectangles
    else  // other window has alpha
      for each rect in solidParts
        alphaParts += intersection(rect, other window)  // also store a pointer to the other
                                                        // window's bitmap and alpha mask/level
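The "rect - intersection(rect, other window)" step can be sketched in C as below. This is a minimal sketch assuming integer, half-open rectangles; the Rect type and the rect_intersect/rect_subtract names are hypothetical, not part of any Apple API. It shows why one subtraction yields between 0 and 4 rectangles: the full-width bands above and below the overlap, plus the strips to its left and right.

```c
/* Hypothetical rectangle type: covers x0 <= x < x1, y0 <= y < y1. */
typedef struct { int x0, y0, x1, y1; } Rect;

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

int rect_is_empty(Rect r) { return r.x0 >= r.x1 || r.y0 >= r.y1; }

Rect rect_intersect(Rect a, Rect b) {
    Rect r = { max_i(a.x0, b.x0), max_i(a.y0, b.y0),
               min_i(a.x1, b.x1), min_i(a.y1, b.y1) };
    return r;  /* may be empty: check with rect_is_empty() */
}

/* Stores the parts of `a` not covered by `b` in out[] (at most 4
   non-overlapping pieces) and returns how many were stored. */
int rect_subtract(Rect a, Rect b, Rect out[4]) {
    int n = 0;
    if (rect_is_empty(a)) return 0;
    Rect i = rect_intersect(a, b);
    if (rect_is_empty(i)) { out[n++] = a; return n; }  /* no overlap: a survives whole */
    if (a.y0 < i.y0) out[n++] = (Rect){ a.x0, a.y0, a.x1, i.y0 };  /* band above overlap */
    if (i.y1 < a.y1) out[n++] = (Rect){ a.x0, i.y1, a.x1, a.y1 };  /* band below overlap */
    if (a.x0 < i.x0) out[n++] = (Rect){ a.x0, i.y0, i.x0, i.y1 };  /* strip left of overlap */
    if (i.x1 < a.x1) out[n++] = (Rect){ i.x1, i.y0, a.x1, i.y1 };  /* strip right of overlap */
    return n;
}
```

Applying rect_subtract to every rectangle in solidParts and alphaParts for each opaque window above yours gives the clipping loop described above. When nothing covers the window, each call returns its input unchanged, which is why the loop finishes almost immediately in the common case.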
All rectangles that end up in solidParts can simply be moved to the
display buffer (again, preferably using the blitter).
For alphaParts: pop the first rectangle and do the blend (which, at
least for bitmaps with only an alpha level and no mask, should be doable
with the blitter; I think even a mask can be handled by today's
blitters). Then, for each remaining rectangle in alphaParts, compute its
intersection with the popped rectangle; if it is non-empty, blend it
with that window as well, and subtract the intersection from the
rectangle it was intersected with. Continue until the set is empty.
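In the simple case mentioned above, where the covering window has an alpha level but no mask, the blend is a weighted average per color channel. The following minimal sketch assumes 8-bit channels and one constant alpha for the whole covering window. blend_const_alpha is a hypothetical name, and the code runs on the CPU rather than on a blitter.

```c
#include <stddef.h>
#include <stdint.h>

/* Blends n 8-bit channel values from src (the covering window) over dst
   (the window underneath) using one constant alpha level:
   alpha = 0 leaves dst unchanged, alpha = 255 replaces dst with src. */
void blend_const_alpha(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha) {
    unsigned a = alpha;
    unsigned inv = 255u - alpha;
    for (size_t i = 0; i < n; i++) {
        /* (src*a + dst*(255-a)) / 255, rounded to nearest */
        dst[i] = (uint8_t)((src[i] * a + dst[i] * inv + 127u) / 255u);
    }
}
```

With a per-pixel mask, the only change would be reading `alpha` from the mask for each pixel instead of using one constant.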
I think the above should be quick to perform, especially as it would
terminate almost instantly when no windows cover the window in which we
scroll.
But I also think it's clear that the layer calculations are not the
problem with Quartz. The problem is that the blitter is not used at all
(because off-screen bitmaps are allocated in main memory rather than on
the graphics card, and they are moved to the graphics card with a simple
for() loop or similar)...
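The first step of the procedure above, scrolling the contents of the off-screen buffer, comes down to moving rows within the same bitmap. Done on the CPU, it looks roughly like the minimal sketch below. It assumes a byte-addressed bitmap with a fixed row stride. That layout and the scroll_up name are hypothetical. A blitter would perform the same row moves on the graphics card instead.

```c
#include <stddef.h>
#include <string.h>

/* Scrolls a sub-rectangle of a bitmap up by dy rows: row top+k+dy moves
   to row top+k, for the `height` rows starting at `top`, touching only
   bytes [x_byte, x_byte + width_bytes) of each row. The dy rows exposed
   at the bottom keep their old contents; the caller must redraw them. */
void scroll_up(unsigned char *buf, size_t stride, size_t top, size_t height,
               size_t x_byte, size_t width_bytes, size_t dy) {
    /* Zero iterations if dy >= height: everything scrolled out of view. */
    for (size_t k = 0; k + dy < height; k++) {
        unsigned char *dst = buf + (top + k) * stride + x_byte;
        const unsigned char *src = buf + (top + k + dy) * stride + x_byte;
        /* Going top to bottom, a source row is always read before it is
           overwritten, so scrolling in place is safe. */
        memmove(dst, src, width_bytes);
    }
}
```

Scrolling down would iterate bottom to top for the same reason.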
What? The 680x0 is a CISC processor that completes an instruction every
fifth cycle or so (not counting memory stalls; and remember that it has
much smaller caches, no second-level cache at all, and a slower memory
interface).
Take a 60 MHz PowerMac 6100 and run a hand-optimized (601) assembly
program on it.
Run the same program, hand-optimized for a 68040, and watch the
difference.
Hmm... I'm afraid I don't have any 60 MHz PowerMacs, so I'll have to
take your word for it...
You may be able to save some clock cycles in the RISC version, but
remember that a CISC instruction often performs more than one action;
e.g.
bchg d0,(a0)+
would read bit d0 of the byte at (a0), flip it, write it back to (a0),
and increment a0.
This is still a bit complicated in RISC code. (Oh, now I'm really going
off-topic!)
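The rough C equivalent below shows what that one instruction bundles together. On a classic load/store RISC machine, each commented step would typically be a separate instruction. PowerPC's update-form loads and stores can fold the address increment in. This is an illustrative sketch: bchg_postinc is a hypothetical name, and it models only the memory-operand form, where the bit number is taken modulo 8.

```c
#include <stdint.h>

/* Rough C model of the 68k instruction  bchg d0,(a0)+  with a memory
   operand. Returns the Z flag the instruction would set (1 if the tested
   bit was 0). Post-increment is by 1 byte (the real CPU steps a7 by 2 to
   keep the stack aligned, which is not modeled here). */
int bchg_postinc(uint8_t **a0, uint32_t d0) {
    uint8_t mask = (uint8_t)(1u << (d0 & 7u)); /* bit number mod 8 for bytes */
    uint8_t old = **a0;                        /* step 1: load the byte */
    **a0 = (uint8_t)(old ^ mask);              /* steps 2-3: flip the bit, store */
    (*a0)++;                                   /* step 4: increment the address */
    return (old & mask) == 0;                  /* Z flag from the old bit value */
}
```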
Modern RISC processors do have auto-increment instructions and similar.
Furthermore, "move.l (a0)+,d0" takes several cycles to complete, and no
other instructions can execute meanwhile, whereas a RISC processor
pipelines its work and thus starts a new instruction every cycle. In
fact, it starts several instructions per cycle if there are no resource
conflicts.
So with RISC you generally complete 1.5-3 instructions per cycle, while
on the CISC you complete maybe 0.2.
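The per-cycle advantage at equal clock speed can be worked out from those estimates. The calculation below takes them at face value, with IPC meaning instructions completed per cycle, and ignores the point that one CISC instruction may do more work than one RISC instruction:

```latex
\frac{\mathrm{IPC}_{\mathrm{RISC}}}{\mathrm{IPC}_{\mathrm{CISC}}}
  = \frac{1.5}{0.2} = 7.5
  \quad\text{to}\quad
  \frac{3}{0.2} = 15
```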
Now, the PCI bus alone can transfer more than that, but what about AGP?
AFAIK we have AGP support on the Mac, and it can certainly move much
more per second.
Yep, but I don't know what kind of connection to the graphics card you
have on the iMac.
I don't have an iMac, but a PowerMac G4/SilverLine.
Could you run a test and let me know how much data per second you can
move on your iMac using PPC assembly?
Well, I can test memmove(), which I assume is optimized assembly (?).
memmove() gives me 290 MB/s.
bzero() gives me 496 MB/s.
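A speed tester along the following lines could be used to reproduce figures like the memmove() and bzero() numbers above. It is a rough sketch, not the test that produced those figures. copy_mb_per_s is a hypothetical name. The code uses memset(..., 0, ...) in place of bzero(), which has the same effect and is standard C. It counts each byte once even though memmove() both reads and writes it.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Repeats memmove() (copy) or memset() (clear) over a buffer and returns
   MB/s, with 1 MB = 1,000,000 bytes. clock() measures CPU time, which is
   close to wall time for a single-threaded loop like this one.
   Returns -1.0 on bad input or allocation failure. */
double copy_mb_per_s(size_t bytes, int reps, int use_memset) {
    if (bytes == 0 || reps <= 0) return -1.0;
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) { free(src); free(dst); return -1.0; }
    memset(src, 1, bytes);
    memset(dst, 0, bytes);                /* touch every page before timing */

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        if (use_memset)
            memset(dst, 0, bytes);        /* same job as bzero(dst, bytes) */
        else
            memmove(dst, src, bytes);
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    volatile unsigned char sink = dst[bytes / 2];  /* keep the result observable */
    (void)sink;
    free(src);
    free(dst);
    if (secs <= 0.0) secs = 1.0 / CLOCKS_PER_SEC;  /* timer too coarse for this run */
    return (double)bytes * reps / secs / 1e6;
}
```

For example, copy_mb_per_s(64u << 20, 20, 0) times twenty copies of 64 MB. A buffer much larger than the caches measures memory bandwidth rather than cache bandwidth. An optimizer may merge repeated identical memset() calls, so treat suspiciously high numbers with care.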
My approximate calculation is that you can write between 133 MB and
185 MB of data per second (based on the results from my old 6200). If
you need to copy data, the number of MB you can move drops (not to 50%,
as reading is faster than writing, but I don't remember the exact ratio).
Uhm, maybe I'll mail you a speed tester off-list later, so you don't
have to write it yourself. ;)
Feel free to do so, as I have no experience with PPC assembler
myself...
Say we can only copy 133 MB per second and we're using an off-screen
buffer: we first need to write to RAM to update the off-screen buffer,
*then* copy the scrolled off-screen buffer to the graphics card. This
means we'd most likely be able to copy (very, very approximately) 66 MB
per second, excluding the time for reading the data.
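The 66 MB/s figure comes from making two full passes over the data: one to update the off-screen buffer and one to move it to the card. Suppose the passes run at rates r1 and r2 and cannot overlap, for example because no DMA is available. The time per megabyte then adds up, and the effective rate is:

```latex
r_{\mathrm{eff}} = \frac{1}{\frac{1}{r_1} + \frac{1}{r_2}} = \frac{r_1 r_2}{r_1 + r_2},
\qquad
r_1 = r_2 = 133\ \mathrm{MB/s} \;\Rightarrow\; r_{\mathrm{eff}} = \frac{133^2}{266} = 66.5\ \mathrm{MB/s}
```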
Yes -- but this is also assuming that there is no DMA to help us copy to
the graphics card.
Though it would seem my C copy function, at 75 MB/s, is rather slow
compared to assembler. Hopefully Apple uses memmove() rather than
rolling their own implementation in C...
Also, it would actually be possible to split the window into an
off-screen part (the parts covered by other windows) and an on-screen
part. One would have to copy to and from the display buffer when moving
windows around if the clipped rectangles change (but the off-screen
parts should be allocated in graphics card memory, so the blitter would
be used here and you wouldn't notice it), and the benefit would be that
all window rendering would require only half the work. But of course
then there would be no automatic double buffering, which seems to be a
priority for Apple (though one doesn't need to double-buffer a window to
get flicker-free content scrolling, even when the scrolled area is
partly covered by other windows).
If this is all Apple can squeeze out of my machine, then they should
really hire me to work on Quartz for them!!! I don't even need hardware
support to beat what they currently have to offer...
Well, it won't hurt to ask them, would it? ;)
Yeah, they probably do a lot of R&D in Apple Denmark ;-)
I *do* believe it's possible to "go faster", but you shouldn't expect
the full 20x, not even 10x, because it's better to be surprised than
disappointed. ;)
I guess I should look into what the OpenGL (QuickTime?) APIs offer in
terms of direct graphics memory access and blitter support, then write
some low-level benchmarks that replicate the functionality of the Aqua
window server (i.e. windows with transparency, pattern backgrounds,
double buffering, etc.) and see what I can squeeze out of the system,
and whether I can scroll the contents of a window faster ;-)
_______________________________________________
cocoa-dev mailing list | email@hidden