Re(2): [OT] Quartz blitter/hardware support?
- Subject: Re(2): [OT] Quartz blitter/hardware support?
- From: Jens Bauer <email@hidden>
- Date: Thu, 27 Jun 2002 04:11:39 +0200
Hi Allan,
On Thu, 27 Jun, 2002, Allan Odgaard <email@hidden> wrote:
{snip}
>
> > If you had that on the Amiga, I'm sure you'd get in trouble with the
> > scrolling speed as well.
>
> Anti-aliased fonts should not bring it down -- I scroll only a few
> pixels at a time, so only one line of text needs to be drawn. I did
> experiment with anti-aliased fonts on the Amiga, and they did not spoil
> the real-time scrolling of my browser -- and remember, my tests were
> also done without any font rendering. The font system actually seems
> quite reasonable, at least when using the CoreGraphics functions
> (~300,000 glyphs per second).
>
> If the window were transparent, that would of course affect the
> speed, because the blitter in the Amiga doesn't feature any
> alpha-blending modes, but I'm sure my GeForce II does -- but again,
> remember, the window in question (on Mac OS X) is *not* transparent.
Still, Mac OS X has to check whether there's a translucent window covering
your window.
This could be done for every pixel that is changed, or, if we're lucky,
just once.
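A minimal sketch of the "check once" case, assuming the window server tracks each window as a screen rectangle with a translucency flag. The types and function names below are hypothetical and are not Quartz internals: the idea is one rectangle-intersection test per window in front, per update, instead of a test per changed pixel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical screen rectangle: origin and size in pixels. */
typedef struct { int x, y, w, h; } Rect;

/* Hypothetical window record kept by the window server. */
typedef struct {
    Rect bounds;
    bool translucent;   /* true if the window is drawn with alpha */
} Window;

/* True if the two rectangles share at least one pixel. */
static bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* One check per update: returns true if any translucent window in front
 * of the target (windows_above[0..count)) overlaps the damaged area.
 * The cost depends on the number of windows, not on how many pixels
 * changed. */
static bool damage_needs_compositing(Rect damaged,
                                     const Window *windows_above,
                                     size_t count)
{
    assert(windows_above != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (windows_above[i].translucent &&
            rects_overlap(damaged, windows_above[i].bounds))
            return true;
    }
    return false;
}
```

In this model, when damage_needs_compositing() returns false the update can be treated as opaque and blitted directly; blending is only needed where a translucent window actually overlaps.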
>
> > On computers back then, we could "hog" the display, even the CPU; today,
> > we have to respect that other programs might want to use the display as
> > well, so we can't do direct-to-screen drawing, and we can't hog the CPU
> > either.
>
> For the record, the Amiga has both a shared display and
> multitasking :-)
OK, you've done the tests in a "legal" environment, I suppose, without
doing any tricks. ;)
>
> > Even though we can't do this, I believe it's still possible to find
> > parts of the graphics system that can be optimized, maybe by using
> > completely different algorithms.
>
> I wrote my own layers system for the Amiga; if anyone still has one of
> these machines, I'd really recommend taking a look at the demo I put on
> my web page (sorry about making this all about the Amiga). It opens
> 500 windows in the blink of an eye, and you can move any one of them
> around in real time (i.e. at the screen's refresh rate), even without
> bringing that window to the front, so it has to be clipped against the
> 499 other windows, and this is actually running in true colour (on my
> system)...
That *is* very impressive!
>
> There is of course no support for alpha channels, but OTOH this demo
> asks each window to repaint damaged regions when parts are uncovered,
> whereas Mac OS X caches each window's contents in an off-screen buffer.
> If I did the latter, and had a blitter that could mix bitmaps using an
> alpha channel, I doubt it would seriously hurt performance (of
> course I'd need the memory for 500 true-colour
> off-screen bitmaps :-) ).
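To put a rough number on that memory cost: the post gives no window size, so the sketch below assumes a hypothetical 400x300 window at 4 bytes per pixel. The function name is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes needed to keep an off-screen copy of every window,
 * assuming all windows have the same (hypothetical) size. */
static uint64_t backing_store_bytes(uint64_t windows, uint64_t width,
                                    uint64_t height, uint64_t bytes_per_pixel)
{
    assert(windows > 0 && width > 0 && height > 0 && bytes_per_pixel > 0);
    return windows * width * height * bytes_per_pixel;
}
```

With those assumptions, backing_store_bytes(500, 400, 300, 4) is 240,000,000 bytes, roughly 229 MiB.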
>
> > You may not be able to get 20 times the speed on an 800 MHz computer
> > compared to a 40 MHz (680x0), but it would probably be possible to
> > reach 5 times the speed if working *really* hard on it.
>
> This is certainly not my experience -- I've moved a lot of C code from
> the Amiga to both Windows and Linux and found speedups closer to 100.
> That would also agree better with Moore's law.
>
> > -Remember, the 680x0 is faster per MHz than a PowerPC!
>
> What? The 680x0 is a CISC processor that commits an instruction every
> fifth cycle or so (not including memory stalls -- and remember that it
> has much smaller caches, no second-level cache at all and a slower
> memory interface).
Take a 60 MHz PowerMac 6100 and run a hand-optimized (601) assembly
program on it.
Then run the same program, hand-optimized for the 68040, on a 68040, and
watch the difference.
You may be able to save some clock cycles in the RISC version, but
remember that a CISC instruction often performs more than one action;
e.g.

    bchg d0,(a0)+

tests bit d0 of the byte at (a0), flips it, writes the byte back to (a0)
and increments a0.
Doing the same in RISC code takes several instructions. (Oh, now I'm
really going off-topic!)
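For readers who don't speak 68k, here is a rough C equivalent of the work that single `bchg d0,(a0)+` does. On the 68000 family, when the destination is in memory the operand is a byte and the bit number is taken modulo 8; the instruction also sets the Z flag from the bit's old value, which the sketch returns. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates "bchg d0,(a0)+" on a byte in memory:
 *   1. read the byte at *a0
 *   2. test bit (d0 mod 8); the 68k sets Z if that bit was 0
 *   3. invert the bit and write the byte back
 *   4. post-increment a0 by one byte
 * Returns the Z flag (true if the tested bit was 0 before the flip). */
static bool bchg_postinc(uint8_t **a0, uint32_t d0)
{
    assert(a0 != NULL && *a0 != NULL);
    uint8_t mask = (uint8_t)(1u << (d0 & 7u));
    uint8_t value = **a0;              /* read */
    bool z = (value & mask) == 0;      /* test */
    **a0 = (uint8_t)(value ^ mask);    /* flip and write back */
    *a0 += 1;                          /* increment a0 */
    return z;
}
```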
I do believe, of course, that a G3 performs better than a 601, even with
its clock scaled down to 60 MHz.
>
> The PowerPC is a RISC processor which I'd estimate commits on average at
> least 1.5 instructions per cycle, probably closer to 2-3 instructions
> per cycle for optimized code.
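Plugging the rough per-cycle figures from this exchange into a quick calculation (they are estimates from the thread, not measurements): a 40 MHz 680x0 committing one instruction every five cycles versus an 800 MHz PowerPC committing 1.5 to 3 per cycle.

```c
#include <assert.h>

/* Millions of instructions per second = clock (MHz) * instructions per cycle. */
static double mips(double clock_mhz, double instructions_per_cycle)
{
    assert(clock_mhz > 0.0 && instructions_per_cycle > 0.0);
    return clock_mhz * instructions_per_cycle;
}

/* How many times more instructions per second CPU b commits than CPU a. */
static double speedup(double mhz_a, double ipc_a, double mhz_b, double ipc_b)
{
    return mips(mhz_b, ipc_b) / mips(mhz_a, ipc_a);
}
```

With those inputs, speedup(40.0, 0.2, 800.0, 1.5) comes out at about 150, and the 3-per-cycle case at about 300. That is a ratio of instructions, not of work done: one CISC instruction such as bchg can do the work of several RISC ones, so the real-world gap will be smaller.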
>
> > Again, there are a lot of issues involved in it, which you (I) don't
> > take into account.
>
> Any naive implementation done by Apple is beyond my view -- but I can
> assure you that I do consider all the added benefits of the Aqua
> interface when I claim that something is wrong speed-wise.
>
> To recap the simple scrolling test I did: I added a timer to
> execute a method 60 times per second. This method called scrollToPoint
> on an NSScrollView with a custom NSView subclass. This NSView subclass
> simply filled the background of the *new* pixels (using the fillRect
> method of the NSBezierPath class).
>
> I set both copiesOnScroll and drawsBackground to YES, and I also return
> YES from isOpaque in my NSView subclass.
>
> I have tested this program as the frontmost application, without any
> transparency involved.
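The copy-on-scroll strategy described here (move the pixels that stay visible, then paint only the newly exposed strip) can be sketched on a plain 32-bit framebuffer. This is a hypothetical model of the idea, not how AppKit or Quartz implements `copiesOnScroll`; the buffer layout and function name are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scrolls a width x height, 32-bit-per-pixel buffer up by dy rows:
 * rows dy..height-1 move to rows 0..height-dy-1 (one blit), and only
 * the dy newly exposed rows at the bottom are filled with the background. */
static void scroll_up(uint32_t *pixels, size_t width, size_t height,
                      size_t dy, uint32_t background)
{
    assert(pixels != NULL && dy <= height);
    size_t kept_rows = height - dy;
    /* memmove, because source and destination overlap. */
    memmove(pixels, pixels + dy * width, kept_rows * width * sizeof *pixels);
    /* Paint only the exposed strip, like filling the background of new pixels. */
    for (size_t i = kept_rows * width; i < height * width; i++)
        pixels[i] = background;
}
```

scroll_up() still copies every retained pixel on each call, so its cost grows with the area of the view rather than with the few rows that are newly exposed.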
>
> The result?
>
> Scrolling is very close to real time (i.e. 60 times per second) when the
> NSScrollView has a size of ~450x320.
>
> When I make the window fill the entire screen (1600x1024), the
> NSScrollView only scrolls 17.5 times per second, i.e. scrolling the
> bitmap (one blit) takes ~0.06 seconds.
>
> On each scroll we need to move around 6 MB, so that gives us a total of
> around 100 MB/s.
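A quick check of these figures. The post doesn't state the pixel depth; the sketch assumes 32 bits (4 bytes) per pixel, which matches the "around 6 MB" per full-screen bitmap.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one full-window bitmap. */
static double bitmap_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return (double)width * (double)height * (double)bytes_per_pixel;
}

/* Sustained transfer rate implied by copying one full bitmap per scroll. */
static double bytes_per_second(double bytes_per_blit, double blits_per_second)
{
    assert(blits_per_second > 0.0);
    return bytes_per_blit * blits_per_second;
}
```

bitmap_bytes(1600, 1024, 4) is 6,553,600 bytes; at 17.5 scrolls per second that is 114,688,000 bytes per second (about 109 MiB/s), and one blit takes 1/17.5 s, about 57 ms. Sustaining 60 scrolls per second at that size would need 393,216,000 bytes per second.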
>
> Now the PCI bus alone can transfer more than that -- but what about AGP?
> AFAIK we have AGP support on the Mac, and this can certainly also do
> much, much more per second.
Yep, but I don't know what kind of connection to the graphics card you
have on the iMac.
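For comparison with the buses mentioned here, nominal peak bandwidth is clock rate times bus width times transfers per clock: about 133 MB/s for 32-bit, 33 MHz PCI and about 533 MB/s for 32-bit, 66 MHz AGP 2x. These are theoretical peaks and sustained rates are lower; the 393,216,000 bytes/s needed for 60 full-screen scrolls per second assumes 32-bit pixels at 1600x1024.

```c
#include <assert.h>
#include <stdbool.h>

/* Nominal peak bandwidth in bytes/s:
 * clock (Hz) * bus width (bytes) * transfers per clock.
 * Real sustained throughput is lower. */
static double peak_bandwidth(double clock_hz, double width_bytes,
                             double transfers_per_clock)
{
    assert(clock_hz > 0.0 && width_bytes > 0.0 && transfers_per_clock > 0.0);
    return clock_hz * width_bytes * transfers_per_clock;
}

/* True if a bus with the given nominal peak could carry `needed` bytes/s. */
static bool fits(double needed, double available)
{
    return needed <= available;
}
```

On paper, then, 60 full-screen scrolls per second would exceed plain PCI but fit within AGP 2x.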
>
> And remember, this is really the worst-case scenario, where we have our
> bitmap in main memory even though it belongs to the frontmost window and
> is therefore also in the display buffer. Apple could easily utilize
> that, and simply use the blitter of the graphics card. This is a common
> situation (i.e. scrolling the contents of the frontmost window), so such
> an optimization should really be worth the effort.
Could you try making a test and let me know how much data you can move
per second on your iMac using PPC assembly?
My rough estimate is that you can write between 133 MB and 185 MB of
data per second (based on the results from my old 6200). If you need to
copy data, the number of MB you can move per second would drop (not all
the way to 50%, as reading is faster than writing, but I don't remember
the exact ratio).
Uhm, maybe I'll mail you a speed tester off-list later, so you don't have
to write it yourself. ;)
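That speed tester isn't reproduced here, but a portable sketch of the same idea (time a large memset for write speed and a memcpy for copy speed) might look like this. It is plain C rather than the hand-tuned PPC assembly asked for, it uses `clock()` (CPU time, so the figure is only a rough indication), and the function name, buffer size and repeat count are arbitrary choices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Approximate write (copy == 0, memset) or copy (copy != 0, memcpy)
 * throughput in bytes per second. Returns 0.0 if allocation fails or the
 * run was too short for clock() to measure. An optimizing compiler may
 * still drop some repeated stores, so treat the result as rough. */
static double measure_bandwidth(size_t bytes, int repeats, int copy)
{
    assert(bytes > 0 && repeats > 0);
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0.0;
    }
    memset(src, 0x55, bytes);   /* touch every page before timing */
    memset(dst, 0x00, bytes);

    clock_t start = clock();
    for (int i = 0; i < repeats; i++) {
        if (copy)
            memcpy(dst, src, bytes);
        else
            memset(dst, i & 0xFF, bytes);
    }
    clock_t end = clock();

    volatile unsigned char sink = dst[bytes - 1];   /* keep the work observable */
    (void)sink;
    free(src);
    free(dst);

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes * (double)repeats / seconds;
}
```

Calling it once with copy set to 0 and once with copy set to 1 on the same machine gives the write-only and read-plus-write rates being compared here.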
Say we can only copy 133 MB per second and we're using an off-screen
buffer: we first need to write to RAM to update the off-screen buffer,
*then* copy the scrolled buffer to the graphics card. This means we'd
most likely be able to move (very, very approximately) 66 MB per second,
and that excludes the time for reading the data.
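The halving in this estimate follows from doing two passes over the same data one after the other: the times per byte add, so two stages with rates r1 and r2 give 1/(1/r1 + 1/r2). A small sketch of that formula; it ignores any overlap between the passes and the cost of reading, and the function name is illustrative.

```c
#include <assert.h>

/* Effective throughput when every byte passes through two sequential
 * stages with rates r1 and r2 (bytes/s): time per byte is 1/r1 + 1/r2. */
static double two_pass_rate(double r1, double r2)
{
    assert(r1 > 0.0 && r2 > 0.0);
    return 1.0 / (1.0 / r1 + 1.0 / r2);
}
```

two_pass_rate(133e6, 133e6) gives 66.5e6 bytes per second, the 66 MB/s figure; if one pass is faster than the other, the combined rate ends up above half of the slower one.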
>
> If this is all Apple can squeeze out of my machine, then they should
> really hire me to work on Quartz for them!!! I don't even need hardware
> support to beat what they currently have to offer...
Well, it wouldn't hurt to ask them, would it? ;)
I *do* believe it's possible to "go faster", but you shouldn't expect the
full 20x, not even 10x, because it's better to be surprised than
disappointed. ;)
Love,
Jens
--
Jens Bauer, Faster Software.
-Let's make the World better, shall we ?
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.