Mailing Lists: Apple Mailing Lists


VAR vs VAO vs VBO (was Loading vertex arrays in the graphics card ?)




On Dec 27, 2004, at 12:04 PM, Christopher Niederauer wrote:

ARB_vertex_buffer_object (aka VBO) was added in 10.3.4 and should be
supported on any card that already supports APPLE_vertex_array_range
(aka VAR).  If you are running on a capable system, double check the
extensions list for it (not all ARB extensions are next to each other)
– but please do file a bug if it's really not there.

Let me ring in here, too, that APPLE_vertex_array_range is a better
suited spec for certain circumstances, such as modifying a small sub
range of data for streaming.  It has the ability to flush a sub range
of data when modified, whereas VBO requires at the very least a call to
BufferSubData (which means at least one more copy for non-cached data)
or flushing the entire range if you modify a piece of the data manually
via mapping (whew, did that make sense? ;-).  The main benefit that VBO
provides over VAR is the ability to mix and match multiple buffer
objects (i.e. streamable vertex positions with static texture coords).
Since you are asking about statically caching data on the video card, I
think VBO should work out fine, but don't let #ifdef's stop you from
using a better suited API whenever applicable (not just VAR vs VBO)!


We've used VAR, VAO, and VBO. I have a few cents to toss in. I would welcome any corrections to what I have in my mental model of how these things work.


VAR definitely lets you have very tight control over memory allocation and synchronization (since you have to do it all yourself), and also allows you to amortize the synchronization cost of fences across many draw batches, if you do it right.
Downside to VAR: you can't easily maintain multiple regions without marking and unmarking an area as a VAR, and thus it is very difficult if not impossible to keep some data in system RAM while other data is cached in VRAM. That is, as soon as you re-set the VAR from a cached range to a shared range, the only reference to the cached data has been released, so it can't stay in VRAM. Poof.


VAO lets you get around that last problem with VAR by allowing per-array-object VAR state, including the storage hint. But it (IMHO) goes too far and bookmarks the vertex array pointers: when you bind a given VAO, it re-steers all the vertex attribute pointers to wherever you left them set when that VAO was last bound. So with VAO it's easy to create multiple disjoint arrays of data that can be either RAM or VRAM based, but you have to keep that vertex-pointer bookmarking in mind or it will make your higher level code very messy. This is especially acute when trying to backfit VAO at a low level into a library that is used to steering the vertex pointers whenever it feels like it and that expects them to stay set where it left them. VAO has no notion of streaming data either, which is simple to do in VAR with a ring buffer and a couple of fences (see Warcraft III).
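
The "ring buffer and a couple of fences" idea can be sketched roughly as below. This is only a minimal illustration of the bookkeeping, not the code used in any shipping game: the StreamRing type, ring_init and ring_alloc are hypothetical names, the GL calls appear only in comments, and each batch is assumed to fit in one half of the ring.

```c
#include <stddef.h>

#define RING_SEGMENTS 2   /* "a couple of fences": one per half of the ring */

typedef struct {
    size_t size;                          /* total bytes in the VAR range */
    size_t head;                          /* next free byte */
    size_t seg;                           /* segment currently being filled */
    int    fence_pending[RING_SEGMENTS];  /* 1 if the GPU may still read it */
    int    waits;                         /* times we had to finish a fence */
} StreamRing;

static void ring_init(StreamRing *r, size_t size)
{
    r->size = size;
    r->head = 0;
    r->seg = 0;
    r->waits = 0;
    for (int i = 0; i < RING_SEGMENTS; i++)
        r->fence_pending[i] = 0;
}

/* Reserve len bytes for one draw batch and return its byte offset in the
 * range. Assumes len <= size / RING_SEGMENTS. Call it after the draws for
 * the previous batch have been issued. */
static size_t ring_alloc(StreamRing *r, size_t len)
{
    size_t seg_size = r->size / RING_SEGMENTS;

    /* Batch does not fit in the current segment: close it and move on. */
    if (r->head + len > (r->seg + 1) * seg_size) {
        /* Real code: glSetFenceAPPLE(fence[r->seg]) goes here, so one
         * fence covers every batch written into that segment. */
        r->fence_pending[r->seg] = 1;

        r->seg = (r->seg + 1) % RING_SEGMENTS;
        r->head = r->seg * seg_size;

        if (r->fence_pending[r->seg]) {
            /* Real code: glFinishFenceAPPLE(fence[r->seg]) goes here. */
            r->waits++;
            r->fence_pending[r->seg] = 0;
        }
    }

    size_t offset = r->head;
    r->head += len;
    return offset;
}
```

The point of the segment scheme is that the fence cost is paid once per half of the ring rather than once per draw batch, which is the amortization mentioned above.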

VBO has a lot going for it. It doesn't do the weird vertex pointer bookmarking that VAO does (and I understand that said bookmarking can be a big performance win on some hardware, since pointers need not be re-translated into hardware terms and re-validated by the driver). It does offer an easy way to set up RAM based buffers, VRAM based buffers, streaming or static buffers, whatever you need, and keep them all alive on an ongoing basis.
It also adds something that neither VAR nor VAO could do, which is let you stripe different attributes into either RAM or VRAM (just reiterating what Chris said). For example, if you had a mesh with some static texcoords at each vertex but CPU generated positions and normals, with VBO it is simple to put the static stuff in VRAM and the dynamic stuff in system/AGP RAM, straddling your vertex data between the two worlds.


	Problems with VBO:

a) no way to tell GL which memory you touched during your tenure over a mapped buffer. So whenever you unmap a buffer, my understanding is that GL is going to have to do the equivalent of a glFlushVertexArrayRangeAPPLE over the entire known size of the buffer to make sure any writes you did got flushed out to real RAM for DMA. I'm assuming here that we're discussing mapping of a buffer in shared/AGP RAM.
If I could change this, I would want an explicit call for flushing dirty regions, and a flag to the unmap call to say "trust me, I flushed the writes" so it can skip the big flush of the whole size. An alternative would be something like a "MapSubBuffer" call, allowing one to map just a smaller region, but this would mean more API traffic I think, having to do some kind of map/unmap pair for each sub section I wanted to write to. I like the first idea better.
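
To make the "explicit call for flushing dirty regions" idea concrete, here is a rough sketch of the caller-side bookkeeping it would need. The DirtyRange type and its helpers are hypothetical; with VAR the resulting range is what one would hand to glFlushVertexArrayRangeAPPLE, while for VBO no such call exists in the API discussed here, which is the complaint. For simplicity it tracks a single bounding range, so untouched gaps between two writes get flushed too.

```c
#include <stddef.h>

/* Half-open byte range [lo, hi) of a mapped buffer that the CPU wrote to.
 * The range is empty whenever lo >= hi. */
typedef struct {
    size_t lo;
    size_t hi;
} DirtyRange;

static void dirty_reset(DirtyRange *d)
{
    d->lo = (size_t)-1;   /* larger than any real offset */
    d->hi = 0;
}

/* Record that len bytes starting at offset were written. */
static void dirty_mark(DirtyRange *d, size_t offset, size_t len)
{
    if (len == 0)
        return;
    if (offset < d->lo)
        d->lo = offset;
    if (offset + len > d->hi)
        d->hi = offset + len;
}

/* Number of bytes that would need flushing, starting at d->lo. */
static size_t dirty_length(const DirtyRange *d)
{
    return d->hi > d->lo ? d->hi - d->lo : 0;
}

/* With VAR the flush would then look like (base is the start of the range):
 *   glFlushVertexArrayRangeAPPLE((GLsizei)dirty_length(&d), base + d.lo);
 * followed by dirty_reset(&d). */
```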


b) in 10.3.4 through 10.3.7, the streaming VBO support is non-optimal. As in, if you try to follow the canonical example of streaming draw calls espoused in the NVIDIA white paper on VBO, you will see a gi-normous amount of time spent in the kernel, in response to the VBO implementation doing a lot of allocations and frees to satisfy each unique size request coming down the pike (glBufferData calls with the NULL pointer).
In "World of Warcraft" we work around this non-optimality by pre-creating a flotilla of pre-sized VBOs. Whenever we would normally do the standard-style streaming into a single VBO name with glBufferData (size, NULL), we instead look at the size and route the request to the least recently used VBO in the smallest size class that is big enough. We call this the streaming VBO broker, and it uses up a bunch of RAM, but it makes the app noticeably faster.
I know Apple has improvements in the works so this issue may fall by the wayside some day, but at least for today, if we try to activate the simple streaming-VBO code in our game, well, you get bad FPS even on the title screen and double digit percentage time spent futzing in the kernel. Caveat emptor if you want to do streaming VBO stuff under OSX 10.3.x today.
Basically with our approach, you have to do a bit of empirical measurement to find out which size draw-batches you are usually blocking on (i.e. you see time spikes in glMapBuffer when you go to re-use a slot), and you re-build the app to provide more slots at that size if you didn't make enough. This gets gnarly if your app generates a varying spectrum of draw batch sizes, or you just wind up throwing more and more RAM at it. The root issue is that the normal streaming path is just too slow in 10.3.7 to use as-is.
Again, I know Apple is working on improving this, so this is more fodder for other coders than any kind of 'cry for help' with it. They know about it, we know about it, and I figure list readers might want to know about it.
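
A rough sketch of what such a broker's routing logic could look like follows. It is an illustration only, not the actual game code: the VboBroker and BrokerSlot types, the function names, the slot count and the 64KB middle size class are all assumptions (4KB and 256KB are simply example sizes), and the buffer names are stand-ins for what glGenBuffers would return.

```c
#include <stddef.h>

#define NUM_CLASSES     3
#define SLOTS_PER_CLASS 4

typedef struct {
    unsigned      buffer_name;  /* stand-in for a VBO name from glGenBuffers */
    unsigned long last_used;    /* tick of the most recent hand-out, 0 = never */
} BrokerSlot;

typedef struct {
    size_t        class_size[NUM_CLASSES];  /* ascending slot sizes in bytes */
    BrokerSlot    slots[NUM_CLASSES][SLOTS_PER_CLASS];
    unsigned long tick;
} VboBroker;

static void broker_init(VboBroker *b)
{
    static const size_t sizes[NUM_CLASSES] = { 4096, 65536, 262144 };
    unsigned name = 1;

    b->tick = 0;
    for (int c = 0; c < NUM_CLASSES; c++) {
        b->class_size[c] = sizes[c];
        for (int s = 0; s < SLOTS_PER_CLASS; s++) {
            /* Real code: create the VBO and allocate its storage once,
             * at the class size, so no allocation happens per frame. */
            b->slots[c][s].buffer_name = name++;
            b->slots[c][s].last_used = 0;
        }
    }
}

/* Route a streaming request to the least recently used slot of the smallest
 * class that fits. Returns NULL if size exceeds the largest class. */
static BrokerSlot *broker_acquire(VboBroker *b, size_t size)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (size > b->class_size[c])
            continue;

        BrokerSlot *lru = &b->slots[c][0];
        for (int s = 1; s < SLOTS_PER_CLASS; s++) {
            if (b->slots[c][s].last_used < lru->last_used)
                lru = &b->slots[c][s];
        }
        lru->last_used = ++b->tick;
        /* Real code: bind lru->buffer_name and glMapBuffer it. If the GPU
         * is still drawing from it, this is where the stall shows up. */
        return lru;
    }
    return NULL;
}
```

Picking the least recently used slot gives the GPU the longest possible time to finish with a buffer before it is mapped again, which is why more slots per class means fewer stalls at the cost of more RAM.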


c) in 10.3.7 at least, and dunno about 10.3.x prior, glTestObjectAPPLE against a VBO does not work. If you look at how I went about attacking the problem in "b", you might think that you could use this call to set up a dynamically balanced flotilla of VBO slots for efficient streaming, by simply checking if the slot you want to use is still busy, and creating new ones on the fly. But you can't, because glTestObjectAPPLE has a bug and always returns "done" on VBOs (bug filed).
If it worked, I could sense when I didn't have enough 4KB slots to go around and was starting to stall, or say 256KB slots if the game started doing something that really needed a lot at that size, and I could also readily avoid pre-allocating too much RAM for slots that would be underutilized. I thought of a way to do it "after the fact" by simply counting the microseconds spent in each glMapBuffer call in the streaming case, and responding to any detected "long waits" by bumping up the number of buffers in that size range, but that's gross. Plus it would be wasted effort if/when improvements to GL finally do make the VBO streaming path faster. But this is really a detour. We would much prefer to chuck the whole VBO-broker complication the second the underlying implementation is revved up. It will likely always have to be compiled in, though, behind some kind of runtime switch, so we can continue supporting 10.3.4-10.3.x users without tanking their frame rate.


d) you can't glMapBuffer in a nonblocking way. For example, suppose you have set up a buffer that is holding font glyph geometry, you have a pending draw running on it, and you then want to map the buffer and append some new shapes to the later portion of it. GL will make you wait, even though there would have been no conflict: you would be writing to RAM that is disjoint from what the GPU was DMA'ing out of.
In Direct3D this is known as locking a vertex buffer with the NOOVERWRITE hint. It is amazingly useful. You have design pressure to reduce the number of draw batches, and one way to do that is to steadily gather all your vertex data into big arrays and then spin up new index buffers to reach into those arrays on the fly, so you can amortize per-batch cost across as many drawn things as you are able to coalesce. But in a game that dynamically spools in chunks of data, the fact that glMapBuffer will always block if drawing is pending on your buffer prevents you from overlapping data-fetch-and-place operations with drawing.
If I could change this I would make a version of glMapBuffer that would not block, and that would trust the caller to not do anything stupid. This would be the same level of trust afforded anyone using VAR :)
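
The reason a non-blocking map would be safe in the glyph case is that appends never touch bytes the GPU might be reading. A small sketch of that invariant, with hypothetical names (AppendBuffer, append_reserve) and no GL calls at all:

```c
#include <stddef.h>

/* Append-only bookkeeping for one big vertex buffer. Everything in
 * [0, used) may be referenced by pending draws; new data only ever goes
 * into [used, capacity), so writes and GPU reads stay disjoint. */
typedef struct {
    size_t capacity;
    size_t used;
} AppendBuffer;

/* Reserve len bytes at the tail. Returns the byte offset to write at, or
 * (size_t)-1 when the buffer is full and the caller has to fall back to
 * a blocking path (or start a fresh buffer). */
static size_t append_reserve(AppendBuffer *a, size_t len)
{
    if (len > a->capacity - a->used)
        return (size_t)-1;

    size_t offset = a->used;
    a->used += len;
    return offset;
}
```

Nothing here enforces that the caller stays inside the reserved range, which is the "trust the caller" part of the request.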


The combination of a faster streaming-mode VBO implementation, a "no-flush" version of glUnmapBuffer, and a "non blocking" version of glMapBuffer would help us a lot both for present day as well as future titles. Actually I think that if I just had the last two, I could synthesize the first one without going all the way back to "VAR in the stone age". I could set aside one large shared-RAM VBO and treat it like a VAR ring buffer.

Rob

Mac-opengl mailing list      (email@hidden)