We have set up a rendering framework based on Quartz Composer in which
the image output generated by some compositions is used as input to
other compositions, and so on.
We have one object, let's call it ImageProducer, that maintains a
list of QCRenderers and asks them to render their compositions in a
specific OpenGL context. Then we attach a CVOpenGLBufferRef (taken
from a pool, as in the Performer example) to the GL context, call
glFlush and get our rendered image. This image, along with some
others calculated in parallel, is then sent as input to the
QCRenderers of other ImageProducers, and so on.
We were hoping that working with CVOpenGLBufferRef would keep all
images in graphics card memory and so give us optimal
performance.
First results are quite promising with only a few producers, but when
we increase their number, we get a sudden drop in performance from
60fps to 5fps.
I am asking for some hints here on how to tackle performance
measurements in such a situation. Namely:
- how would you monitor what gets copied between main memory and
the graphics card?
- how would you interpret the initial result given by Shark that I
list below:
It looks like a lot of time is spent waiting for a TimeStamp from the
card driver. What does that mean?
Of course, I am also interested in any thoughts you might have
regarding the problem.
- each ImageProducer has its own OpenGL context in which it renders
its child compositions. I can't really see how to avoid that. Is
there a problem with having many offline contexts in use at the
same time? (Please note that our rendering is single-threaded.)
- each ImageProducer has its own CVOpenGLBufferPool. Maybe that's
bad. Is it?
- anything else.
We know that we should get much better performance, because what we
display using multiple compositions in a hierarchy can be done with a
single composition, and that is very fast. So there is something
wrong with the way we use OpenGL.
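
One way to narrow this down, sketched here as an assumption rather than something we already have in place, would be to time each ImageProducer's render step separately and see which ones account for the drop. The StageTimer type and the now_ms, stage_timer_add and stage_timer_mean helpers below are hypothetical names made up for illustration; they are not part of Quartz Composer or Core Video.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-producer timing record (illustration only). */
typedef struct {
    const char   *name;      /* label for the producer being timed */
    double        total_ms;  /* accumulated wall-clock time */
    double        worst_ms;  /* slowest single sample seen so far */
    unsigned long samples;   /* number of samples recorded */
} StageTimer;

/* Current monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Record one measured duration for a producer. */
static void stage_timer_add(StageTimer *t, double elapsed_ms)
{
    assert(t != NULL);
    t->total_ms += elapsed_ms;
    if (elapsed_ms > t->worst_ms) {
        t->worst_ms = elapsed_ms;
    }
    t->samples++;
}

/* Mean duration per sample, or 0 when nothing has been recorded. */
static double stage_timer_mean(const StageTimer *t)
{
    assert(t != NULL);
    return t->samples ? t->total_ms / (double)t->samples : 0.0;
}
```

The idea would be to call now_ms() before a producer renders, call it again afterwards, and pass the difference to stage_timer_add. Since glFlush returns without waiting for the GPU, a diagnostic build would presumably need glFinish after the render so that the measured interval includes the GPU work. That serializes CPU and GPU, so the absolute numbers would be pessimistic, but the relative cost per producer should still be informative.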