Hello,
My software processes very large raw video files. A separate processing thread loads raw data from the
large file, then repeatedly calls a Cocoa instance method that renders each frame from a pointer to the
raw data. Everything runs fine and does what it is supposed to do; I'm just trying to make sure it runs as
fast as it can, because it is a lengthy process even on the fastest Macs.
Here are some questions I have about possibly making this render function run faster:
1.) I'm passing a large number of arguments to the render instance method, most of which are pointers to
temporary image buffers. This avoids allocating/freeing memory between renders of each frame.
However, the pointers are to vImage_Buffers, which are structs. In the render method I frequently
access (in loops) members of these structs directly, for example using vBuffer_ptr->data to access
the image data.
Since I am accessing these struct members repeatedly, should I instead assign the values to local
variables and use those? I suspect the members are being fetched from memory each time; it would be
much faster if I could get the compiler to keep them in registers... I think?
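Whether the compiler reloads a member such as vBuffer_ptr->data on every iteration depends on whether it can prove that the stores inside the loop never modify the struct. Copying the members into local variables before the loop removes that question and leaves the compiler free to keep them in registers. A minimal sketch, assuming a stand-in struct `ImageBuffer` with the same field names as `vImage_Buffer` (so it compiles without Accelerate); `sum_planar8` is a hypothetical example function. It also steps rows by `rowBytes` rather than by the width, since rows may be padded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same field names as vImage_Buffer (data, height,
   width, rowBytes) so the sketch compiles without Accelerate. */
typedef struct {
    void  *data;
    size_t height;
    size_t width;
    size_t rowBytes;
} ImageBuffer;

/* Hypothetical example: sum every pixel of an 8-bit planar buffer.
   The struct members are copied into locals once, before the loops,
   so the compiler is free to keep them in registers throughout. */
uint64_t sum_planar8(const ImageBuffer *buf)
{
    const uint8_t *base     = (const uint8_t *)buf->data;
    const size_t   height   = buf->height;
    const size_t   width    = buf->width;
    const size_t   rowBytes = buf->rowBytes;   /* may include row padding */
    uint64_t       total    = 0;

    assert(rowBytes >= width);

    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = base + y * rowBytes;   /* start of row y */
        for (size_t x = 0; x < width; x++)
            total += row[x];
    }
    return total;
}
```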
2.) I'm using vImage functions for as many things as I can because I am under the impression that is the
fastest way to do those image operations. There are, however, a couple of things vImage doesn't do.
For example, I need to take raw data pixels and arrange them in a larger image (like building a Bayer
mosaic), which I implemented with loops and scalar code.
Specifically, I have to take the red pixels from the raw data and spread them out in an image such that
the pixels sit only at locations where the x and y coordinates are both even. I do this as follows:
for(y = 0; y < HEIGHT; y++)
{
    for(x = 0; x < WIDTH; x++)
    {
        if(x%2 == 0 && y%2 == 0)
        {
            /* copy the 8-bit raw sample into the 16-bit red plane */
            *((unsigned short int *)vBufferRed->data + (y * WIDTH) + x) =
                (unsigned short int) *((unsigned char *)vBufferRed0->data + (y * WIDTH) + x);
        }
        else
        {
            /* every other location is set to 0 */
            *((unsigned short int *)vBufferRed->data + (y * WIDTH) + x) = 0;
        }
    }
}
Is there anything fishy here, or perhaps a better way to do this? As you can see, I am also converting from
the 8-bit raw data to a 16-bit image; it is my understanding that as long as I'm not converting to/from
float, this is OK to do.
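Two things stand out in the loop above. The `x%2`/`y%2` test runs for every pixel even though three quarters of the output is constant zero, and `y * WIDTH` is only the right row offset when the source's `rowBytes` is exactly `WIDTH` bytes and the destination's is exactly `WIDTH * 2` bytes; vImage buffers can have padded rows. One alternative, sketched below under the assumption that source and destination have the same width and height (as in the loop above): clear each destination row with `memset`, skip odd rows entirely, and step through even columns two at a time. The cast from `unsigned char` to `unsigned short` is a plain integer widening, so no float conversion is involved. `ImageBuffer` is a stand-in with the same field names as `vImage_Buffer`, and `spread_even_sites` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same field names as vImage_Buffer. */
typedef struct { void *data; size_t height; size_t width; size_t rowBytes; } ImageBuffer;

/* Copy the 8-bit samples at even (x, y) positions of src into the 16-bit
   dst and leave every other dst pixel at 0. Rows are addressed through
   rowBytes, and the per-pixel parity test is replaced by striding. */
void spread_even_sites(const ImageBuffer *src, ImageBuffer *dst)
{
    const size_t height = dst->height, width = dst->width;
    const size_t srcRowBytes = src->rowBytes, dstRowBytes = dst->rowBytes;
    const uint8_t *srcBase = (const uint8_t *)src->data;
    uint8_t       *dstBase = (uint8_t *)dst->data;

    assert(src->height == height && src->width == width);

    for (size_t y = 0; y < height; y++) {
        uint16_t *dstRow = (uint16_t *)(dstBase + y * dstRowBytes);
        memset(dstRow, 0, width * sizeof(uint16_t));   /* zero the whole row */
        if (y & 1)
            continue;                                  /* odd row: all zeros */
        const uint8_t *srcRow = srcBase + y * srcRowBytes;
        for (size_t x = 0; x < width; x += 2)
            dstRow[x] = srcRow[x];                     /* widen 8-bit to 16-bit */
    }
}
```

If the destination buffer is reused across frames and nothing else writes into it, the zeroing could even be done once when the buffer is allocated, since the even-site writes never touch the other locations.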
3.) Also, I'm not using any inline functions. Should I be? For example, would it help to make an inline
function for the x%2 == 0 && y%2 == 0 test?
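An inline helper is unlikely to speed this up by itself: the expression is already tiny, compilers typically inline small static functions on their own when optimizing, and for unsigned operands `x % 2 == 0` reduces to a low-bit test. A `static inline` helper mainly helps readability. The bigger win is removing the per-pixel test altogether by stepping by 2. A sketch with hypothetical names: both counting functions visit the same even sites, but the strided one does roughly a quarter of the iterations and no test at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when both coordinates are even: the low bit of x | y is clear. */
static inline bool is_even_site(size_t x, size_t y)
{
    return ((x | y) & 1u) == 0;
}

/* Visit every pixel and test each one, as in the original loop. */
size_t count_even_sites_tested(size_t width, size_t height)
{
    size_t n = 0;
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            if (is_even_site(x, y))
                n++;
    return n;
}

/* Visit only the even sites by striding; no per-pixel test needed. */
size_t count_even_sites_strided(size_t width, size_t height)
{
    size_t n = 0;
    for (size_t y = 0; y < height; y += 2)
        for (size_t x = 0; x < width; x += 2)
            n++;
    return n;
}
```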
4.) The instance method that renders each frame only uses vImage functions, plus NSBitmapImageRep/NSImage
to save each image. Is it faster to use a plain C/C++ function for this kind of repeated call during
processing?
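An Objective-C method call goes through `objc_msgSend`, a small fixed cost per call; paid once per frame it is negligible next to the per-pixel loops, so turning the method into a C function is unlikely to matter much on its own. The NSBitmapImageRep/NSImage save step is worth profiling separately, since image encoding and file I/O can be expensive. If a C function is still preferred for structure, one possible layout (all names hypothetical) bundles the preallocated scratch buffers from question 1 into a struct, allocates it once, and has the render method call a plain C function per frame. The per-frame work shown is placeholder only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bundle of scratch buffers, allocated once and reused for
   every frame instead of passing many separate pointers. */
typedef struct {
    size_t    width, height;
    uint16_t *red;     /* 16-bit working plane */
    uint16_t *green;   /* 16-bit working plane */
} FrameScratch;

FrameScratch *scratch_create(size_t width, size_t height)
{
    FrameScratch *s = calloc(1, sizeof *s);
    if (!s)
        return NULL;
    s->width  = width;
    s->height = height;
    s->red    = calloc(width * height, sizeof *s->red);
    s->green  = calloc(width * height, sizeof *s->green);
    if (!s->red || !s->green) {
        free(s->red);
        free(s->green);
        free(s);
        return NULL;
    }
    return s;
}

void scratch_destroy(FrameScratch *s)
{
    if (!s)
        return;
    free(s->red);
    free(s->green);
    free(s);
}

/* Hypothetical plain C per-frame entry point; the Objective-C render
   method would call this once per frame with the raw data pointer. */
void render_frame(FrameScratch *s, const uint8_t *raw)
{
    assert(s != NULL && raw != NULL);
    const size_t n = s->width * s->height;
    for (size_t i = 0; i < n; i++)
        s->red[i] = raw[i];   /* placeholder work: widen 8-bit to 16-bit */
}
```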
5.) Finally, at the beginning of the render method I define some convolution kernels as:
const float kernel1[] = {...3x3 values...};
Since this is done on every frame, should I define these kernels outside the method and pass them in,
or is this really as efficient as it gets?
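A `const float kernel1[] = {...}` declared inside the method has automatic storage, so in principle it is rebuilt on the stack every time the method runs. Because its address is handed to vImage, the compiler generally has to materialize it, but that is a copy of nine floats per frame, negligible next to the convolution itself. Declaring the array `static const`, at file scope or inside the function, gives it static storage that is initialized once, so there is nothing to pass in. A minimal sketch; the kernel values and `pick_kernel` are illustrative, not the original kernels.

```c
/* File scope: initialized once when the program loads. The values are an
   illustrative box blur, not the kernels from the original code. */
static const float kBoxBlur3x3[9] = {
    1.0f / 9, 1.0f / 9, 1.0f / 9,
    1.0f / 9, 1.0f / 9, 1.0f / 9,
    1.0f / 9, 1.0f / 9, 1.0f / 9,
};

/* Hypothetical stand-in for the render method choosing a kernel. A
   function-scope `static const` array also has static storage: it is
   not rebuilt on each call, and every call sees the same array. */
const float *pick_kernel(int sharpen)
{
    static const float kSharpen3x3[9] = {
         0.0f, -1.0f,  0.0f,
        -1.0f,  5.0f, -1.0f,
         0.0f, -1.0f,  0.0f,
    };
    return sharpen ? kSharpen3x3 : kBoxBlur3x3;
}
```

The returned pointer can be passed to the vImage convolution call exactly where the local array's name is used today.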
Thanks for any input, and for taking the time to read through this.
Cheers,
Juan
_______________________________________________
PerfOptimization-dev mailing list (email@hidden)