Mailing Lists: Apple Mailing Lists


Re: Varied questions about squeezing more performance




1) Do you have any evidence to suggest that changing data in structs is a performance problem in your application? Unless such evidence emerges, it seems highly unlikely to me this is worth worrying about.
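If a profile ever did point at the struct accesses, hoisting the members into locals is a small change. The following is only a sketch: ImageBuffer is a hypothetical stand-in for vImage_Buffer so the example compiles on its own, and sum_pixels is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vImage_Buffer, declared here so the sketch is self-contained. */
typedef struct {
    void   *data;
    size_t  height;
    size_t  width;
    size_t  rowBytes;
} ImageBuffer;

/* Sum all 8-bit pixels, reading each struct member once before the loops. */
uint64_t sum_pixels(const ImageBuffer *buf)
{
    /* Local copies: the compiler is free to keep these in registers. */
    const uint8_t *row      = (const uint8_t *) buf->data;
    const size_t   width    = buf->width;
    const size_t   height   = buf->height;
    const size_t   rowBytes = buf->rowBytes;
    uint64_t total = 0;
    size_t x, y;

    for (y = 0; y < height; y++)
    {
        for (x = 0; x < width; x++)
            total += row[x];

        row += rowBytes;    /* advance by the stride, which may include padding */
    }
    return total;
}
```

An optimizing compiler will often do this hoisting by itself, so measure before and after.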


2) Check the disassembly. If it is doing multiplication in the inner loop, you might look at ways to change those to adds. So, for example, inside vImage, we rarely do ptr + y*width + x, we do this:

	uint8_t  *row = (uint8_t *) buffer->data;	/* buffer is a vImage_Buffer * */

	for( y = 0; y < height; y++ )
	{
		uint8_t *pixel = row;
		for( x = 0; x < width; x++ )
		{
			/* Do something with *pixel here */
			pixel++;
		}

		row += buffer->rowBytes;
	}

...except, like, vectorized. Often the compiler can do this kind of transformation for you, but not always.
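Applied to the red-pixel case from the question, the same row-pointer pattern might look like the sketch below. This is not vImage code: spread_even_pixels and its parameters are hypothetical names, and it assumes the destination rows are 2-byte aligned and that the source pixel is read from the same (x, y) as the destination, as in the quoted loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of vImage: widen 8-bit source pixels to
 * 16 bits, keeping only positions where both x and y are even and zeroing
 * everything else. srcRowBytes and dstRowBytes are byte strides, like
 * rowBytes in a vImage_Buffer. Assumes dst rows are 2-byte aligned.
 */
void spread_even_pixels(const uint8_t *src, size_t srcRowBytes,
                        uint16_t *dst, size_t dstRowBytes,
                        size_t width, size_t height)
{
    const uint8_t *srcRow = src;
    uint8_t *dstRow = (uint8_t *) dst;
    size_t x, y;

    for (y = 0; y < height; y++)
    {
        uint16_t *out = (uint16_t *) dstRow;

        /* Zero the whole row first; odd rows simply stay zero. */
        memset(out, 0, width * sizeof(uint16_t));

        if ((y & 1) == 0)
        {
            /* Even row: copy every second pixel, with no per-pixel test. */
            for (x = 0; x < width; x += 2)
                out[x] = srcRow[x];
        }

        /* Step both row pointers by their strides: adds, not y * width. */
        srcRow += srcRowBytes;
        dstRow += dstRowBytes;
    }
}
```

Stepping x by 2 on even rows removes the x%2 and y%2 tests from the inner loop entirely. Whether it is actually faster than the original loop is something to confirm with a profiler.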

Note that vImage does do some limited data reorganization. Look at the vImagePermute functions. However, I don't think these are vectorized on Intel at the moment, for obvious reasons. We are looking at special case vectorization for some common cases. If you think you have one, file a bug at bugreporter.apple.com against component Accelerate/X asking for the special case.

3) Those aren't functions so nothing to inline there. Technically, all you really need there is to use the bitwise & operation to look at the 1's bit, which is a lot cheaper than a real mod operation. However, it is quite possible the compiler knows that and is doing that optimization for you.
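As a minimal sketch of the low-bit test, assuming two's complement integers: both_even is a hypothetical name, and it is wrapped in a function here only so it can be checked in isolation. In the loop you would write the expression directly.

```c
/* Hypothetical helper: returns 1 when both x and y are even, 0 otherwise. */
static int both_even(int x, int y)
{
    /* The low bit of (x | y) is set if either value is odd. */
    return ((x | y) & 1) == 0;
}
```

This gives the same answer as x%2 == 0 && y%2 == 0 with a single test.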

4) Uhh... only if you are good. :-) vImage functions are plain C functions.

5) Well, if it is the same kernel every time, you could make that array static const. I doubt this is worth worrying about.
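A sketch of what static const looks like in practice. The kernel values and the name apply_kernel are made up for illustration, and the plain multiply-accumulate stands in for whatever convolution call you actually make.

```c
#include <stddef.h>

/* Hypothetical example: weight a 3x3 window of samples with a fixed kernel. */
float apply_kernel(const float window[9])
{
    /* static const: stored once in read-only data, not rebuilt on each call. */
    static const float kernel[9] = {
        0.0f,  0.25f, 0.0f,
        0.25f, 0.0f,  0.25f,
        0.0f,  0.25f, 0.0f
    };
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < 9; i++)
        sum += kernel[i] * window[i];

    return sum;
}
```

Without static, the compiler is allowed to copy the initializer onto the stack every time the function is entered, though for 9 floats that cost is tiny.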

Overall, performance-wise, I suspect you are asking the wrong questions. I suggest running Shark to see where the time is going, and then figure out how to accelerate those things that are taking up the time. Except for maybe #2, it is unlikely that these other things are going to impact your speed much, because comparatively speaking, they don't happen very often. An operation on half a million pixels is vastly more expensive than the ObjC overhead for one function call or the cost of copying a 9 element array. Just think of how much data needs to be touched for each task and you'll get the idea.

Depending on what you are doing, in certain cases, OpenGL/CoreVideo might be faster.

Ian

On Apr 20, 2006, at 2:59 PM, Juan P. Pertierra wrote:

Hello,

My software processes very large raw video files. There is a separate processing thread which loads raw
data from the large file, then repeatedly calls a Cocoa instance method which renders each frame from a
pointer to the raw data. Everything runs fine and does what it is supposed to do; I'm just trying to make
sure it runs as fast as it can because it is a lengthy process even on the fastest Macs.


Here are some questions I have about possibly making this render function run faster:

1.) I'm passing a large number of arguments to the render instance method, mostly pointers to
temporary image buffers. This avoids allocating/freeing memory between renders of each frame.
However, the pointers are to vImage_Buffers, which are structs. In the render method I am frequently
accessing (in loops) members of these structs directly...so for example using vBuffer_ptr->data to access
the image data.


Since I am accessing these struct members repeatedly, should I instead be assigning the values to local
variables and using those? I'm thinking perhaps the members are being fetched from memory
each time, and it would be much faster if I could force the processor to keep them in a register...I think?


2.) I'm using vImage functions for as many things as I can because I am under the impression that is the
fastest way to do those image operations. There are however a couple of things that vImage doesn't do.
For example I need to take raw data pixels and arrange them in a larger image (like building a Bayer
mosaic), which I implemented with loops and scalar code.


For example, I have to take the red pixels from the raw data and spread them out in an image such that
the pixels sit only at locations where x and y coordinates are even. I do this in this manner:


for(y = 0; y < HEIGHT; y++)
{
    for(x = 0; x < WIDTH; x++)
    {
        if(x%2 == 0 && y%2 == 0)
        {
            *((unsigned short int *)vBufferRed->data + (y * WIDTH) + x) = (unsigned short int) *
                ((unsigned char *)vBufferRed0->data + (y * WIDTH) + x);
        }
        else {...similar code to set these locations to 0...}
    }
}


Is there anything fishy or perhaps a better way to do this? As you can see I am also converting from the
8-bit raw data to a 16-bit image; it is my understanding that as long as I'm not converting to/from a float
this is OK to do.


3.) Also, I'm not using any inline functions. Should I be? For example, would it help to make an inline
function for the x%2 == 0 && y%2 == 0 statement?


4.) The instance method which renders each frame only uses vImage functions and NSBitmapRep/NSImage
to save each image...is it faster to use a plain C/C++ function for this kind of repeated call for
processing?


5.) Finally, at the beginning of the render method I am defining some convolution kernels as:

const float kernel1[] = {..3x3 values...}

Since this is done on every frame, should I be defining these kernels outside and passing them to the method,
or is this really as efficient as it gets?


Thanks for any input and taking the time to read through this.

Cheers,
Juan
_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list (PerfOptimization- email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/perfoptimization-dev/iano%40apple.com


This email sent to email@hidden


Copyright © 2007 Apple Inc. All rights reserved.