
Tail recursion with dynamic branches



Hi all,

I'm trying to improve the performance of an emulator that uses threaded emulation.

My code uses tail recursion to avoid as many MMU translations as possible, but it seems that the tail calls are compiled as branch-and-link calls (branches with the LK bit set) instead of plain branches.

Each instruction of the target processor is backed by a unit that holds some data and a function pointer. The function pointer initially points to the translate function, which fills in the data and then sets the function pointer to the appropriate instruction execution function.

The instruction execution functions and the translate function hence have the same signature. These functions take 5 parameters, including a pointer to a signal (to know whether the target processor was signaled) and the number of consecutive instructions that can be executed without performing an MMU translation.
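To make the scheme concrete, here is a minimal sketch of such a unit and of the translate-then-patch step. The type and field names follow the listing in this post, but their definitions here are simplified assumptions, and `Translate`, `ExecuteNop`, `gTranslateCalls` and `DemoLazyTranslation` are hypothetical names used only for illustration. The placeholder handler just advances r15.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins (assumptions) for the types named in the post.
typedef uint32_t KUInt32;
struct TARMProcessor { KUInt32 mCurrentRegisters[16]; };
struct JITUnit;

// Signature shared by the translate function and every execution function.
typedef void (*JITFuncPtr)(volatile KUInt32* inSignal, KUInt32 inCount,
                           TARMProcessor* ioCPU, KUInt32 inPAddr,
                           JITUnit* ioUnit);

struct JITUnit {
    JITFuncPtr fFirstFuncPtr;  // starts as Translate, later the real handler
    KUInt32    fInstruction;   // data filled in by Translate
};

static int gTranslateCalls = 0;  // hypothetical counter, for the demo only

// Hypothetical execution function: does nothing but advance r15.
static void ExecuteNop(volatile KUInt32*, KUInt32, TARMProcessor* ioCPU,
                       KUInt32, JITUnit*) {
    ioCPU->mCurrentRegisters[15] += 4;
}

// Translate: fill the unit's data, patch its pointer, then run the handler.
static void Translate(volatile KUInt32* inSignal, KUInt32 inCount,
                      TARMProcessor* ioCPU, KUInt32 inPAddr,
                      JITUnit* ioUnit) {
    assert(ioUnit != 0);
    ++gTranslateCalls;
    ioUnit->fInstruction = 0xE1A00000;   // placeholder: pretend "MOV R0, R0" was fetched
    ioUnit->fFirstFuncPtr = ExecuteNop;  // later calls skip Translate entirely
    ioUnit->fFirstFuncPtr(inSignal, inCount, ioCPU, inPAddr, ioUnit);
}

// Runs the same unit twice; only the first call goes through Translate.
static bool DemoLazyTranslation() {
    TARMProcessor cpu = {};
    volatile KUInt32 signal = 1;
    JITUnit unit = { Translate, 0 };
    unit.fFirstFuncPtr(&signal, 1, &cpu, 0, &unit);  // translates, then executes
    unit.fFirstFuncPtr(&signal, 1, &cpu, 0, &unit);  // executes directly
    return gTranslateCalls == 1
        && unit.fFirstFuncPtr == ExecuteNop
        && cpu.mCurrentRegisters[15] == 8;
}
```

Because both kinds of function share one signature, the caller never needs to know whether a unit has been translated yet; it always calls through `fFirstFuncPtr`.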

For example, this (static) function very simply moves the content of a register to another register:

void
UJITGeneric::MOV_NoShift(
					volatile KUInt32* inSignal,
					KUInt32 inCount,
					TARMProcessor* ioCPU,
					KUInt32 inPAddr,
					JITUnit* ioUnit )
{
	KUInt32 theInstruction = ioUnit->fInstruction;
	KUInt32 Rd = (theInstruction & 0x0000F000) >> 12;
	KUInt32 Opnd2;

	Opnd2 = ioCPU->mCurrentRegisters[theInstruction & 0x0000000F];

	ioCPU->mCurrentRegisters[Rd] = Opnd2;
	if (Rd == TARMProcessor::kR15)
	{
		ioCPU->mCurrentRegisters[TARMProcessor::kR15] += 4;
		return;
	}

	if ((inCount) && (*inSignal))
	{
		ioCPU->mCurrentRegisters[TARMProcessor::kR15] += 4;
		ioUnit[1].fFirstFuncPtr(
							inSignal,
							inCount - 1,
							ioCPU,
							inPAddr + 4,
							&ioUnit[1] );
	}
}
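As a quick check of the two masks used above, the following sketch decodes the register fields of a register-to-register MOV. `DecodeRd` and `DecodeRm` are hypothetical helper names that do not appear in the original source, and `KUInt32` is assumed to be a 32-bit unsigned integer.

```cpp
#include <cstdint>

typedef uint32_t KUInt32;  // assumption: 32-bit unsigned, as the name suggests

// Destination register: bits 12-15 of the instruction word.
inline KUInt32 DecodeRd(KUInt32 inInstruction) {
    return (inInstruction & 0x0000F000) >> 12;
}

// Source register for the unshifted form: bits 0-3 of the instruction word.
inline KUInt32 DecodeRm(KUInt32 inInstruction) {
    return inInstruction & 0x0000000F;
}
```

For instance, `MOV R1, R2` is encoded as 0xE1A01002, which gives Rd = 1 and Rm = 2. `MOV PC, LR` is encoded as 0xE1A0F00E, which gives Rd = 15, so it takes the early-return path for kR15 in the function above.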

My problem lies in the last part of the function (actually a macro in my source code).

gcc (with either -O3 or -fast -mcpu=7450) generates something like:
...
	cmpwi	cr7, r9, 15	; if (Rd != TARMProcessor::kR15) ...
	bne+	cr7, label1	;
	lwz	r2, 60(r5)	; increment r15
	addi	r2, r2, 4	; ..
	stw	r2, 60(r5)	; ..
	b	label2		; return
label1:
	cmpwi	cr7, r4, 0	; test inCount
	beq	cr7, label2	; if it's null, return.
	lwz	r0, 0(r3)	; test *inSignal (it's already in r3!)
	cmpwi	cr7, r0, 0	; ..
	beq+	cr7, label2	; if it's null, return.
	lwz	r2, 60(r5)	; increment r15
	addi	r2, r2, 4	; ..
	stw	r2, 60(r5)	; ..
	subi	r4, r4, 1	; decrement inCount
	addi	r6, r6, 4	; increment inPAddr
	addi	r7, r7, 16	; increment ioUnit pointer
	lwz	r12, 16(r11)	; load next function address
	mtctr	r12		; move it to the count register
	bctrl			; branch with link to the next function
	nop			; I've got plenty of nothing
	nop			; ..
label2:
	lwz	r0, 72(r1)	; grab old lr
	addi	r1, r1, 64	; restore stack pointer
	mtlr	r0		; restore old lr
	blr			; return

I just wonder how to avoid the branch with link, i.e. how to get something like this, where the lines marked with + are the epilogue moved ahead of the branch:

	addi	r7, r7, 16	; increment ioUnit pointer
	lwz	r12, 16(r11)	; load next function address
	mtctr	r12		; move it to the count register
+	lwz	r0, 72(r1)	; grab old lr
+	addi	r1, r1, 64	; restore stack pointer
+	mtlr	r0		; restore old lr
	bctr			; branch to the next function
label2:
	lwz	r0, 72(r1)	; grab old lr
...

Is there some gcc option I missed? Is there some way to rewrite the code as a hint that this last call is a tail call?

Paul
--
Ministre plénipotentiaire en disponibilité.
Baignoire à vendre.
http://www.kallisys.com/
http://newton.kallisys.net:8080/
PerfOptimization-dev mailing list