I'm trying to improve the performance of an emulator that uses
threaded emulation.
My code uses tail recursion to avoid as many MMU translations as
possible, but it seems that the tail calls are compiled as calls with
LK set (branch with link) instead of direct branches.
Each instruction of the target processor is backed by a unit holding
data and a function pointer. The function pointer initially points to
the translate function, which fills in the data and sets the function
pointer to the appropriate instruction execution function.
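
A minimal sketch of this unit scheme, to make the pointer patching
concrete. All names here (Unit, UnitFunc, ExecMov, Translate) are
hypothetical, the handler takes two parameters instead of five for
brevity, and the decoding is a hard-coded stand-in:

```cpp
#include <cassert>
#include <cstdint>

struct Unit;
// Hypothetical handler type, shortened to two parameters.
using UnitFunc = void (*)(Unit* ioUnit, std::uint32_t* ioRegs);

struct Unit {
    UnitFunc      func;    // Translate at first, then the execution function
    std::uint32_t rd, rm;  // decoded operands, filled in by Translate
};

// Execution function for a register-to-register move.
void ExecMov(Unit* ioUnit, std::uint32_t* ioRegs) {
    assert(ioUnit->rd < 16 && ioUnit->rm < 16);
    ioRegs[ioUnit->rd] = ioRegs[ioUnit->rm];
}

// Translate: fill in the data once, patch the pointer, then execute.
void Translate(Unit* ioUnit, std::uint32_t* ioRegs) {
    ioUnit->rd = 1;                // stand-in for real decoding:
    ioUnit->rm = 2;                // pretend the instruction is mov r1, r2
    ioUnit->func = ExecMov;        // later executions skip translation
    ioUnit->func(ioUnit, ioRegs);
}
```

After the first execution, calling the unit's func goes straight to
ExecMov, so the decoding cost is paid only once per unit.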
The instruction execution function and the translate function hence
have the same signature. These functions take five parameters,
including a pointer to a signal that tells whether the target
processor was signaled, and the number of consecutive instructions
that can be executed without performing an MMU translation.
For example, this (static) function very simply moves the content of
a register to another register:
My problem lies in the last part of the function (actually a macro in
my source code).
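
As a rough illustration only, the shape of that last part can be
sketched from the control flow of the generated code shown below. The
types Cpu and Unit, the parameter order, and the meaning of a zero
*inSignal are assumptions read off the register usage and comments in
that listing, not the actual source:

```cpp
#include <cassert>
#include <cstdint>

struct Cpu { std::uint32_t regs[16]; };  // hypothetical; regs[15] is the PC
struct Unit;
// Hypothetical five-parameter signature shared by all unit functions.
using UnitFunc = void (*)(const int* inSignal, std::uint32_t inCount,
                          Cpu* ioCPU, std::uint32_t inPAddr, Unit* ioUnit);
struct Unit { UnitFunc func; std::uint32_t rd, rm; };

void Mov(const int* inSignal, std::uint32_t inCount,
         Cpu* ioCPU, std::uint32_t inPAddr, Unit* ioUnit) {
    const std::uint32_t rd = ioUnit->rd;
    assert(rd < 16 && ioUnit->rm < 16);
    ioCPU->regs[rd] = ioCPU->regs[ioUnit->rm];

    // Last part of the function, following the listing's control flow.
    if (rd == 15) {               // wrote the PC: increment it and return
        ioCPU->regs[15] += 4;
        return;
    }
    if (inCount == 0) return;     // no more instructions without translation
    if (*inSignal == 0) return;   // same test as the lwz/cmpwi/beq+ sequence
    ioCPU->regs[15] += 4;
    // Nothing is left to do after this call, so it is a tail call:
    // a plain branch (bctr) to the next function would be enough.
    return ioUnit[1].func(inSignal, inCount - 1, ioCPU, inPAddr + 4,
                          ioUnit + 1);
}
```

With three such units in a row and inCount set to 2, a single call
to the first unit runs all three and returns from the last one.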
gcc (with either -O3 or -fast -mcpu=7450) generates something like:
    ...
    cmpwi cr7, r9, 15    ; if (Rd != TARMProcessor::kR15) ...
    bne+ cr7, label1     ;
    lwz r2, 60(r5)       ; increment r15
    addi r2, r2, 4       ; ..
    stw r2, 60(r5)       ; ..
    b label2             ; return
label1:
    cmpwi cr7, r4, 0     ; test inCount
    beq cr7, label2      ; if it's null, return.
    lwz r0, 0(r3)        ; test *inSignal (it's already in r3!)
    cmpwi cr7, r0, 0     ; ..
    beq+ cr7, label2     ; if it's null, return.
    lwz r2, 60(r5)       ; increment r15
    addi r2, r2, 4       ; ..
    stw r2, 60(r5)       ; ..
    subi r4, r4, 1       ; decrement inCount
    addi r6, r6, 4       ; increment inPAddr
    addi r7, r7, 16      ; increment ioUnit pointer
    lwz r12, 16(r11)     ; load next function address
    mtctr r12            ; move it to the count register
    bctrl                ; branch with link to the next function
    nop                  ; I've got plenty of nothing
    nop                  ; ..
label2:
    lwz r0, 72(r1)       ; grab old lr
    addi r1, r1, 64      ; restore stack pointer
    mtlr r0              ; restore old lr
    blr                  ; return
I am wondering how to avoid the branch with link, i.e. how to get
something like:
    addi r7, r7, 16      ; increment ioUnit pointer
    lwz r12, 16(r11)     ; load next function address
    mtctr r12            ; move it to the count register
+   lwz r0, 72(r1)       ; grab old lr
+   addi r1, r1, 64      ; restore stack pointer
+   mtlr r0              ; restore old lr
    bctr                 ; branch to the next function
label2:
    lwz r0, 72(r1)       ; grab old lr
    ...
Is there some gcc option I missed? Is there some way to rewrite the
code as a hint that this last call is a tail call?