On Thu, Oct 7, 2010 at 12:52 PM, Terry Lambert
<email@hidden> wrote:
On Oct 7, 2010, at 12:59 AM, Dave Keck wrote:
>> A harder solution is to leave the INT3 in place, and set EIP to an
>> instruction buffer of your own on the side that says `syscall ; int3`. That
>> way you can run a syscall instruction on that thread without interfering
>> with any new thread.
>
> Clever - that sounds like the answer I was looking for.
>
>> (Of course, that new thread might stumble over one of
>> your other breakpoints or mutate memory. If you want to handle that, you'd
>> need a breakpoint on the first instruction for new threads, so you can catch
>> the thread there and keep it halted until you're ready to resume threads.)
>
> Is that necessary? With this instruction buffer technique, when Thread
> A hits Breakpoint A:
>
> 1. Suspend every thread
This procedure is not instantaneous. You effectively set a suspend count and tell it to suspend itself when it's safe to do so. This does not occur in lock-step. The threads will continue to run for some time after this until they run up to a "safe" stopping point, at which point they will actually be suspended. This may include running code in the area you are attempting to modify.
I see. Is there a technique to determine when a thread has actually
stopped executing? My testing shows that the information returned by
thread_info() cannot be used for this purpose.
thread_suspend() does not return until the target thread has been suspended, i.e. the target thread is in a state where it no longer executes instructions in user space until it is resumed. See
> 2. Let user handle breakpoint
> 3. Cache Thread A's EIP
> 4. Set EIP of Thread A to detached instruction buffer filled with
> 'syscall; int 3'
This is somewhat problematic:
(1) Security features will often prevent a page from being both writeable and executable, and may prevent a page from moving to executable if it has once been writable. The only exception to this would be operations in protected mode in the kernel to change these mappings (e.g. this is how dtrace emulates instructions when it replaces function preambles for FBT - Function Boundary Tracing). You can see this sort of thing in AntiVirus software or in copy protection, etc.
Given that the poster is referring to Apple Intel systems, he can use mprotect() on the heap to grant execute permission and turn off write permission (to account for any future W^X enforcement; 10.6 Intel does not enforce W^X, see below). dtrace and antivirus are not relevant.
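#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main(void) {
    /* One anonymous page that is readable, writable and executable at once. */
    void *range = mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
    if (range == MAP_FAILED) { perror("mmap"); return 1; }
    memset(range, 0xCC, 4096);    /* fill the page with INT3 */
    void (*funcptr) (void) = (void (*)(void))range;
    (funcptr)();                  /* traps on the INT3, so the page was executable */
    return 0;
}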
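If W^X were ever enforced, the same buffer could be built without the page being writable and executable at the same time. What follows is only a sketch under assumptions: make_trampoline() and kTrampoline are hypothetical names, not anything from the thread, and the bytes are the SYSCALL (0F 05) and INT3 (CC) encodings; a 32-bit target would likely need a different system call entry sequence.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANON on glibc in strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86 encodings: SYSCALL is 0F 05, INT3 is CC. */
static const unsigned char kTrampoline[] = { 0x0F, 0x05, 0xCC };

/* Hypothetical helper: returns a read+execute mapping of page_size bytes
 * that starts with the trampoline, or NULL on failure. The mapping is
 * writable first and executable afterwards, never both at once. */
static void *make_trampoline(size_t page_size)
{
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    /* Pad with INT3 so a stray jump into the buffer traps. */
    memset(page, 0xCC, page_size);
    memcpy(page, kTrampoline, sizeof kTrampoline);

    /* Drop write permission and add execute permission in one step. */
    if (mprotect(page, page_size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, page_size);
        return NULL;
    }
    return page;
}
```

A caller would pass (size_t)sysconf(_SC_PAGESIZE) as page_size and point the stopped thread's PC at the returned address. The sketch only builds the buffer; it does not run it.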
(2) Your code signature for the modified page will no longer be valid. If it gets checked after you modify it, depending on your signing flags, your program may be immediately killed.
The poster is proposing to generate code on the heap on an Apple Intel system (where there is no concept of "code signatures" for the heap), and repoint a thread's PC to the generated trampoline, not modify pages backed by code signatures on disk.
(3) A fault in the detached instruction buffer in this case would be effectively unrecoverable.
(4) The syscall instruction is not a single byte; if it's tripped non-aligned, e.g. not on a 4 byte boundary, or crossing a cache line boundary, the update of an in progress instruction stream can be undefined. INT3 is a single byte (0xCC), which is typically why people use it for this type of thing in the first place, rather than trying to replace other instructions. Modifications are typically followed by a flush of the instruction cache, in case it's not write-back, since otherwise your changes may not be visible. This is typically why self-modifying code is strongly frowned upon these days.
The poster is referring to Apple Intel systems. The instruction cache is coherent with the data cache on Intel. No flush is necessary. There is also no lack of definition, see below.
(5) Other threads in the same process can be running through the same code path at the same time, and if they happen to be on the same core as an SMT thread, depending on the cache and instruction scheduling microarchitecture, you could be running through the instruction stream being modified at the time of the modification, resulting in undefined behaviour (see #3, #4).
The behaviour is not undefined. See sections 8.1.3 "Self-Modifying Code" and 11.6 of the Intel Software Developer's Manual. In practice, the serialization requirements are not as strict as defined in the manual, and are not followed to the letter by GDB.
> 5. Resume every thread
Also non-instantaneous. There will be a latency, potentially a substantial one, and the order in which the threads will be run afterward is indefinite.
> 6. When Thread A hits the INT3 from step #4, set Thread A's EIP to the
> instruction after the cached EIP (from step #3) and resume
You won't know why the INT3 got hit, and you won't be able to store per-thread state with the thread structure as a trigger indicator in any case. So for example, if someone is dtrace'ing at the same time (e.g. running "Instruments" as you are doing your thing, or your code is intended ), you are going to have a conflict. The way dtrace avoids this is that in the trap handler (by which I mean the protected mode kernel code used for handling hardware and software exceptions), it has a list of locations where it has placed its INT3's, and will look in the table to see if it sees one. If it doesn't, it will pass it through. However, there is nothing to prevent your code from mistaking a dtrace INT3 for one of its own, and there's nothing to prevent either of you from instrumenting over top of each other.
The only way to cooperate would be to hook where dtrace hooks, which is in the kernel trap handler, and since there's no way to do that programmatically either, you'd have to compile your own kernel to do it.
dtrace will refuse to instrument locations already populated by INT3s, see fasttrap_tracepoint_init() and
Presumably Dave Keck's debugger can also atomically check and decline to break on pre-existing INT3s, or break on the prior instruction and single-step over the next one, etc.
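The bookkeeping discussed above (a table of planted breakpoints, declining to patch over an existing INT3, and passing through traps that are not in the table) might look roughly like the following. This is an illustrative sketch only: struct bp_table, bp_insert() and bp_lookup() are hypothetical names, the code works on a local byte buffer rather than another task's memory, and the check-then-write here is not atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE     0xCC
#define MAX_BREAKPOINTS 64

/* One entry per breakpoint this debugger planted (hypothetical layout). */
struct breakpoint {
    uintptr_t addr;   /* address of the patched byte */
    uint8_t   saved;  /* original byte that INT3 replaced */
};

struct bp_table {
    struct breakpoint entries[MAX_BREAKPOINTS];
    size_t count;
};

/* Plant a breakpoint at code[offset]. Returns 0 on success, or -1 if the
 * table is full or the byte is already INT3 (someone else's probe, so we
 * decline rather than instrument over the top of it). */
static int bp_insert(struct bp_table *t, uint8_t *code, size_t offset)
{
    if (t->count == MAX_BREAKPOINTS || code[offset] == INT3_OPCODE)
        return -1;
    t->entries[t->count].addr  = (uintptr_t)&code[offset];
    t->entries[t->count].saved = code[offset];
    t->count++;
    code[offset] = INT3_OPCODE;
    return 0;
}

/* After an INT3 trap the reported PC is one byte past the 0xCC. Returns
 * the matching entry, or NULL if the trap is not ours and should be
 * passed through to whoever else planted it. */
static const struct breakpoint *bp_lookup(const struct bp_table *t,
                                          uintptr_t trap_pc)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].addr == trap_pc - 1)
            return &t->entries[i];
    return NULL;
}
```

The INT3 at the end of a detached instruction buffer could be recorded in the same kind of table, so that a trap there can be told apart from an ordinary breakpoint.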
> Since Thread B won't be spawned until step #5 -- after the user has
> handled the breakpoint and chosen to continue -- I don't see Thread
> B's actions as being an issue.
This is either an invalid assumption, or an assumption based on implementation details for your program which may not hold true for all programs.
-- Terry
>
> Thanks,
>
> David
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Darwin-dev mailing list (
email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to
email@hidden