When All Thread States is not enabled, it stops each CPU core every so often and, if it's running a thread from a process that's within the target scope, it records the stack trace of that thread.
I forgot to emphasise one difference. Sampler stops the entire process at once, so you kind of get an atomic snapshot (though not really, since for its purposes threads can only "park" at safe intervals, so you'll get a lot of barrier effects). Time Profiler deals with only one thread at a time, per core, at most. This means that while it tries to record samples from each core simultaneously, for efficiency it doesn't try too hard*. So never assume that two events from separate threads or cores represent a coherent or atomic snapshot, however similar their timestamps appear.
System Trace, on the other hand, can be somewhat relied upon for this, as its margin of error is dependent only on the error in the TSC synchronisation between cores, which is usually only a few nanoseconds (rarely worse than 100ns, in my experience).
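To make the margin-of-error point concrete, here is a minimal sketch of how you might decide whether two cross-core timestamps can be ordered at all. The function name and the 100ns default are illustrative assumptions taken from the figure above. They are not part of any Instruments API, and the comparison is only meaningful for System Trace style timestamps, not Time Profiler samples.

```python
# Hypothetical helper: neither the name nor the default bound comes from Instruments.
def order_events(t_a_ns, t_b_ns, sync_error_ns=100):
    """Order two cross-core timestamps, allowing for an assumed clock sync error.

    Returns "a-before-b", "b-before-a", or "indeterminate" when the gap
    between the timestamps is no larger than the assumed error bound.
    """
    delta = t_b_ns - t_a_ns
    if delta > sync_error_ns:
        return "a-before-b"
    if -delta > sync_error_ns:
        return "b-before-a"
    # Within the error bound, the recorded order proves nothing.
    return "indeterminate"
```

Anything that comes back "indeterminate" should be treated as unordered rather than simultaneous.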
[[ FWIW having the option of truly concurrent snapshots across cores was considered - people sometimes freak out when they get told that's not how it works - but while the implementation is reasonably simple and it sounds good on the surface, when you get down to it it's either not necessary or Time Profiler's not the right tool for the given use case. ]]
* = It also means that Time Profiler (All Thread States) has ridiculously lower overhead than the equivalent functionality in Sampler, even more so than the "normal" running-threads-only mode.