Darwin-kernel mailing list (Darwin-kernel@lists.apple.com)

On Aug 20, 2008, at 6:26 AM, alejandro <alejandro@openstudionetworks.com> wrote:

> Dear Friends,
>
> I am running very low-latency real-time threads (for audio purposes)
> on 6 CPUs, and other admin threads, including disk I/O threads, on the
> other 2 CPUs. When reading audio from disk, the real-time threads get
> stuck at random times. I believe this happens because the disk read
> command launches a lower-half kernel thread that can run on any of the
> 8 CPUs, and the memory movement from disk to RAM invalidates their
> caches and breaks their real-time behavior. Is there any solution? For
> example, prohibiting the kernel threads from running on particular
> CPUs, or binding them to a CPU?

Basic queueing theory would suggest that you buffer more data, so that your pool retention time exceeds any possible latency. The problem then goes away on its own, since there is then zero latency between when you ask for data and when it becomes available to you.

In general, Mac OS X does not run interrupts in IOAPIC virtual wire mode, so no matter what you do, your I/O completion interrupts from your disk I/O are going to come in to the BSP rather than one of the APs. The more APs you have, the more likely it is that you will take the low-level completion interrupt on the BSP (the initial boot CPU) and do the upper-level processing of the interrupt on an AP (one of the other CPUs).
It turns out that this will typically have no effect on performance at all: the data will have been DMAed directly into main memory, not into a CPU cache, and the only thing that will be "hot" in the cache on the BSP will be the low-level interrupt routine's data, which is typically limited to device data space (I/O space is, or should be, uncached for obvious reasons), and the upper-level device driver will/should never touch that. So any latency isn't going to be related to cache busting at that level, and you've got a bad assumption about the source of your problem.

The most common cause of problems like this is trying to feed an RT thread data from a non-RT or lower-priority thread, with the higher-priority thread causing the lower-priority one to starve. Increasing the pool size and reducing the priority to reasonable (or at least non-starvation) levels works there, too.

Another common cause is thinking your code is the only thing running, and thinking that because you have 8 CPUs you should have 8 worker threads. Besides the interrupt processing and the upper-level device driver, the system also has to think about your UI, etc., so sometimes reducing the number of worker threads actually improves performance.

These are just possibilities, of course; if in doubt, use Shark to see what's actually happening instead of what you think should be happening.

PS: JIT (Just In Time) is great for inventory control and for interpreted languages, not so good for RT tasks on a deadline. Your best bet is to line up all your ducks before any possible deadline.

-- Terry