Re: 64-bit problem with VM statistics
William Kucharski writes:
> On Jun 5, 2005, at 9:26 AM, Andrew Gallatin wrote:
>> This is not really true. Many PCI devices these days are capable of addressing 64 bits of physical memory (e.g., via dual address cycles, "DAC"). The DART enables address-limited devices to address memory anywhere, at the price of artificially crippling DAC-capable devices because of an oversight on IBM's or Apple's part.
> Though due to the PCI bus's 32-bit address width, DAC needs two address cycles to present the full 64-bit address, so depending on locality it is not necessarily any faster for a DAC device to address memory than it is to go through the DART or an IOMMU. One can easily craft scenarios in which either method of access comes out faster.
The time for an address cycle is of no consequence. I can show you DMA benchmarks comparing Alpha, IA-64 and Opteron (all DAC) vs. the G5 (DART) if you don't believe me.
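
To make the address-cycle mechanics concrete, here is a minimal, purely hypothetical descriptor layout (not taken from any real NIC): the driver hands the device a 64-bit bus address as two 32-bit halves, and the high half is what the extra DAC address cycle carries on the wire.

    #include <stdint.h>

    // Hypothetical DMA descriptor for a 64-bit-capable device.  The PCI bus
    // carries 32 address bits per cycle, so a 64-bit address is expressed as
    // two 32-bit words; the high word is what the dual address cycle (DAC)
    // adds on the bus.
    struct tx_descriptor {
        uint32_t addr_lo;   // low 32 bits of the bus address
        uint32_t addr_hi;   // high 32 bits, carried by the second address cycle
        uint32_t length;    // bytes to transfer
        uint32_t flags;     // device-specific control bits
    };

    static inline void desc_set_addr(struct tx_descriptor *d, uint64_t bus_addr)
    {
        d->addr_lo = (uint32_t)(bus_addr & 0xffffffffu);
        d->addr_hi = (uint32_t)(bus_addr >> 32);
        // Many devices fall back to a single address cycle when addr_hi is
        // zero, which is part of why the extra cycle rarely dominates.
    }
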
>> This is important to OS-bypass HPC network devices because it allows a large amount of memory to be pinned and available for network communication. On G5s (and IBM JS20s), by contrast, only 2 GB of memory can be pinned at any one time. Allowing for the needs of other devices, this realistically means that only about 1.5 GB can be pinned at once. This is one of the handicaps the G5 platform faces in the HPC cluster market.
> I wouldn't consider the G5's performance in HPC clusters such as Virginia Tech's to exactly be "handicapped." Every system has trade-offs between theoretical
Take a look at the efficiency (theoretical peak vs. achieved max) and you see the handicap. DAC-based machines can keep nearly all of RAM pinned, thereby avoiding any interaction with the host OS kernel during network communication; all the CPU cycles go to the application. IOMMU-based machines, on the other hand, either have to copy to/from a pinned region, or they keep pinning and unpinning memory whenever they need to transfer arrays whose size exceeds the amount of pinnable memory on the machine. Either way, CPU cycles are wasted on copying or on pinning/unpinning, both of which increase latency and overhead. This reduces the efficiency of the cluster.

Now, with all my whining and complaining aside, the IOMMU code in Tiger has gotten *much* better. With Tiger, pinning memory is nearly as fast as it is on a DAC machine. However, the address space is still limited, and there's nothing they can do about that.

Drew
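
For context on what "pinning" looks like from a Tiger-era driver, a minimal sketch follows. It assumes a user buffer address/length already supplied by the application; the function name pinUserBuffer is a placeholder, and the exact IODMACommand signatures should be checked against the 10.4 headers. prepare() wires the pages, and on a DART system it is also where I/O-space mappings get consumed, which is where the ~2 GB limit shows up; IODMACommand is the Tiger facility that lets a DAC-capable device ask for 64-bit device addresses.

    #include <IOKit/IOMemoryDescriptor.h>
    #include <IOKit/IODMACommand.h>

    // Sketch only: wire (pin) a user buffer, then describe it to a
    // 64-bit-capable device.  Error handling is trimmed for brevity.
    static IOReturn pinUserBuffer(mach_vm_address_t uaddr,
                                  mach_vm_size_t   len,
                                  task_t           task)
    {
        // Describe the user range and wire it down.  On a DART machine the
        // prepared range also occupies DART I/O space, so the total that can
        // stay prepared at once is bounded by the DART window (~2 GB).
        IOMemoryDescriptor *md = IOMemoryDescriptor::withAddressRange(
            uaddr, len, kIODirectionInOut, task);
        if (!md)
            return kIOReturnNoMemory;

        IOReturn ret = md->prepare();          // pin the pages
        if (ret != kIOReturnSuccess) {
            md->release();
            return ret;
        }

        // Tiger's IODMACommand lets a DAC-capable device ask for 64-bit
        // addresses (numAddressBits = 64); kMapped routes through the DART,
        // and a bypass option exists where the platform permits it.
        IODMACommand *cmd = IODMACommand::withSpecification(
            IODMACommand::OutputHost64,        // segment output format
            64,                                // device can address 64 bits
            0,                                 // no segment-size limit
            IODMACommand::kMapped);
        if (cmd) {
            cmd->setMemoryDescriptor(md);      // generate device-visible addresses
            // ... hand the segments to the hardware here ...
            cmd->clearMemoryDescriptor();
            cmd->release();
        }

        md->complete();                        // unpin when the transfer is done
        md->release();
        return kIOReturnSuccess;
    }

On a DAC machine the same prepare() can be left in place for essentially all of RAM at once, which is the efficiency argument above expressed in code form.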