Network stack/ethernet driver issues
Greetings,

Here's my situation: I have a custom piece of data logging hardware that generates about 50 MB/sec of data and sends it out over a gigabit Ethernet link as a steady stream of UDP packets (one 1 KB packet every ~20 microseconds, very periodic, not bursty at all) broadcast to the local subnet. I want to log all this data to disk for offline analysis.

As a first step I wrote a program to just listen for the UDP packets and verify that they were all being received. This is easy to do because each packet has a 16-bit sequence ID in it. Much to my dismay, I was unable to reliably achieve this.

I am running 10.5.2 on a 2.33 GHz MacBook Pro with 2 GB of memory. After making SO_RCVBUF big enough, I was still occasionally missing a block of 100+ packets at a time. Looking in the system.log file, I realized that each dropout was correlated with a message like:

  Apr 9 13:38:51 Macintosh-6 kernel[0]: AppleYukon2: 00000000,00000000 skgehw - cppSkDrvEvent - SK_DRV_RX_OVERFLOW: rcv fifo overflow
  Apr 9 13:38:51 Macintosh-6 kernel[0]: AppleYukon2: 00000000,00000000 sky2 - RX ring overflow -- dropped a packet

and occasionally:

  Apr 9 13:36:13 Macintosh-6 kernel[0]: AppleYukon2: 00000000,00000000 sky2 - RX ring overflow -- dropped a packet
  Apr 9 13:36:13 Macintosh-6 kernel[0]: AppleYukon2: 00000000,00000001 sk98osx sky2 - - sk98osx_sky2::replaceOrCopyPacket tried N times

Something else I should mention: just connecting my data logging hardware to the MacBook Pro causes a massive spike in CPU use, even when there's no program listening for the packets.

With no listener running, top reports:
  0-1% user, 49% system, 50% idle

With the listener program running:
  4% user, 85% system, 11% idle

I decided to run Shark to see where the CPU time was going.

Here's the Time Profile of Everything for the first case (no listener):

  55.4%  55.4%  mach_kernel                  machine_idle
  13.1%  13.1%  mach_kernel                  ml_set_interrupts_enabled
   7.9%   7.9%  com.apple.iokit.AppleYukon2  yukon2osx::GetInterruptSource(unsigned long*)
   3.4%   3.4%  com.apple.iokit.AppleYukon2  yukon2osx::SkY2Isr(IOInterruptEventSource*, unsigned)
   2.1%   2.1%  mach_kernel                  lck_mtx_lock
   1.9%   1.9%  mach_kernel                  lck_mtx_unlock
   1.9%   1.9%  mach_kernel                  ifnet_input
   1.9%   1.9%  com.apple.iokit.AppleYukon2  yukon2osx::PrcocessInterruptSource(IOInterruptEventSource*, int)
   1.2%   1.2%  mach_kernel                  udp_input
   1.0%   1.0%  mach_kernel                  OSCompareAndSwap
   1.0%   1.0%  mach_kernel                  in_pcb_checkstate
   0.7%   0.7%  mach_kernel                  _mutex_assert
   0.6%   0.6%  mach_kernel                  ip_input
   0.4%   0.4%  mach_kernel                  udp_lock
   0.4%   0.4%  mach_kernel                  OSAddAtomic
   0.4%   0.4%  mach_kernel                  wait_queue_wakeup_all
   0.3%   0.3%  com.apple.iokit.AppleYukon2  yukon2osx::HandleReceives(int, unsigned short, unsigned, unsigned short, unsigned short, unsigned, unsigned short)
   0.3%   0.3%  mach_kernel                  cantrace
   0.3%   0.3%  mach_kernel                  IOWorkLoop::threadMain()
   0.3%   0.3%  mach_kernel                  udp_unlock
   0.2%   0.2%  mach_kernel                  IOWorkLoop::runEventSources()
   0.2%   0.2%  mach_kernel                  wait_queue_assert_wait
   0.2%   0.2%  mach_kernel                  ether_demux
   0.2%   0.2%  mach_kernel                  IORecursiveLockLock
   0.2%   0.2%  mach_kernel                  uiomove
   0.2%   0.2%  mach_kernel                  proto_input
   0.1%   0.1%  mach_kernel                  in_broadcast

And here's the output for the second case (listener running):

  22.2%  22.2%  mach_kernel                  ml_set_interrupts_enabled
  21.2%  21.2%  mach_kernel                  machine_idle
   4.7%   4.7%  com.apple.iokit.AppleYukon2  yukon2osx::GetInterruptSource(unsigned long*)
   4.6%   4.6%  com.apple.iokit.AppleYukon2  yukon2osx::PrcocessInterruptSource(IOInterruptEventSource*, int)
   4.6%   4.6%  com.apple.iokit.AppleYukon2  yukon2osx::SkY2Isr(IOInterruptEventSource*, unsigned)
   3.1%   3.1%  mach_kernel                  lck_mtx_lock
   2.6%   2.6%  mach_kernel                  lck_mtx_unlock
   1.7%   1.7%  mach_kernel                  OSAddAtomic
   1.4%   1.4%  mach_kernel                  ifnet_input
   1.3%   1.3%  mach_kernel                  copyout_kern
   1.2%   1.2%  mach_kernel                  soreceive
   1.1%   1.1%  com.apple.iokit.AppleYukon2  yukon2osx::HandleReceives(int, unsigned short, unsigned, unsigned short, unsigned short, unsigned, unsigned short)
   1.1%   1.1%  mach_kernel                  _rtc_nanotime_read
   1.0%   1.0%  libSystem.B.dylib            recvfrom$NOCANCEL$UNIX2003
   0.9%   0.9%  mach_kernel                  lo_unix_scall
   0.8%   0.8%  mach_kernel                  udp_input
   0.8%   0.8%  mach_kernel                  _mutex_assert
   0.8%   0.8%  mach_kernel                  ip_input
   0.6%   0.6%  mach_kernel                  wait_queue_assert_wait
   0.6%   0.6%  mach_kernel                  nval_copy_windows
   0.5%   0.5%  mach_kernel                  wait_queue_wakeup_all
   0.5%   0.5%  mach_kernel                  m_mclget
   0.5%   0.5%  mach_kernel                  uiomove
   0.5%   0.5%  mach_kernel                  sbappendaddr
   0.5%   0.5%  mach_kernel                  lck_rw_lock_shared
   0.5%   0.5%  mach_kernel                  cantrace
   0.4%   0.4%  mach_kernel                  udp_lock
   0.4%   0.4%  com.apple.iokit.AppleYukon2  yukon2osx::HandleStatusLEs()
   0.4%   0.4%  mach_kernel                  lck_mtx_lock_spinwait
   0.4%   0.4%  com.apple.iokit.AppleYukon2  yukon2osx::GiveRxBufferToHw(unsigned, int, s_packet*)
   0.4%   0.4%  mach_kernel                  blkclr
   0.4%   0.4%  mach_kernel                  socketpair
   0.4%   0.4%  mach_kernel                  udp_unlock
   0.4%   0.4%  mach_kernel                  zalloc_canblock
   0.4%   0.4%  mach_kernel                  IOWorkLoop::runEventSources()
   0.4%   0.4%  mach_kernel                  uiomove64
   0.3%   0.3%  mach_kernel                  unix_syscall
   0.3%   0.3%  libSystem.B.dylib            recvfrom

Ultimately I was unable to reliably receive all the packets the device was putting over the Ethernet link because the RX ring buffer kept overflowing. These overflows happen a few times a minute, regardless of whether or not an application is actually listening for the packets.

Since I have Linux running on the same machine, I decided to try recompiling and running the same test program under Linux 2.6.24. Under Linux my test program receives every packet reliably and doesn't stress the machine out.
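
For readers who want to reproduce the test, here is a minimal Python sketch of the kind of sequence-checking listener described above. It is not the original test program. The port number (5000), the buffer size, and the position and byte order of the 16-bit sequence ID (first two bytes, network order) are all assumptions made for illustration, and the function names are hypothetical.

```python
import socket
import struct

SEQ_MODULUS = 1 << 16  # a 16-bit sequence ID wraps around at 65536


def missing_between(prev_seq, seq):
    """Packets skipped between two consecutively received IDs.

    Assumes in-order delivery with no duplicates (plausible on a direct
    link); a loss of 65536 or more packets in a row would be undercounted.
    """
    return (seq - prev_seq - 1) % SEQ_MODULUS


def count_drops(seq_ids):
    """Total number of packets missing from a stream of received IDs."""
    dropped = 0
    prev = None
    for seq in seq_ids:
        if prev is not None:
            dropped += missing_between(prev, seq)
        prev = seq
    return dropped


def listen(port=5000, rcvbuf_bytes=4 * 1024 * 1024, max_packets=None):
    """Receive UDP packets and report gaps in the sequence IDs.

    ASSUMPTIONS: the port, the buffer size, and the sequence ID living in
    the first two bytes in network byte order are all hypothetical.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a large receive buffer; the OS may cap or reject large values.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("", port))  # all interfaces, so subnet broadcasts are seen
    prev = None
    received = dropped = 0
    try:
        while max_packets is None or received < max_packets:
            data, _addr = sock.recvfrom(2048)
            if len(data) < 2:
                continue  # too short to carry a sequence ID
            (seq,) = struct.unpack_from("!H", data, 0)
            if prev is not None:
                gap = missing_between(prev, seq)
                if gap:
                    dropped += gap
                    print("missed %d packet(s) before seq %d" % (gap, seq))
            prev = seq
            received += 1
    finally:
        sock.close()
    return received, dropped
```

Calling listen() with no arguments runs until interrupted; passing max_packets bounds the run and returns the (received, dropped) counts. The gap arithmetic is done modulo 65536 so that the wrap from 65535 back to 0 is not reported as a loss.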

For comparison, here is what top reports under Linux.

With the device attached and no listener:
  0.5% user, 0% system, 84% idle, 5% hardware interrupt, 4% software interrupt

With the listener program running:
  1% user, 3% system, 82% idle, 5% hardware interrupt, 9% software interrupt

So it seems to me that something is screwy with the Mac OS X networking stack. It wasn't a big deal for me, as I was able to get my data logging software running reliably under Linux, but it doesn't seem right that Mac OS X can't keep up with a 50 MB/sec network load (only about 40% of the available bandwidth on a gigabit Ethernet link).

Is this a known issue? Should I file a bug report?

thanks
-rimas
participants (1)
-
rimas@cnmat.berkeley.edu