• Open Menu Close Menu
  • Apple
  • Shopping Bag
  • Apple
  • Mac
  • iPad
  • iPhone
  • Watch
  • TV
  • Music
  • Support
  • Search apple.com
  • Shopping Bag

Lists

Open Menu Close Menu
  • Terms and Conditions
  • Lists hosted on this site
  • Email the Postmaster
  • Tips for posting to public mailing lists
Re: [syslogd] Performance issue?
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [syslogd] Performance issue?


  • Subject: Re: [syslogd] Performance issue?
  • From: Marc Majka <email@hidden>
  • Date: Wed, 22 Oct 2008 09:43:23 -0700

Yes, syslogd has some known performance problems in 10.5.5 and earlier versions. We are addressing them in 10.5.6.

syslogd is probably getting overwhelmed with messages. It tries to keep the back-end data store (/var/log/asl.db) from growing too large. It monitors the size of the file, and when it grows too large, it deletes messages (oldest first) to shrink the file. This works OK when the rate of messages being received is low, but when things get busy, it hits a couple of problems: The search and deletion are slow, and the task of compressing out "holes" (i.e. fragmentation) are slow. While syslogd is slowly searching, deleting, and compressing (and using lots of CPU in the process), lots of new messages arrive and are buffered in memory. As soon as the search/delete/compress phase is over, syslogd goes into overdrive saving all the new messages keeping the the CPU load high. If it's *really* getting hammered, it doesn't take long to max out the file size, and it all happens again.

To fix this, we streamlined the data store code, and let syslogd focus on the task of just saving messages to file. Messages are saved to multiple files (in /var/log/asl). syslogd limits the size of each file - new ones are opened when the previous one gets too large. A helper process (aslmanager) launched as required by launchd is responsible for managing the size of the whole data store. aslmanager will delete messages and optionally save them to a long-term archive. The new data architecture eliminates most of the cost of deletion and fragmentation / compression.

So what can you do other than eagerly await 10.5.6?

- Try to eliminate processes that flood syslogd with messages that you really don't need. You may be able to use client-side filtering using setlogmask() in the code, or using "syslog -c" to adjust the log mask of a process that's logging a lot of messages.

- Adjust the filter that syslogd uses to determine which messages it saves in the data store. You can do this using "sudo syslog -c syslogd -X", where X is a cutoff log level. By default, syslogd saves all messages with priority levels in the range 0 (Emergency) to 5 (Notice). To save just messages in the range 0 to 2 (Critical) you would use "-2". Using "syslog -c" to make this adjustment only lasts until a restart. To make the change take effect on every reboot, edit the syslogd plist file: /System/Library/LaunchDaemons/ com.apple.syslogd.plist, and add
<string>-c</string>
<string>2</string>
(or whatever cutoff level you choose) to the ProgramArguments.


- Increase the size of the asl data store, so that it doesn't need to run the search/delete/compress operation as frequently. This can be done by setting the "-db_max" command line option in the plist (as above). See the syslogd man page for details.

--
Marc Majka

On 22 Oct, 2008, at 03:28, Stéphane Sudre wrote:

Is the following case a performance issue with syslogd on Mac OS X 10.5.5?

Here's the scenario:

I'm dumping a byte buffer (which weights 4.7 MB approximately) into stdout using printf calls. The output looks like the one of hexdump - C. There's a '\n' for each new line. This may be done multiple times.

After doing this, I quit the process calling printf.

syslogd then eats 100% of CPU and does not seem to want to stop but will after quite a while (more than 5 minutes).

In the meantime Console.app is not responding and if it's launched, it does not refresh the Console Messages contents.

Time to create new shell window in Terminal.app is bigger than usual. It takes 5-6 seconds.

Logs may get lost .

Real Memory (in Activity Monitor) can goes up as much as 860 MB.

The size of /var/log/asl.db may change or not while syslogd is using 100% CPU.

If I run a sample on syslogd, it shows that it's either asl_store_save or asl_store_compact that are taking 99.x% of CPU Time.

----------------------------------------------------------------------------------

Sampling process 928 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling syslogd (pid 928) every 1 millisecond
Call graph:
2465 Thread_2503
2465 start
2465 main
2465 klog_in_close
2465 db_archive
2465 asl_store_truncate
2123 asl_store_open
2108 asl_store_compact
1622 asl_store_compact
1576 asl_store_compact
27 fread
27 __srefill
26 read
26 read
1 __srefill
16 fseek
15 _fseeko
8 __sflush
8 write$NOCANCEL$UNIX2003
8 write$NOCANCEL$UNIX2003
6 lseek
6 lseek
1 __bzero
1 __bzero
1 flockfile
1 __spin_lock
1 __spin_lock
3 fwrite$UNIX2003
3 __sfvwrite
3 __sfvwrite
467 asl_store_compact
13 fseek
11 _fseeko
6 __sflush
6 write$NOCANCEL$UNIX2003
6 write$NOCANCEL$UNIX2003
4 lseek
4 lseek
1 _sseek
1 _sseek
1 flockfile
1 __spin_lock
1 __spin_lock
1 funlockfile
1 funlockfile
6 fread
6 __srefill
6 read
6 read
15 asl_store_open
15 asl_store_open
8 fread
7 __srefill
6 read
6 read
1 _sread
1 _sread
1 __memcpy
1 __memcpy
6 fseek
6 _fseeko
4 __sflush
4 write$NOCANCEL$UNIX2003
4 write$NOCANCEL$UNIX2003
1 __bzero
1 __bzero
1 _fseeko
1 asl_store_open
342 asl_store_truncate
2465 Thread_2603
2465 thread_start
2465 _pthread_start
2465 database_server
2465 asl_ipc_server
2465 asl_ipc_server_routine
2465 __asl_server_query
2465 db_query
2465 pthread_mutex_lock
2465 semaphore_wait_signal_trap
2465 semaphore_wait_signal_trap
2465 Thread_2703
2465 thread_start
2465 _pthread_start
2465 db_worker
2465 pthread_mutex_lock
2465 semaphore_wait_signal_trap
2465 semaphore_wait_signal_trap
2465 Thread_2803
2465 thread_start
2465 _pthread_start
2465 launchd_drain
2465 _vprocmgr_log_drain
2465 vproc_mig_log_drain
2465 mach_msg
2465 mach_msg_trap
2465 mach_msg_trap


Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
       semaphore_wait_signal_trap        4930
       mach_msg_trap        2465
       asl_store_compact        2043
       asl_store_truncate        342
       read        38
       write$NOCANCEL$UNIX2003        18
       lseek        10
Sample analysis of process 928 written to file /dev/stdout

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden

_______________________________________________ Do not post admin requests to the list. They will be ignored. Darwin-dev mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: This email sent to email@hidden
References: 
 >[syslogd] Performance issue? (From: Stéphane Sudre <email@hidden>)

  • Prev by Date: Re: rename and POSIX permissions
  • Next by Date: Re: rename and POSIX permissions
  • Previous by thread: [syslogd] Performance issue?
  • Next by thread: rename and POSIX permissions
  • Index(es):
    • Date
    • Thread