Re: getaddrinfo() starts failing with EAI_AGAIN (again)
- Subject: Re: getaddrinfo() starts failing with EAI_AGAIN (again)
- From: Jamus Jegier <email@hidden>
- Date: Thu, 10 Jul 2008 07:36:56 -0500
On Jul 9, 2008, at 10:21 PM, Terry Lambert wrote:
On Jul 7, 2008, at 8:41 AM, Jamus Jegier wrote:
A couple months ago, there was a post by Peter Oberauer about
getaddrinfo() failing with EAI_AGAIN by processes under a specific
process tree.
I am running into the exact same problem, and was wondering if
there was any resolution to the issue.
I installed Nagios, which frequently forks processes to verify the
status of network hosts. After about 8 hours, all child processes
start failing with EAI_AGAIN.
Under the Nagios parent process, dig works, but ping doesn't. Both
DNS and Bonjour lookups begin to fail.
Ping and the Nagios helper executables work fine when started under
an unrelated process.
The only other thing I can add is that I see this on 10.5.3 on a G4
system. I'm installing 10.5.4 now, and will post if I still see
this.
Typically, getaddrinfo() fails with EAI_AGAIN when there is a
failure of malloc of the memory needed to return the linked list of
addrinfo structures. This generally happens if:
(1) you have a memory leak (either you fail to call freeaddrinfo()
on the returned memory, or have a different leak)
(2) you have fragmented your process address space (sufficiently
that it is impossible to allocate a contiguous memory chunk large
enough to return the requested information)
(3) the information you are requesting would be so large that it's
impossible to return it (misconfigured DNS server, DNS cache
poisoning attack, broken DNS server software, etc.)
You should examine how much memory is in use by your process to
distinguish #1, you should use vmmap to distinguish #2, and you
should use host and similar commands and/or a packet analyzer (or
tcpdump) to distinguish #3.
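Case #1 above comes down to pairing every successful getaddrinfo() call
with exactly one freeaddrinfo(). A minimal sketch of that pairing, not
taken from Nagios or ping; the helper name count_addresses is
hypothetical and only there for illustration:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolve `host`, count the returned entries,
 * and release the list. Returns the entry count, or -1 on failure. */
int count_addresses(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *p;
    int n = 0;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* On failure nothing was allocated, so there is nothing to free. */
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next)
        n++;

    /* The whole linked list is released with a single call. */
    freeaddrinfo(res);
    return n;
}
```

Calling count_addresses("127.0.0.1") in a loop while watching the
process size is one rough way to confirm that the lookup path itself
does not grow.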
Thanks for the suggestions, but I believe Peter Oberauer hit the nail
on the head - starting the process from launchd solved my issues.
Before that, Nagios was started by sshing into my system and using sudo
to run it as the nagios user, like this:
sudo -u nagios /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Nagios would run this script every few minutes successfully before it
would start failing after approximately 8-10 hours. In this script,
ping would fail while dig would succeed in resolving the domain name.
#!/bin/bash
# $1 is the host name to check; everything is appended to one log file.
whoami >> /tmp/debug.log
date >> /tmp/debug.log
dig "$1" >> /tmp/debug.log
ping -c 5 "$1" >> /tmp/debug.log
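As far as I know, dig carries its own resolver code and queries the DNS
server directly, while ping resolves names through the system resolver,
which would explain why one can succeed while the other fails. To log
the exact getaddrinfo() return code from inside the affected process
tree, a small probe along these lines could be called from the same
script. This is only a sketch; the function name log_lookup and the
output format are hypothetical, and it was not part of the setup
described here:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical probe: resolve `host` through the system resolver and
 * write one line with the process IDs and the getaddrinfo() result.
 * Returns the raw getaddrinfo() return code (0 on success). */
int log_lookup(FILE *out, const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    rc = getaddrinfo(host, NULL, &hints, &res);

    /* The parent PID is logged because the failure described here
     * depends on which process tree the lookup runs under. */
    fprintf(out, "pid=%ld ppid=%ld host=%s rc=%d (%s)%s\n",
            (long)getpid(), (long)getppid(), host, rc,
            rc == 0 ? "ok" : gai_strerror(rc),
            rc == EAI_AGAIN ? " EAI_AGAIN" : "");

    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

Wrapped in a main() that passes argv[1] and an appending FILE handle for
/tmp/debug.log, this would sit next to the dig and ping lines and show
whether the error is EAI_AGAIN or something else.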
So I don't think it would be #2, seeing that I doubt that ping would
allocate and fragment its address space like you suggested. Also, I
don't think it's #3, since Bonjour also fails with a directly
connected AirPort Extreme, dig returns a good value with DNS queries,
and pings outside of the Nagios process tree succeed.
I also don't think it's #1, since I would have noticed ping thrashing
as it tried to allocate 4 GB on my 1.5 GB system.
I can attempt to reproduce the issue and try to gather hard data if my
observations aren't sufficient.
Jamus
Darwin-dev mailing list (email@hidden)