Re: Why does performance crap out when lookupd...



On 3/27/02 4:30 PM, "David A. Gatwood" <email@hidden> wrote:

> On Wed, 27 Mar 2002, Josh Graessley wrote:
>
>> The DNS on Mac OS X is implemented in lookupd. Lookupd uses a thread and
>> walks through various agents to resolve a query. The DNS agent has a socket.
>> The code just blocks on the socket waiting for a response. This ties up both
>> the socket and the thread so neither can be used for another query. DNS
>> servers aren't guaranteed to respond to all queries; some lookups may fail
>> silently. This can tie up a thread and socket for a very long time while it
>> sits waiting for the lookup to time out. Since there is a very small number
>> of threads, it may be possible to tie them all up waiting for DNS queries to
>> time out.
>
> There are several different issues that make DNS a bit painful at
> times. I'd like to try to decompose this into the basic issues so we can
> tackle them individually.
>
> 1. semi-local DNS server failure
> Results in all lookups timing out, then goes to a second server,
> etc.

This isn't generally a large issue. If multiple DNS servers are configured,
the query is sent to the first one; after a short timeout, it is sent to the
next one, and so on. The DNR on Mac OS 9 would send the first few queries to
the same server until it was obvious that server would not respond. Mac OS X
instead sends the queries to different servers, increasing the timeout each
time it has looped through all of the servers.
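
As a rough illustration of that rotation, here is a sketch of the order in
which attempts would go out. The function name retry_schedule, the starting
timeout, the backoff multiplier, and the number of passes are all assumptions
made for the example; they are not taken from the lookupd source.

```python
from typing import Iterator, List, Tuple


def retry_schedule(
    servers: List[str],
    initial_timeout: float = 1.0,  # assumed value, not lookupd's
    backoff: float = 2.0,          # assumed growth factor per full pass
    max_rounds: int = 3,           # assumed number of passes over the list
) -> Iterator[Tuple[str, float]]:
    """Yield (server, timeout) pairs in the order queries would be sent."""
    timeout = initial_timeout
    for _ in range(max_rounds):
        # One pass: every configured server gets a try with the same timeout.
        for server in servers:
            yield server, timeout
        # The timeout grows only after a full loop through all servers.
        timeout *= backoff


if __name__ == "__main__":
    for server, timeout in retry_schedule(["10.0.0.1", "10.0.0.2"]):
        print(f"send to {server}, wait up to {timeout:.1f}s")
```

The point of the shape is that a dead first server costs one short timeout
per query rather than several.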

> 2. remote DNS server failure
> Results in a single domain's lookup timing out.

This doesn't have to be a remote DNS server failure. It is possible to form
queries that DNS servers will not respond to, even though they are working
perfectly.

> 3. differential delay
> Results in fast lookups waiting behind slower ones.
>
> The asynchronous lookups help #1 somewhat, although if there are still a
> fixed number of threads doing the lookup, then you still have the problem
> of them all being tied up; you just no longer have the socket open. Of
> course, if you force all the slow lookups into a subset of the total
> thread pool, it will help for a lot of cases. It then resembles the
> banker solution, albeit a cleaner implementation thereof.

The asynchronous lookups will solve the problem if the architecture in
lookupd is changed to follow suit. As long as every query is handled by a
single thread and that one thread calls through to each of the agents,
blocking when necessary until the query has been answered, the maximum
number of simultaneous queries is limited to the maximum number of threads.
If the limit is set to 64 threads, all you need is 64 queries that block for
a significant amount of time in any of the agents (DNS or otherwise) to
effectively lock up lookupd until some query times out. Bumping the number
of threads to 128 doesn't do much except increase the maximum number of
outstanding queries.
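
To make the limit concrete, here is a small model of the thread-per-query
arrangement. It is a simulation only: start_times is a made-up helper, all
queries are assumed to arrive at time 0, and the 30 second block is an
arbitrary figure.

```python
import heapq
from typing import List


def start_times(durations: List[float], threads: int) -> List[float]:
    """Return when each query starts if it holds one thread until it finishes.

    Queries are assumed to arrive at time 0 and are served in list order.
    """
    free_at = [0.0] * threads  # time at which each thread next becomes idle
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        t = heapq.heappop(free_at)  # earliest idle thread takes the query
        starts.append(t)
        heapq.heappush(free_at, t + duration)
    return starts


if __name__ == "__main__":
    # 64 queries that each block for 30s, followed by one fast query.
    blocked = [30.0] * 64 + [0.01]
    print(start_times(blocked, threads=64)[-1])   # fast query waits
    # Doubling the pool only moves the cliff to 128 blocked queries.
    print(start_times([30.0] * 128 + [0.01], threads=128)[-1])
```

In both runs the fast query cannot start until one of the blocked queries
gives up its thread.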

If lookupd is changed to use threads and queues more intelligently, all of
this trouble could be avoided. When a query is pulled off of the queue of
waiting queries on the mach port, instead of being paired up with a thread
that will walk it through the process to the end, the query could be put on
a queue for the agent that is responsible for looking it up. That agent
could use one or more threads to manage the resolution. In the case of DNS,
the lookups could be done with one thread that can handle any number of
outstanding queries. Other agents could be architected in a similar manner,
eliminating the limit of simultaneous queries to the number of threads
lookupd is permitted to allocate. It does make the programming a little
trickier, but the benefits are immense.
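
A minimal sketch of what a DNS agent along these lines might look like
follows. Everything in it is hypothetical: the DnsAgent class, its method
names, and the 6 second timeout are invented for illustration, and the
network side is left out. A real agent would send the packet in submit() and
sit in select() on its socket, waking for a reply or for the next deadline.
Time is passed in explicitly so the logic can be followed without a network.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Called with (name, answer); answer is None when the query timed out.
Callback = Callable[[str, Optional[str]], None]


class DnsAgent:
    """Tracks any number of outstanding queries without blocking on any."""

    def __init__(self, timeout: float = 6.0) -> None:  # assumed timeout
        self.timeout = timeout
        self.pending: Dict[int, Tuple[str, Callback]] = {}
        self.deadlines: List[Tuple[float, int]] = []  # min-heap (deadline, id)
        self.next_id = 0

    def submit(self, name: str, now: float, callback: Callback) -> int:
        """Register a query taken off the agent's queue and return its id."""
        qid = self.next_id
        self.next_id += 1
        self.pending[qid] = (name, callback)
        heapq.heappush(self.deadlines, (now + self.timeout, qid))
        return qid

    def on_response(self, qid: int, answer: str) -> None:
        """Complete a query; late or unknown replies are ignored."""
        entry = self.pending.pop(qid, None)
        if entry is not None:
            name, callback = entry
            callback(name, answer)

    def expire(self, now: float) -> None:
        """Fail every query whose deadline has passed."""
        while self.deadlines and self.deadlines[0][0] <= now:
            _, qid = heapq.heappop(self.deadlines)
            entry = self.pending.pop(qid, None)
            if entry is not None:  # skip queries that were already answered
                name, callback = entry
                callback(name, None)
```

With this shape a query that will never be answered costs one table entry
until its deadline, not a thread and a socket.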

> The banker solution, however, fails to account for failure of the first
> server in the list, since all requests would suddenly become "slow"
> requests unless appropriate accounting measures are taken.

This is already handled by the DNS code.

> From observation, most DNS lookups taking over about a second are going to
> time out. Not always, but more often than not. Of course, this depends
> on network speed somewhat. Regardless, it shouldn't be hard to tune this
> in a semi-reasonable way.

The fine tuning of timeouts is another subject altogether.

> So here's a radical proposal, based on a combination of the suggestions
> given:
>
> 1. lookupd keeps a pool of threads for doing DNS queries, plus one
> "questionable query" thread.

Lookupd needs to move away from using a pool of threads to do its work. At
some point, the pool can become exhausted. When the pool is exhausted, the
queries stack up waiting for a free thread.

> 2. Upon receiving a request, lookupd starts a DNS request using one of
> the pool threads. It only uses the nameserver designated as "primary
> local". This request has about a 1-2 second timeout. If it completes,
> the result is returned immediately. If it times out, it sends back a
> message that says it will return the data out-of-band. The caller closes
> the connection and waits for a callback.

Lookupd does much more than just DNS. I believe lookupd pairs a thread with
a query and that thread calls various agents until the query is resolved.

If you get the timeout wrong on the first thread, you'll make things worse
by piling up all queries on the last "questionable queries" thread.

> 3. At this point, lookupd kicks the "questionable query" out to the QQ
> thread, where they are looked up in a FCFS fashion with a longer timeout.

Why? This could easily be coded to send out DNS query packets as soon as a
query is received on a queue. If you do things first come first served,
you've serialized every query, and queries that will time out take much
longer to do so. If you have 10 queries, all of which will eventually
time out, the 10th query doesn't even get started until the 9th one has
timed out. Instead of taking the 6 seconds or whatever that it should have
taken to time out the 10th query, it takes 60 seconds.
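
The arithmetic can be spelled out in a few lines. The two helper functions
are made up for the example, and the 6 second timeout is the same
placeholder figure used above.

```python
from typing import List


def serialized_finish_times(n: int, timeout: float) -> List[float]:
    """FCFS on one thread: query i starts only after query i-1 times out."""
    return [timeout * (i + 1) for i in range(n)]


def concurrent_finish_times(n: int, timeout: float) -> List[float]:
    """All packets sent on arrival: every query times out independently."""
    return [timeout] * n


if __name__ == "__main__":
    print(serialized_finish_times(10, 6.0)[-1])  # when the 10th query fails
    print(concurrent_finish_times(10, 6.0)[-1])
```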

Serializing these queries leads to huge performance problems, especially if
an application is using synchronous blocking APIs to perform the lookup. At
least with the current implementation a query gets scheduled right away
unless there are already 64 queries in progress.

-josh
_______________________________________________
darwin-development mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/darwin-development
Do not post admin requests to the list. They will be ignored.
