Mailing Lists: Apple Mailing Lists


Re: DNSServiceResolve() fails with -65537 under Solaris 10



Some additional comments...

1. The problem is reproducible under Solaris 10/SPARC as well.
2. The problem is not limited to DNSServiceResolve(): we have reproduced the same error once with DNSServiceQueryRecord().
3. If I call the failed function (e.g. DNSServiceResolve()) again right after it returns, it works. So I wrapped every call to the library's functions in a loop with a few iterations that calls the function again if it fails, and the problem has "gone". I mean that a call still fails sometimes, but because the next attempt succeeds, the failure is no longer noticeable at the application level.
4. I had to make the following change to the mDNSResponder sources:


Index: src/mDNSShared/dnssd_clientstub.c
===================================================================
--- src/mDNSShared/dnssd_clientstub.c (revision 8036)
+++ src/mDNSShared/dnssd_clientstub.c (working copy)
@@ -464,7 +464,10 @@
//syslog(LOG_WARNING, "deliver_request writing %ld bytes\n", datalen + sizeof(ipc_msg_hdr));
//syslog(LOG_WARNING, "deliver_request name is %s\n", (char *)msg + sizeof(ipc_msg_hdr));
if (write_all(sdr->sockfd, msg, datalen + sizeof(ipc_msg_hdr)) < 0)
+ {
+ err = kDNSServiceErr_Unknown;
goto cleanup;
+ }
free(msg);
msg = NULL;


In my opinion, it is a bug that deliver_request() doesn't return an error code if it can't write to the socket. An application can block or ignore SIGPIPE, which allows the code that raised SIGPIPE to continue. Without the change above, deliver_request() then returns a successful return code.
I would also distinguish the different kinds of errors that a failed write to the socket can produce, and assign different error codes to them, not just kDNSServiceErr_Unknown.


5. It is still not clear who closes the socket, so the reason for the SIGPIPE is still unknown.
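
For illustration, the retry workaround from item 3 could look roughly like the sketch below. This is not the code we actually run: retry_call(), dnssd_call_fn and flaky_call() are hypothetical names, and flaky_call() is only a stand-in for a real library call such as DNSServiceResolve(). The two constants mirror kDNSServiceErr_NoError (0) and kDNSServiceErr_Unknown (-65537) from dns_sd.h.

```c
#include <stddef.h>

/* Values mirror kDNSServiceErr_NoError and kDNSServiceErr_Unknown in dns_sd.h. */
#define ERR_NO_ERROR 0
#define ERR_UNKNOWN  (-65537)

/* Hypothetical callback type: wraps one library call and returns its error code. */
typedef int (*dnssd_call_fn)(void *ctx);

/* Call fn up to max_attempts times and stop at the first success.
 * Returns the error code of the last attempt, or ERR_UNKNOWN if
 * max_attempts is not positive. */
static int retry_call(dnssd_call_fn fn, void *ctx, int max_attempts)
{
    int err = ERR_UNKNOWN;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        err = fn(ctx);
        if (err == ERR_NO_ERROR)
            break;
    }
    return err;
}

/* Stand-in for a real call: fails on the first invocation and succeeds
 * afterwards. ctx points to an int that counts the invocations. */
static int flaky_call(void *ctx)
{
    int *calls = (int *)ctx;
    *calls += 1;
    return (*calls == 1) ? ERR_UNKNOWN : ERR_NO_ERROR;
}
```

In a real application the callback would invoke DNSServiceResolve() or DNSServiceQueryRecord() with its own arguments packed into ctx. Note that this only hides the symptom at the application level; it does not explain who closes the socket.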

----- Original Message ----- From: "Igor Seleznev" <email@hidden>
To: <email@hidden>
Sent: Friday, January 19, 2007 1:54 PM
Subject: DNSServiceResolve() fails with -65537 under Solaris 10



Hi,

We have been using mDNSResponder v107.6 quite extensively for half a year already, under Red Hat Linux/AMD and Solaris 10/AMD, with both mdnsd and libdns_sd.so built as 64-bit.

And we experience an intermittent problem, under Solaris 10 only: sometimes DNSServiceResolve() fails with error code -65537 because a SIGPIPE is raised inside it. The SIGPIPE is caught at this place:

fffffd7ffe15e34a _lwp_kill () + a
fffffd7ffe1075b9 raise () + 19
fffffd7ffe0ea5a0 abort () + 90
fffffd7fff319682 void ACE_OS::abort() () + 12
fffffd7fff315745 void tst::SyncClient::signal_handler(int) () + 15
fffffd7ffe15b276 __sighndlr () + 6
fffffd7ffe150642 call_user_handler () + 252
fffffd7ffe150828 sigacthandler (d, 0, fffffd7ffd9fb080) + a8
--- called from signal handler with signal 13 (SIGPIPE) ---
fffffd7ffe15d59a _so_send () + a
fffffd7ffe6e2bf3 write_all () + 2f
fffffd7ffe6e30c3 deliver_request () + 158
fffffd7ffe6e3614 DNSServiceResolve () + 14a
fffffd7ffe76e8b3 int tbricks::bonjour::ServiceResolver::start(tbricks::bonjour::ServiceDiscovery*,ACE_Reactor*,unsigned,const char*,const char*,const char*) () + 463
fffffd7ffe75d199 void tbricks::bonjour::ServiceBrowser::browse(unsigned,unsigned,int,const char*,const char*,const char*) () + c29
fffffd7ffe75ed7c void tbricks::bonjour::ServiceBrowser::browse_callback(_DNSServiceRef_t*,unsigned,unsigned,int,const char*,const char*,const char*,void*) () + 6c
fffffd7ffe6e3ac4 handle_browse_response () + 115
fffffd7ffe6e3308 DNSServiceProcessResult () + db
fffffd7ffe754352 int tbricks::bonjour::DNS_EventHandler::handle_input(int) () + 162


So, for some reason the socket got closed.

If we take into consideration the facts that:
1) it works in 99.9% of cases and the problem shows up quite rarely
2) it never happens under Red Hat Linux
3) it takes place with DNSServiceResolve() only and always ends up with SIGPIPE and -65537 error code
4) no errors in /var/adm/messages from mdnsd
5) mdnsd is functioning with no noticeable problems
....the question is: what can cause this?


Has anyone else experienced this problem?
Do you know how to track down who closes the socket?
Perhaps we are just lucky and have hit the problem with DNSServiceResolve() only, while it is probably reproducible with other functions too. If that happens, I'll let you know immediately.


Can we assume that getting SIGPIPE is a "normal" situation that the application should take care of, or should this never happen at all?
As far as I understand, some platforms do not allow you to configure a socket (via setsockopt()) so that writing to a closed socket returns an EPIPE error instead of raising SIGPIPE. Therefore there seems to be no sense in configuring the socket that way at all, and we should receive SIGPIPE quite often. But in fact it doesn't happen at all, except in this case...


Any help is much appreciated!

Kind regards,
Igor

_______________________________________________ Do not post admin requests to the list. They will be ignored. Bonjour-dev mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/bonjour-dev/email@hidden
