Re: Deadlock in open() syscall
Re: Deadlock in open() syscall
- Subject: Re: Deadlock in open() syscall
- From: Shantonu Sen <email@hidden>
- Date: Tue, 12 Jul 2011 20:23:46 -0700
You can't rebuild. You need to use the binary that you had loaded at the time (preferably with a dSYM for line number mapping) to symbolicate. If you build the kext from scratch it's useless.
Shantonu
On Jul 12, 2011, at 4:08 PM, Anatol Pomozov wrote:
> Hi,
>
> I've got the load address from here
>
> $ kextstat | grep fuse4x
> 166 0 0x5a9a5000 0x18000 0x17000 org.fuse4x.kext.fuse4x
> (0.8.7) <7 5 4 3 1>
>
> And then run it with the kext image
>
> atos -o ~/sources/fuse4x/kext/build/Debug/fuse4x.kext/Contents/MacOS/fuse4x
> -l 0x5a9a5000 -arch i386 -f kext.txt
>
> As a result I've got this:
>
> fuse_body_audit (in fuse4x) (fuse_ipc.c:656)
> (0x5a9ab84c)
> FSNodeGetOrCreateFileVNodeByID (in fuse4x) (fuse_node.c:41)
> (0x5a9ad25e)
> fuse_internal_attr_vat2fsai (in fuse4x) (fuse_internal.c:577)
> (0x5a9a7b75)
> fuse_skip_apple_double_mp (in fuse4x) (fuse_internal.h:368)
> (0x5a9b21dd)
>
>
> But it looks like a total junk to me. These functions are independent.
> Could this mistake happen because of compiler optimizations? I built
> kext with "Debug" configuration. What compiler flags I need to set to
> make "atos" output as much informative as possible?
>
> On 7/8/11, Antoine Missout <email@hidden> wrote:
>> iirc, kextstat can give you your kext load address, and then just use the -l
>> param of atos (and correct -arch)
>> - Antoine
>>
>>
>>
>> On 2011-07-08, at 13:36, Anatol Pomozov wrote:
>>
>>> Hi,
>>>
>>> While we are here - how to symbolize KEXT addresses. Below there is a
>>> group:
>>>
>>> 0x92f11df1 (0x92f11df1)
>>> 0x92f0f0ef (0x92f0f0ef)
>>> 0x92f1b3ae (0x92f1b3ae)
>>> And I would like to get their function names. I tried 'symstacks.rb' and
>>> 'atos' and I was not able to symbolize it. I believe atos needs
>>> load-address but I do not know how to get it for my kext.
>>>
>>> Is there any documentation with examples?
>>>
>>> On Fri, Jul 1, 2011 at 8:07 PM, Shantonu Sen <email@hidden> wrote:
>>> This is a bug in your fuse kext. Two threads are serialized behind:
>>> Kernel stack:
>>> machine_switch_context (in mach_kernel) + 753 (0x2a8a6e)
>>> thread_dispatch (in mach_kernel) + 1966 (0x226fbe)
>>> thread_block_reason (in mach_kernel) + 331 (0x22725d)
>>> thread_block (in mach_kernel) + 33 (0x2272eb)
>>> lck_mtx_sleep_deadline (in mach_kernel) + 104 (0x221407)
>>> wakeup (in mach_kernel) + 286 (0x48ead0)
>>> msleep (in mach_kernel) + 157 (0x48f1f3)
>>> 0x92f11df1 (0x92f11df1)
>>> 0x92f0f0ef (0x92f0f0ef)
>>> 0x92f1b3ae (0x92f1b3ae)
>>> VNOP_ACCESS (in mach_kernel) + 137 (0x2fe251)
>>> vfs_mountroot (in mach_kernel) + 1617 (0x2e314b)
>>> kauth_authorize_action (in mach_kernel) + 80
>>> (0x46ba55)
>>> vnode_authorize (in mach_kernel) + 91
>>> (0x2dd78f)
>>> inet_ntop (in mach_kernel) + 8233
>>> (0x2c8f8d)
>>> getattrlist (in mach_kernel) + 169
>>> (0x2ca72c)
>>> unix_syscall64 (in mach_kernel) + 617
>>> (0x4f6075)
>>> lo64_unix_scall (in mach_kernel) +
>>> 77 (0x2a144d)
>>> User stack:
>>> getattrlist (in libSystem.B.dylib) + 10 (0x7fff86875fce)
>>> FSMount::_getattrs(unsigned int, char const*, unsigned int, unsigned
>>> int, FSAttributeInfo*, unsigned int, unsigned char*) (in CarbonCore) + 237
>>> (0x7fff84613ad7)
>>> FSMount::getattrs(unsigned int, char const*, unsigned int,
>>> unsigned int, FSAttributeInfo*, unsigned int, unsigned char*) (in
>>> CarbonCore) + 248 (0x7fff846139a4)
>>> GetFSRefAttributes(FSMount*, FSRefPrivate const*, unsigned int,
>>> FSAttributeInfo*, unsigned int, char*) (in CarbonCore) + 193
>>> (0x7fff8461387f)
>>> PBGetCatalogInfoSync (in CarbonCore) + 250 (0x7fff846135c5)
>>> FSGetCatalogInfo (in CarbonCore) + 40 (0x7fff846134c8)
>>> TFSInfo::Initialize(FSRef const&, unsigned int) (in
>>> DesktopServicesPriv) + 190 (0x7fff838d64ee)
>>> TFSInfo::GetVolumeInfo(short, unsigned int,
>>> FSVolumeInfo&, TCountedPtr<TFSInfo>*) (in DesktopServicesPriv) + 97
>>> (0x7fff838d6411)
>>>
>>> TFSVolumeInfo::Initialize(TCountedPtr<TVolumeSyncThread> const&, short,
>>> unsigned int, bool&) (in DesktopServicesPriv) + 643 (0x7fff83940d6b)
>>>
>>> TFSVolumeInfo::AddVolume(TCountedPtr<TVolumeSyncThread> const&, short,
>>> unsigned int, TCountedPtr<TFSVolumeInfo>&, bool&) (in DesktopServicesPriv)
>>> + 201 (0x7fff839417b7)
>>> TNode::AddVolume(TCountedPtr<TVolumeSyncThread>
>>> const&, short, unsigned int, TNodePtr&, bool) (in DesktopServicesPriv) +
>>> 206 (0x7fff83915e0e)
>>> TNode::HandleVolumeAdded(TCountedPtr<TNodeTask>
>>> const&, TNodePtr const&, TCountedPtr<TVolumeSyncThread> const&) (in
>>> DesktopServicesPriv) + 65 (0x7fff83916213)
>>>
>>> TNode::HandleNodeRequest(TCountedPtr<TNodeTask> const&,
>>> TCountedPtr<TVolumeSyncThread> const&) (in DesktopServicesPriv) + 1068
>>> (0x7fff8391cad4)
>>> __PostNodeTaskRequest_block_invoke_2 (in
>>> DesktopServicesPriv) + 98 (0x7fff838dc654)
>>> _dispatch_call_block_and_release (in
>>> libSystem.B.dylib) + 15 (0x7fff86876284)
>>> _dispatch_queue_drain (in
>>> libSystem.B.dylib) + 251 (0x7fff86854df2)
>>> _dispatch_queue_invoke (in
>>> libSystem.B.dylib) + 57 (0x7fff86854c54)
>>> _dispatch_worker_thread2 (in
>>> libSystem.B.dylib) + 252 (0x7fff868547fe)
>>> _pthread_wqthread (in
>>> libSystem.B.dylib) + 353 (0x7fff86854128)
>>> start_wqthread (in
>>> libSystem.B.dylib) + 13 (0x7fff86853fc5)
>>>
>>>
>>> This looks like the filesystem was unmounted while a callout to userspace
>>> was still in flight, and the unmount code did not initiate a wakeup(),
>>> leaving it orphaned and waiting forever.
>>>
>>> Shantonu
>>>
>>>
>>> On Jun 30, 2011, at 3:58 PM, Anatol Pomozov wrote:
>>>
>>>> Hi, Shantonu
>>>>
>>>> Thanks for your answer.
>>>>
>>>> On Thu, Jun 30, 2011 at 3:50 PM, Shantonu Sen <email@hidden> wrote:
>>>> You haven't provided the full stackshot, so it's hard to tell. Are other
>>>> threads blocked in the filesystem?
>>>>
>>>> Here is the full stackhot. https://gist.github.com/1057485 (both raw and
>>>> symbolized). I hope it does not leak any private information :)
>>>>
>>>> vfs_mountroot() is a red herring. It's the nearest global symbol to the
>>>> function being called, which is most likely vnode_authorize_opaque()
>>>>
>>>> Yes, it is almost certainly related to using fuse and mounting/unmounting
>>>> filesystems.
>>>>
>>>> I can also provide instructions how to setup tup tests and reproduce the
>>>> issue locally.
>>>>
>>>>
>>>> On Jun 30, 2011, at 3:38 PM, Anatol Pomozov wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I work on a build tool called tup, and most of the time it works fine.
>>>>> But unfortunately when I run tests it deadlocks on one of the open()
>>>>> syscalls. The probability that a test deadlocks is ~1%, but it always
>>>>> happens in different tests. I think deadlock happens here
>>>>> https://github.com/gittup/tup/blob/master/src/compat/dir_mutex.c#L23
>>>>>
>>>>> Looking at the stacktrace that I got from stackshot I see that open()
>>>>> syscall calls vfs_mountroot (??) that tried to acquire mutex on a some
>>>>> node. It looks like the mutex is already hold by someone else. An
>>>>> interesting thing is that I see the deadlock in tests only, and I never
>>>>> saw it in the production. Maybe it happens because tests create a lot of
>>>>> short-living fuse filesystems (and call a bunch of mount/umount)?
>>>>>
>>>>> I do not know whether this is an issue in the kernel. It can be also
>>>>> related to fuse4x kernel extension that my application uses
>>>>> http://github.com/fuse4x/kext
>>>>>
>>>>> Are there any pointers that help me to understand what is going on here?
>>>>>
>>>>> This is a stacktrace for 32 bits kernel (10.6.7) and 64 bits app:
>>>>>
>>>>> PID: 40062
>>>>> Process: tup
>>>>> Kernel stack:
>>>>> machine_switch_context (in mach_kernel) + 753 (0x2a8a6e)
>>>>> thread_dispatch (in mach_kernel) + 1966 (0x226fbe)
>>>>> thread_block_reason (in mach_kernel) + 331 (0x22725d)
>>>>> thread_block (in mach_kernel) + 33 (0x2272eb)
>>>>> lck_mtx_lock_wait_x86 (in mach_kernel) + 330 (0x29f85a)
>>>>> lck_mtx_lock (in mach_kernel) + 504 (0x2995d8)
>>>>> lock_fsnode (in mach_kernel) + 97 (0x2fa9bd)
>>>>> VNOP_ACCESS (in mach_kernel) + 83 (0x2fe21b)
>>>>> vfs_mountroot (in mach_kernel) + 1617 (0x2e314b)
>>>>> kauth_authorize_action (in mach_kernel) + 80
>>>>> (0x46ba55)
>>>>> vnode_authorize (in mach_kernel) + 91 (0x2dd78f)
>>>>> vn_open_auth (in mach_kernel) + 1066
>>>>> (0x2f400f)
>>>>> link (in mach_kernel) + 1592 (0x2ed0d3)
>>>>> open_nocancel (in mach_kernel) + 243
>>>>> (0x2ed581)
>>>>> unix_syscall64 (in mach_kernel) + 617
>>>>> (0x4f6075)
>>>>> lo64_unix_scall (in mach_kernel) + 77
>>>>> (0x2a144d)
>>>>> User stack:
>>>>> open (in libSystem.B.dylib) + 10 (0x7fff8684495e)
>>>>> Continuation: kevent (in mach_kernel) + 97 (0x4783e9)
>>>>> User stack:
>>>>> kevent (in libSystem.B.dylib) + 10 (0x7fff8685312a)
>>>>> _dispatch_queue_invoke (in libSystem.B.dylib) + 185
>>>>> (0x7fff86854cd4)
>>>>> _dispatch_worker_thread2 (in libSystem.B.dylib) + 252
>>>>> (0x7fff868547fe)
>>>>> _pthread_wqthread (in libSystem.B.dylib) + 353
>>>>> (0x7fff86854128)
>>>>> start_wqthread (in libSystem.B.dylib) + 13 (0x7fff86853fc5)
>>>>> Continuation: semaphore_wait_continue (in mach_kernel) + 0
>>>>> (0x22a3e1)
>>>>> User stack:
>>>>> semaphore_wait_signal_trap (in libSystem.B.dylib) + 10
>>>>> (0x7fff8683a2e2)
>>>>> dir_mutex_lock (in tup) + 17 (0x10005f570)
>>>>> _______________________________________________
>>>>> Do not post admin requests to the list. They will be ignored.
>>>>> Darwin-kernel mailing list (email@hidden)
>>>>> Help/Unsubscribe/Update your Subscription:
>>>>>
>>>>> This email sent to email@hidden
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Do not post admin requests to the list. They will be ignored.
>>> Darwin-kernel mailing list (email@hidden)
>>> Help/Unsubscribe/Update your Subscription:
>>>
>>> This email sent to email@hidden
>>
>>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Darwin-kernel mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden