Re: Deadlock in open() syscall
Re: Deadlock in open() syscall
- Subject: Re: Deadlock in open() syscall
- From: Anatol Pomozov <email@hidden>
- Date: Tue, 12 Jul 2011 16:08:07 -0700
Hi,
I've got the load address from here
$ kextstat | grep fuse4x
166 0 0x5a9a5000 0x18000 0x17000 org.fuse4x.kext.fuse4x
(0.8.7) <7 5 4 3 1>
And then run it with the kext image
atos -o ~/sources/fuse4x/kext/build/Debug/fuse4x.kext/Contents/MacOS/fuse4x
-l 0x5a9a5000 -arch i386 -f kext.txt
As a result I've got this:
fuse_body_audit (in fuse4x) (fuse_ipc.c:656)
(0x5a9ab84c)
FSNodeGetOrCreateFileVNodeByID (in fuse4x) (fuse_node.c:41)
(0x5a9ad25e)
fuse_internal_attr_vat2fsai (in fuse4x) (fuse_internal.c:577)
(0x5a9a7b75)
fuse_skip_apple_double_mp (in fuse4x) (fuse_internal.h:368)
(0x5a9b21dd)
But it looks like a total junk to me. These functions are independent.
Could this mistake happen because of compiler optimizations? I built
kext with "Debug" configuration. What compiler flags I need to set to
make "atos" output as much informative as possible?
On 7/8/11, Antoine Missout <email@hidden> wrote:
> iirc, kextstat can give you your kext load address, and then just use the -l
> param of atos (and correct -arch)
> - Antoine
>
>
>
> On 2011-07-08, at 13:36, Anatol Pomozov wrote:
>
>> Hi,
>>
>> While we are here - how to symbolize KEXT addresses. Below there is a
>> group:
>>
>> 0x92f11df1 (0x92f11df1)
>> 0x92f0f0ef (0x92f0f0ef)
>> 0x92f1b3ae (0x92f1b3ae)
>> And I would like to get their function names. I tried 'symstacks.rb' and
>> 'atos' and I was not able to symbolize it. I believe atos needs
>> load-address but I do not know how to get it for my kext.
>>
>> Is there any documentation with examples?
>>
>> On Fri, Jul 1, 2011 at 8:07 PM, Shantonu Sen <email@hidden> wrote:
>> This is a bug in your fuse kext. Two threads are serialized behind:
>> Kernel stack:
>> machine_switch_context (in mach_kernel) + 753 (0x2a8a6e)
>> thread_dispatch (in mach_kernel) + 1966 (0x226fbe)
>> thread_block_reason (in mach_kernel) + 331 (0x22725d)
>> thread_block (in mach_kernel) + 33 (0x2272eb)
>> lck_mtx_sleep_deadline (in mach_kernel) + 104 (0x221407)
>> wakeup (in mach_kernel) + 286 (0x48ead0)
>> msleep (in mach_kernel) + 157 (0x48f1f3)
>> 0x92f11df1 (0x92f11df1)
>> 0x92f0f0ef (0x92f0f0ef)
>> 0x92f1b3ae (0x92f1b3ae)
>> VNOP_ACCESS (in mach_kernel) + 137 (0x2fe251)
>> vfs_mountroot (in mach_kernel) + 1617 (0x2e314b)
>> kauth_authorize_action (in mach_kernel) + 80
>> (0x46ba55)
>> vnode_authorize (in mach_kernel) + 91
>> (0x2dd78f)
>> inet_ntop (in mach_kernel) + 8233
>> (0x2c8f8d)
>> getattrlist (in mach_kernel) + 169
>> (0x2ca72c)
>> unix_syscall64 (in mach_kernel) + 617
>> (0x4f6075)
>> lo64_unix_scall (in mach_kernel) +
>> 77 (0x2a144d)
>> User stack:
>> getattrlist (in libSystem.B.dylib) + 10 (0x7fff86875fce)
>> FSMount::_getattrs(unsigned int, char const*, unsigned int, unsigned
>> int, FSAttributeInfo*, unsigned int, unsigned char*) (in CarbonCore) + 237
>> (0x7fff84613ad7)
>> FSMount::getattrs(unsigned int, char const*, unsigned int,
>> unsigned int, FSAttributeInfo*, unsigned int, unsigned char*) (in
>> CarbonCore) + 248 (0x7fff846139a4)
>> GetFSRefAttributes(FSMount*, FSRefPrivate const*, unsigned int,
>> FSAttributeInfo*, unsigned int, char*) (in CarbonCore) + 193
>> (0x7fff8461387f)
>> PBGetCatalogInfoSync (in CarbonCore) + 250 (0x7fff846135c5)
>> FSGetCatalogInfo (in CarbonCore) + 40 (0x7fff846134c8)
>> TFSInfo::Initialize(FSRef const&, unsigned int) (in
>> DesktopServicesPriv) + 190 (0x7fff838d64ee)
>> TFSInfo::GetVolumeInfo(short, unsigned int,
>> FSVolumeInfo&, TCountedPtr<TFSInfo>*) (in DesktopServicesPriv) + 97
>> (0x7fff838d6411)
>>
>> TFSVolumeInfo::Initialize(TCountedPtr<TVolumeSyncThread> const&, short,
>> unsigned int, bool&) (in DesktopServicesPriv) + 643 (0x7fff83940d6b)
>>
>> TFSVolumeInfo::AddVolume(TCountedPtr<TVolumeSyncThread> const&, short,
>> unsigned int, TCountedPtr<TFSVolumeInfo>&, bool&) (in DesktopServicesPriv)
>> + 201 (0x7fff839417b7)
>> TNode::AddVolume(TCountedPtr<TVolumeSyncThread>
>> const&, short, unsigned int, TNodePtr&, bool) (in DesktopServicesPriv) +
>> 206 (0x7fff83915e0e)
>> TNode::HandleVolumeAdded(TCountedPtr<TNodeTask>
>> const&, TNodePtr const&, TCountedPtr<TVolumeSyncThread> const&) (in
>> DesktopServicesPriv) + 65 (0x7fff83916213)
>>
>> TNode::HandleNodeRequest(TCountedPtr<TNodeTask> const&,
>> TCountedPtr<TVolumeSyncThread> const&) (in DesktopServicesPriv) + 1068
>> (0x7fff8391cad4)
>> __PostNodeTaskRequest_block_invoke_2 (in
>> DesktopServicesPriv) + 98 (0x7fff838dc654)
>> _dispatch_call_block_and_release (in
>> libSystem.B.dylib) + 15 (0x7fff86876284)
>> _dispatch_queue_drain (in
>> libSystem.B.dylib) + 251 (0x7fff86854df2)
>> _dispatch_queue_invoke (in
>> libSystem.B.dylib) + 57 (0x7fff86854c54)
>> _dispatch_worker_thread2 (in
>> libSystem.B.dylib) + 252 (0x7fff868547fe)
>> _pthread_wqthread (in
>> libSystem.B.dylib) + 353 (0x7fff86854128)
>> start_wqthread (in
>> libSystem.B.dylib) + 13 (0x7fff86853fc5)
>>
>>
>> This looks like the filesystem was unmounted while a callout to userspace
>> was still in flight, and the unmount code did not initiate a wakeup(),
>> leaving it orphaned and waiting forever.
>>
>> Shantonu
>>
>>
>> On Jun 30, 2011, at 3:58 PM, Anatol Pomozov wrote:
>>
>>> Hi, Shantonu
>>>
>>> Thanks for your answer.
>>>
>>> On Thu, Jun 30, 2011 at 3:50 PM, Shantonu Sen <email@hidden> wrote:
>>> You haven't provided the full stackshot, so it's hard to tell. Are other
>>> threads blocked in the filesystem?
>>>
>>> Here is the full stackhot. https://gist.github.com/1057485 (both raw and
>>> symbolized). I hope it does not leak any private information :)
>>>
>>> vfs_mountroot() is a red herring. It's the nearest global symbol to the
>>> function being called, which is most likely vnode_authorize_opaque()
>>>
>>> Yes, it is almost certainly related to using fuse and mounting/unmounting
>>> filesystems.
>>>
>>> I can also provide instructions how to setup tup tests and reproduce the
>>> issue locally.
>>>
>>>
>>> On Jun 30, 2011, at 3:38 PM, Anatol Pomozov wrote:
>>>
>>>> Hi,
>>>>
>>>> I work on a build tool called tup, and most of the time it works fine.
>>>> But unfortunately when I run tests it deadlocks on one of the open()
>>>> syscalls. The probability that a test deadlocks is ~1%, but it always
>>>> happens in different tests. I think deadlock happens here
>>>> https://github.com/gittup/tup/blob/master/src/compat/dir_mutex.c#L23
>>>>
>>>> Looking at the stacktrace that I got from stackshot I see that open()
>>>> syscall calls vfs_mountroot (??) that tried to acquire mutex on a some
>>>> node. It looks like the mutex is already hold by someone else. An
>>>> interesting thing is that I see the deadlock in tests only, and I never
>>>> saw it in the production. Maybe it happens because tests create a lot of
>>>> short-living fuse filesystems (and call a bunch of mount/umount)?
>>>>
>>>> I do not know whether this is an issue in the kernel. It can be also
>>>> related to fuse4x kernel extension that my application uses
>>>> http://github.com/fuse4x/kext
>>>>
>>>> Are there any pointers that help me to understand what is going on here?
>>>>
>>>> This is a stacktrace for 32 bits kernel (10.6.7) and 64 bits app:
>>>>
>>>> PID: 40062
>>>> Process: tup
>>>> Kernel stack:
>>>> machine_switch_context (in mach_kernel) + 753 (0x2a8a6e)
>>>> thread_dispatch (in mach_kernel) + 1966 (0x226fbe)
>>>> thread_block_reason (in mach_kernel) + 331 (0x22725d)
>>>> thread_block (in mach_kernel) + 33 (0x2272eb)
>>>> lck_mtx_lock_wait_x86 (in mach_kernel) + 330 (0x29f85a)
>>>> lck_mtx_lock (in mach_kernel) + 504 (0x2995d8)
>>>> lock_fsnode (in mach_kernel) + 97 (0x2fa9bd)
>>>> VNOP_ACCESS (in mach_kernel) + 83 (0x2fe21b)
>>>> vfs_mountroot (in mach_kernel) + 1617 (0x2e314b)
>>>> kauth_authorize_action (in mach_kernel) + 80
>>>> (0x46ba55)
>>>> vnode_authorize (in mach_kernel) + 91 (0x2dd78f)
>>>> vn_open_auth (in mach_kernel) + 1066
>>>> (0x2f400f)
>>>> link (in mach_kernel) + 1592 (0x2ed0d3)
>>>> open_nocancel (in mach_kernel) + 243
>>>> (0x2ed581)
>>>> unix_syscall64 (in mach_kernel) + 617
>>>> (0x4f6075)
>>>> lo64_unix_scall (in mach_kernel) + 77
>>>> (0x2a144d)
>>>> User stack:
>>>> open (in libSystem.B.dylib) + 10 (0x7fff8684495e)
>>>> Continuation: kevent (in mach_kernel) + 97 (0x4783e9)
>>>> User stack:
>>>> kevent (in libSystem.B.dylib) + 10 (0x7fff8685312a)
>>>> _dispatch_queue_invoke (in libSystem.B.dylib) + 185
>>>> (0x7fff86854cd4)
>>>> _dispatch_worker_thread2 (in libSystem.B.dylib) + 252
>>>> (0x7fff868547fe)
>>>> _pthread_wqthread (in libSystem.B.dylib) + 353
>>>> (0x7fff86854128)
>>>> start_wqthread (in libSystem.B.dylib) + 13 (0x7fff86853fc5)
>>>> Continuation: semaphore_wait_continue (in mach_kernel) + 0
>>>> (0x22a3e1)
>>>> User stack:
>>>> semaphore_wait_signal_trap (in libSystem.B.dylib) + 10
>>>> (0x7fff8683a2e2)
>>>> dir_mutex_lock (in tup) + 17 (0x10005f570)
>>>> _______________________________________________
>>>> Do not post admin requests to the list. They will be ignored.
>>>> Darwin-kernel mailing list (email@hidden)
>>>> Help/Unsubscribe/Update your Subscription:
>>>>
>>>> This email sent to email@hidden
>>>
>>>
>>
>>
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> Darwin-kernel mailing list (email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>>
>> This email sent to email@hidden
>
>
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden