From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx5-phx2.redhat.com ([209.132.183.37]:33605 "EHLO mx5-phx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753953AbcJUPig (ORCPT ); Fri, 21 Oct 2016 11:38:36 -0400 Date: Fri, 21 Oct 2016 11:38:30 -0400 (EDT) From: CAI Qian To: Jan Kara , Miklos Szeredi Cc: Al Viro , Linus Torvalds , linux-fsdevel@vger.kernel.org Message-ID: <1268904273.1449131.1477064310378.JavaMail.zimbra@redhat.com> In-Reply-To: <20161007070838.GA16260@quack2.suse.cz> References: <20161005153014.GC26977@htj.duckdns.org> <270577901.647921.1475682888765.JavaMail.zimbra@redhat.com> <874538236.682217.1475693824077.JavaMail.zimbra@redhat.com> <20161005200522.GE19539@ZenIV.linux.org.uk> <119370333.805584.1475756417736.JavaMail.zimbra@redhat.com> <1860793605.807021.1475756759147.JavaMail.zimbra@redhat.com> <20161007070838.GA16260@quack2.suse.cz> Subject: [4.9-rc1+] overlayfs lockdep MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: ----- Original Message ----- > From: "Jan Kara" > Sent: Friday, October 7, 2016 3:08:38 AM > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked) > > > So I believe this may be just a problem in overlayfs lockdep annotation > (see below). Added Miklos to CC. > > > Wait. There is also a lockep happened before the xfs internal error as > > well. > > > > [ 5839.452325] ====================================================== > > [ 5839.459221] [ INFO: possible circular locking dependency detected ] > > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted > > [ 5839.471945] ------------------------------------------------------- > > [ 5839.478937] trinity-c220/69531 is trying to acquire lock: > > [ 5839.484961] (&p->lock){+.+.+.}, at: [] > > seq_read+0x4c/0x3e0 > > [ 5839.492967] > > but task is already holding lock: > > [ 5839.499476] (sb_writers#8){.+.+.+}, at: [] > > __sb_start_write+0xd1/0xf0 > > [ 5839.508560] > > which lock already depends on the new lock. > > > > [ 5839.517686] > > the existing dependency chain (in reverse order) is: > > [ 5839.526036] > > -> #3 (sb_writers#8){.+.+.+}: > > [ 5839.530751] [] lock_acquire+0xd4/0x240 > > [ 5839.537368] [] percpu_down_read+0x4a/0x90 > > [ 5839.544275] [] __sb_start_write+0xd1/0xf0 > > [ 5839.551181] [] mnt_want_write+0x24/0x50 > > [ 5839.557892] [] ovl_want_write+0x1f/0x30 > > [overlay] > > [ 5839.565577] [] ovl_do_remove+0x46/0x480 > > [overlay] > > [ 5839.573259] [] ovl_unlink+0x13/0x20 [overlay] > > [ 5839.580555] [] vfs_unlink+0xda/0x190 > > [ 5839.586979] [] do_unlinkat+0x268/0x2b0 > > [ 5839.593599] [] SyS_unlinkat+0x1b/0x30 > > [ 5839.600120] [] do_syscall_64+0x6c/0x1e0 > > [ 5839.606836] [] return_from_SYSCALL_64+0x0/0x7a > > [ 5839.614231] > > So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection > through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in > I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we > grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then > in ovl_want_write() we grab freeze protection again, but this time for the > upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT > (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering > (we maintain locking classes per fs type so that's why I'm showing fs type > in parenthesis). > > Now this nesting is nasty because once you add locks that are not tracked > per fs type into the mix, you get cycles. In this case we've got > seq_file->lock and cred_guard_mutex into the mix - the splice path is > doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing > from seq_file into the real filesystem). Exec path further establishes > cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle: > > sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex > (overlay) -> sb_writers (FS-A) > > If I analyzed the lockdep trace, this looks like a real (although remote) > deadlock possibility. Miklos? So this can still be reproduced in the yesterday's mainline. [40581.813575] [ INFO: possible circular locking dependency detected ] [40581.813578] 4.9.0-rc1-lockfix-uncorev2+ #51 Tainted: G W [40581.813581] ------------------------------------------------------- [40581.813582] trinity-c104/39795 is trying to acquire lock: [40581.813587] ( [40581.813588] &p->lock [40581.813589] ){+.+.+.} [40581.813600] , at: [40581.813601] [] seq_read+0xec/0x1400 [40581.813603] [40581.813603] but task is already holding lock: [40581.813605] ( [40581.813607] sb_writers [40581.813608] #8 [40581.813609] ){.+.+.+} [40581.813617] , at: [40581.813617] [] do_sendfile+0x9ea/0x1270 [40581.813618] [40581.813618] which lock already depends on the new lock. [40581.813618] [40581.813620] [40581.813620] the existing dependency chain (in reverse order) is: [40581.813623] [40581.813623] -> #3 [40581.813624] ( [40581.813625] sb_writers [40581.813626] #8 [40581.813628] ){.+.+.+} [40581.813628] : [40581.813636] [40581.813636] [] __lock_acquire+0x9aa/0x1710 [40581.813640] [40581.813640] [] lock_acquire+0x24e/0x5d0 [40581.813644] [40581.813645] [] __sb_start_write+0xae/0x360 [40581.813650] [40581.813650] [] mnt_want_write+0x4a/0xc0 [40581.813661] [40581.813661] [] ovl_want_write+0x8d/0xf0 [overlay] [40581.813668] [40581.813668] [] ovl_do_remove+0xe7/0x9a0 [overlay] [40581.813675] [40581.813676] [] ovl_rmdir+0x16/0x20 [overlay] [40581.813680] [40581.813680] [] vfs_rmdir+0x1bf/0x3e0 [40581.813685] [40581.813686] [] do_rmdir+0x2c5/0x430 [40581.813689] [40581.813690] [] SyS_unlinkat+0x22/0x30 [40581.813696] [40581.813696] [] do_syscall_64+0x19d/0x540 [40581.813704] [40581.813704] [] return_from_SYSCALL_64+0x0/0x7a [40581.813707] [40581.813707] -> #2 [40581.813709] ( [40581.813710] &sb->s_type->i_mutex_key [40581.813711] #17 [40581.813712] ){++++++} [40581.813713] : [40581.813720] [40581.813720] [] __lock_acquire+0x9aa/0x1710 [40581.813726] [40581.813726] [] lock_acquire+0x24e/0x5d0 [40581.813736] [40581.813736] [] down_read+0xa1/0x1c0 [40581.813740] [40581.813740] [] lookup_slow+0x17b/0x4f0 [40581.813744] [40581.813744] [] walk_component+0x728/0x1d10 [40581.813750] [40581.813750] [] link_path_walk+0x40e/0x1690 [40581.813758] [40581.813758] [] path_openat+0x1c4/0x3870 [40581.813764] [40581.813765] [] do_filp_open+0x1a9/0x2e0 [40581.813772] [40581.813772] [] do_open_execat+0xcb/0x420 [40581.813783] [40581.813784] [] open_exec+0x2b/0x50 [40581.813793] [40581.813793] [] load_elf_binary+0x103c/0x3550 [40581.813807] [40581.813807] [] search_binary_handler+0x162/0x480 [40581.813814] [40581.813815] [] do_execveat_common.isra.24+0x138a/0x2570 [40581.813823] [40581.813824] [] SyS_execve+0x3a/0x50 [40581.813828] [40581.813828] [] do_syscall_64+0x19d/0x540 [40581.813833] [40581.813834] [] return_from_SYSCALL_64+0x0/0x7a [40581.813843] [40581.813843] -> #1 [40581.813845] ( [40581.813850] &sig->cred_guard_mutex [40581.813852] ){+.+.+.} [40581.813852] : [40581.813861] [40581.813862] [] __lock_acquire+0x9aa/0x1710 [40581.813871] [40581.813871] [] lock_acquire+0x24e/0x5d0 [40581.813885] [40581.813886] [] mutex_lock_killable_nested+0x103/0xb90 [40581.813895] [40581.813896] [] do_io_accounting+0x186/0xcf0 [40581.813902] [40581.813903] [] proc_tgid_io_accounting+0x19/0x20 [40581.813908] [40581.813909] [] proc_single_show+0x114/0x1d0 [40581.813917] [40581.813917] [] seq_read+0x334/0x1400 [40581.813921] [40581.813921] [] __vfs_read+0x106/0x990 [40581.813927] [40581.813927] [] vfs_read+0x118/0x400 [40581.813931] [40581.813931] [] SyS_read+0xdf/0x1d0 [40581.813938] [40581.813938] [] do_syscall_64+0x19d/0x540 [40581.813945] [40581.813946] [] return_from_SYSCALL_64+0x0/0x7a [40581.813949] [40581.813949] -> #0 [40581.813951] ( [40581.813954] &p->lock [40581.813955] ){+.+.+.} [40581.813956] : [40581.813961] [40581.813961] [] validate_chain.isra.31+0x2b28/0x4c00 [40581.813965] [40581.813966] [] __lock_acquire+0x9aa/0x1710 [40581.813972] [40581.813972] [] lock_acquire+0x24e/0x5d0 [40581.813977] [40581.813977] [] mutex_lock_nested+0x108/0xa50 [40581.813983] [40581.813983] [] seq_read+0xec/0x1400 [40581.813993] [40581.813993] [] kernfs_fop_read+0x35e/0x640 [40581.813998] [40581.813998] [] do_loop_readv_writev+0xdf/0x250 [40581.814003] [40581.814003] [] do_readv_writev+0x6a5/0xab0 [40581.814007] [40581.814007] [] vfs_readv+0x86/0xe0 [40581.814020] [40581.814020] [] default_file_splice_read+0x49c/0xbb0 [40581.814026] [40581.814027] [] do_splice_to+0x104/0x1a0 [40581.814033] [40581.814033] [] splice_direct_to_actor+0x270/0xa00 [40581.814039] [40581.814039] [] do_splice_direct+0x194/0x300 [40581.814046] [40581.814046] [] do_sendfile+0x469/0x1270 [40581.814051] [40581.814051] [] SyS_sendfile64+0x140/0x150 [40581.814054] [40581.814055] [] do_syscall_64+0x19d/0x540 [40581.814059] [40581.814060] [] return_from_SYSCALL_64+0x0/0x7a [40581.814062] [40581.814062] other info that might help us debug this: [40581.814062] [40581.814066] Chain exists of: [40581.814066] [40581.814067] &p->lock [40581.814069] --> [40581.814070] &sb->s_type->i_mutex_key [40581.814071] #17 [40581.814073] --> [40581.814076] sb_writers [40581.814079] #8 [40581.814079] [40581.814079] [40581.814080] Possible unsafe locking scenario: [40581.814080] [40581.814081] CPU0 CPU1 [40581.814083] ---- ---- [40581.814085] lock( [40581.814088] sb_writers [40581.814089] #8 [40581.814089] ); [40581.814091] lock( [40581.814093] &sb->s_type->i_mutex_key [40581.814095] #17 [40581.814095] ); [40581.814097] lock( [40581.814098] sb_writers [40581.814099] #8 [40581.814099] ); [40581.814101] lock( [40581.814103] &p->lock [40581.814103] ); [40581.814104] [40581.814104] *** DEADLOCK *** [40581.814104] [40581.814106] 1 lock held by trinity-c104/39795: [40581.814109] #0: [40581.814111] ( [40581.814112] sb_writers [40581.814113] #8 [40581.814114] ){.+.+.+} [40581.814116] , at: [40581.814117] [] do_sendfile+0x9ea/0x1270 [40581.814118] [40581.814118] stack backtrace: [40581.814121] CPU: 25 PID: 39795 Comm: trinity-c104 Tainted: G W 4.9.0-rc1-lockfix-uncorev2+ #51 [40581.814123] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 [40581.814131] ffff880825886da0 ffffffff81d37124 0000000041b58ab3 ffffffff83348dc7 [40581.814138] ffffffff81d37064 0000000000000001 0000000000000000 ffff8807b4d5d5d8 [40581.814145] 00000000bfc018be ffff880825886d78 0000000000000001 0000000000000000 [40581.814146] Call Trace: [40581.814155] [] dump_stack+0xc0/0x12c [40581.814159] [] ? _atomic_dec_and_lock+0xc4/0xc4 [40581.814168] [] print_circular_bug+0x3c9/0x5e0 [40581.814171] [] ? print_circular_bug_entry+0xd0/0xd0 [40581.814176] [] validate_chain.isra.31+0x2b28/0x4c00 [40581.814182] [] ? check_irq_usage+0x300/0x300 [40581.814192] [] ? check_irq_usage+0x300/0x300 [40581.814196] [] ? __this_cpu_preempt_check+0x13/0x20 [40581.814200] [] ? validate_chain.isra.31+0x1235/0x4c00 [40581.814204] [] ? print_usage_bug+0x700/0x700 [40581.814208] [] ? sched_clock_cpu+0x1b0/0x310 [40581.814214] [] ? print_usage_bug+0x700/0x700 [40581.814219] [] ? sched_clock_cpu+0x1b0/0x310 [40581.814226] [] __lock_acquire+0x9aa/0x1710 [40581.814232] [] lock_acquire+0x24e/0x5d0 [40581.814235] [] ? seq_read+0xec/0x1400 [40581.814240] [] ? seq_read+0xec/0x1400 [40581.814243] [] mutex_lock_nested+0x108/0xa50 [40581.814246] [] ? seq_read+0xec/0x1400 [40581.814250] [] ? seq_read+0xec/0x1400 [40581.814256] [] ? kasan_unpoison_shadow+0x36/0x50 [40581.814259] [] ? mutex_lock_interruptible_nested+0xb40/0xb40 [40581.814264] [] ? get_page_from_freelist+0x175c/0x2ed0 [40581.814271] [] ? __isolate_free_page+0x7e0/0x7e0 [40581.814275] [] ? mark_held_locks+0x109/0x290 [40581.814278] [] seq_read+0xec/0x1400 [40581.814283] [] ? rcu_lockdep_current_cpu_online+0x11d/0x1d0 [40581.814290] [] ? seq_hlist_start_percpu+0x4a0/0x4a0 [40581.814295] [] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0 [40581.814298] [] ? __this_cpu_preempt_check+0x13/0x20 [40581.814300] [] kernfs_fop_read+0x35e/0x640 [40581.814305] [] ? selinux_file_permission+0x3c5/0x550 [40581.814310] [] ? kernfs_fop_open+0xf40/0xf40 [40581.814312] [] do_loop_readv_writev+0xdf/0x250 [40581.814318] [] do_readv_writev+0x6a5/0xab0 [40581.814324] [] ? vfs_write+0x5f0/0x5f0 [40581.814328] [] ? iov_iter_get_pages_alloc+0x53f/0x1990 [40581.814332] [] ? iov_iter_npages+0xed0/0xed0 [40581.814336] [] ? mark_held_locks+0x109/0x290 [40581.814339] [] ? __this_cpu_preempt_check+0x13/0x20 [40581.814344] [] ? trace_hardirqs_on_caller+0x520/0x720 [40581.814347] [] vfs_readv+0x86/0xe0 [40581.814352] [] default_file_splice_read+0x49c/0xbb0 [40581.814361] [] ? do_splice_direct+0x300/0x300 [40581.814363] [] ? kasan_kmalloc+0xad/0xe0 [40581.814366] [] ? alloc_pipe_info+0x1b7/0x410 [40581.814371] [] ? print_usage_bug+0x700/0x700 [40581.814373] [] ? SyS_sendfile64+0x140/0x150 [40581.814377] [] ? do_syscall_64+0x19d/0x540 [40581.814380] [] ? entry_SYSCALL64_slow_path+0x25/0x25 [40581.814382] [] ? sched_clock_cpu+0x1b0/0x310 [40581.814386] [] ? mark_held_locks+0x109/0x290 [40581.814390] [] ? trace_hardirqs_on_caller+0x520/0x720 [40581.814395] [] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0 [40581.814398] [] ? selinux_file_permission+0x3c5/0x550 [40581.814404] [] ? security_file_permission+0x176/0x220 [40581.814408] [] ? rw_verify_area+0xd8/0x380 [40581.814411] [] do_splice_to+0x104/0x1a0 [40581.814415] [] ? alloc_pipe_info+0x2e7/0x410 [40581.814419] [] splice_direct_to_actor+0x270/0xa00 [40581.814424] [] ? wakeup_pipe_readers+0x90/0x90 [40581.814429] [] ? do_splice_to+0x1a0/0x1a0 [40581.814432] [] ? rw_verify_area+0xd8/0x380 [40581.814438] [] do_splice_direct+0x194/0x300 [40581.814443] [] ? splice_direct_to_actor+0xa00/0xa00 [40581.814450] [] ? preempt_count_sub+0x5e/0xe0 [40581.814452] [] ? __sb_start_write+0x145/0x360 [40581.814457] [] do_sendfile+0x469/0x1270 [40581.814461] [] ? do_compat_pwritev64.isra.16+0xd0/0xd0 [40581.814466] [] ? __audit_syscall_exit+0x637/0x960 [40581.814469] [] ? syscall_trace_enter+0x89b/0x1930 [40581.814473] [] ? kfree+0x3f3/0x620 [40581.814477] [] SyS_sendfile64+0x140/0x150 [40581.814479] [] ? SyS_sendfile+0x140/0x140 [40581.814482] [] ? __this_cpu_preempt_check+0x13/0x20 [40581.814485] [] ? SyS_sendfile+0x140/0x140 [40581.814487] [] do_syscall_64+0x19d/0x540 [40581.814491] [] entry_SYSCALL64_slow_path+0x25/0x25