* Re: [Bug 205135] System hang up when memory swapping (kswapd deadlock) [not found] ` <bug-205135-27-vbbrgnF9A3@https.bugzilla.kernel.org/> @ 2019-10-22 22:24 ` Andrew Morton 2019-10-23 1:22 ` Darrick J. Wong 0 siblings, 1 reply; 5+ messages in thread From: Andrew Morton @ 2019-10-22 22:24 UTC (permalink / raw) To: linux-mm, linux-xfs Cc: bugzilla-daemon, goodmirek, Hillf Danton, Dmitry Vyukov, Joel Fernandes, Tetsuo Handa (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=205135 > > --- Comment #7 from goodmirek@goodmirek.com --- > Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. > Not sure about other filesystems, I did not have a chance to test it elsewhere. > > This unreproduced bot crash could be related: > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/ Thanks. Might be core MM, might be XFS, might be Fedora. Hilf, does your patch look related? That seems to have gone quiet? Should we progress Tetsuo's patch? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bug 205135] System hang up when memory swapping (kswapd deadlock) 2019-10-22 22:24 ` [Bug 205135] System hang up when memory swapping (kswapd deadlock) Andrew Morton @ 2019-10-23 1:22 ` Darrick J. Wong 2019-10-23 2:37 ` Su Yue 2019-10-23 6:49 ` Dave Chinner 0 siblings, 2 replies; 5+ messages in thread From: Darrick J. Wong @ 2019-10-23 1:22 UTC (permalink / raw) To: Andrew Morton, Dave Chinner Cc: linux-mm, linux-xfs, bugzilla-daemon, goodmirek, Hillf Danton, Dmitry Vyukov, Joel Fernandes, Tetsuo Handa On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=205135 > > > > --- Comment #7 from goodmirek@goodmirek.com --- > > Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. > > Not sure about other filesystems, I did not have a chance to test it elsewhere. > > > > This unreproduced bot crash could be related: > > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/ > > Thanks. Might be core MM, might be XFS, might be Fedora. > > Hilf, does your patch look related? That seems to have gone quiet? > > Should we progress Tetsuo's patch? Hmm... Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019 ...istr 5.4-rc1 had some writeback bugs in it... 
-> #1 (fs_reclaim){+.+.}:
Oct 09 13:47:08 kernel: fs_reclaim_acquire.part.0+0x25/0x30
Oct 09 13:47:08 kernel: __kmalloc+0x4f/0x330
Oct 09 13:47:08 kernel: kmem_alloc+0x83/0x1a0 [xfs]
Oct 09 13:47:08 kernel: kmem_alloc_large+0x3c/0x100 [xfs]
Oct 09 13:47:08 kernel: xfs_attr_copy_value+0x5d/0xa0 [xfs]
Oct 09 13:47:08 kernel: xfs_attr_get+0xe7/0x1d0 [xfs]
Oct 09 13:47:08 kernel: xfs_get_acl+0xad/0x1e0 [xfs]
Oct 09 13:47:08 kernel: get_acl+0x81/0x110
Oct 09 13:47:08 kernel: posix_acl_create+0x58/0x160
Oct 09 13:47:08 kernel: xfs_generic_create+0x7e/0x2f0 [xfs]
Oct 09 13:47:08 kernel: lookup_open+0x5bd/0x820
Oct 09 13:47:08 kernel: path_openat+0x340/0xcb0
Oct 09 13:47:08 kernel: do_filp_open+0x91/0x100
Oct 09 13:47:08 kernel: do_sys_open+0x184/0x220
Oct 09 13:47:08 kernel: do_syscall_64+0x5c/0xa0
Oct 09 13:47:08 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

That's XFS trying to allocate memory to load an ACL off disk, only it
looks like this thread does a MAYFAIL allocation. It's a GFP_FS
allocation (since we don't set KM_NOFS), so we recurse into fs reclaim,
and the ACL-getter has locked the inode (which is probably why lockdep
triggers). I wonder if that's really a deadlock vs. just super-slow
behavior, but otoh I don't think we're supposed to allow reclaim to jump
into the filesystems when the fs has locks held.

That kmem_alloc_large should probably be changed to KM_NOFS. Dave?

--D

^ permalink raw reply [flat|nested] 5+ messages in thread
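To make the flag discussion above concrete, here is a minimal userspace sketch of how an XFS-style KM_* flag set could be translated into allocator flags. The DEMO_* names and bit values are made up for illustration and are not the kernel's definitions. The only behavior modeled is the one described in the message: without KM_NOFS the allocation keeps the FS bit and so may recurse into filesystem reclaim, and MAYFAIL marks an allocation that is allowed to fail.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; NOT the kernel's values. */
enum demo_km_flags {
	DEMO_KM_NOFS    = 1u << 0,	/* caller holds fs locks: keep reclaim out of the fs */
	DEMO_KM_MAYFAIL = 1u << 1,	/* caller can handle allocation failure */
};

enum demo_gfp_flags {
	DEMO_GFP_IO            = 1u << 0,	/* reclaim may start block I/O */
	DEMO_GFP_FS            = 1u << 1,	/* reclaim may call into the filesystem */
	DEMO_GFP_RETRY_MAYFAIL = 1u << 2,	/* allocation may fail instead of retrying forever */
};

/* Translate the demo KM_* flags into demo GFP_* flags. */
static inline unsigned int demo_km_to_gfp(unsigned int km_flags)
{
	/* Default: reclaim may do I/O and may enter the filesystem. */
	unsigned int gfp = DEMO_GFP_IO | DEMO_GFP_FS;

	if (km_flags & DEMO_KM_NOFS)
		gfp &= ~(unsigned int)DEMO_GFP_FS;
	if (km_flags & DEMO_KM_MAYFAIL)
		gfp |= DEMO_GFP_RETRY_MAYFAIL;
	return gfp;
}

/* True if an allocation with these flags may recurse into fs reclaim. */
static inline bool demo_may_enter_fs_reclaim(unsigned int gfp)
{
	return (gfp & DEMO_GFP_FS) != 0;
}
```

In this model, demo_km_to_gfp(DEMO_KM_MAYFAIL) corresponds to the allocation in the trace (FS reclaim allowed), while adding DEMO_KM_NOFS corresponds to the change suggested above.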
* Re: [Bug 205135] System hang up when memory swapping (kswapd deadlock) 2019-10-23 1:22 ` Darrick J. Wong @ 2019-10-23 2:37 ` Su Yue 2019-10-23 6:49 ` Dave Chinner 1 sibling, 0 replies; 5+ messages in thread From: Su Yue @ 2019-10-23 2:37 UTC (permalink / raw) To: Darrick J. Wong, Andrew Morton, Dave Chinner Cc: linux-mm, linux-xfs, bugzilla-daemon, goodmirek, Hillf Danton, Dmitry Vyukov, Joel Fernandes, Tetsuo Handa Just to remind, running xfstests/generic/273 could trigger the lockdep deadlock warning. -- Su On 2019/10/23 9:22 AM, Darrick J. Wong wrote: > On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote: >> >> (switched to email. Please respond via emailed reply-to-all, not via the >> bugzilla web interface). >> >> On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: >> >>> https://bugzilla.kernel.org/show_bug.cgi?id=205135 >>> >>> --- Comment #7 from goodmirek@goodmirek.com --- >>> Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. >>> Not sure about other filesystems, I did not have a chance to test it elsewhere. >>> >>> This unreproduced bot crash could be related: >>> https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/ >> >> Thanks. Might be core MM, might be XFS, might be Fedora. >> >> Hilf, does your patch look related? That seems to have gone quiet? >> >> Should we progress Tetsuo's patch? > > Hmm... > > Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019 > > ...istr 5.4-rc1 had some writeback bugs in it... 
> > -> #1 (fs_reclaim){+.+.}: > Oct 09 13:47:08 kernel: fs_reclaim_acquire.part.0+0x25/0x30 > Oct 09 13:47:08 kernel: __kmalloc+0x4f/0x330 > Oct 09 13:47:08 kernel: kmem_alloc+0x83/0x1a0 [xfs] > Oct 09 13:47:08 kernel: kmem_alloc_large+0x3c/0x100 [xfs] > Oct 09 13:47:08 kernel: xfs_attr_copy_value+0x5d/0xa0 [xfs] > Oct 09 13:47:08 kernel: xfs_attr_get+0xe7/0x1d0 [xfs] > Oct 09 13:47:08 kernel: xfs_get_acl+0xad/0x1e0 [xfs] > Oct 09 13:47:08 kernel: get_acl+0x81/0x110 > Oct 09 13:47:08 kernel: posix_acl_create+0x58/0x160 > Oct 09 13:47:08 kernel: xfs_generic_create+0x7e/0x2f0 [xfs] > Oct 09 13:47:08 kernel: lookup_open+0x5bd/0x820 > Oct 09 13:47:08 kernel: path_openat+0x340/0xcb0 > Oct 09 13:47:08 kernel: do_filp_open+0x91/0x100 > Oct 09 13:47:08 kernel: do_sys_open+0x184/0x220 > Oct 09 13:47:08 kernel: do_syscall_64+0x5c/0xa0 > Oct 09 13:47:08 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe > > That's XFS trying to allocate memory to load an acl off disk, only it > looks this thread does a MAYFAIL allocation. It's a GFP_FS (since we > don't set KM_NOFS) allocation so we recurse into fs reclaim, and the > ACL-getter has locked the inode (which is probably why lockdep > triggers). I wonder if that's really a deadlock vs. just super-slow > behavior, but otoh I don't think we're supposed to allow reclaim to jump > into the filesystems when the fs has locks held. > > That kmem_alloc_large should probably be changed to KM_NOFS. Dave? > > --D > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Bug 205135] System hang up when memory swapping (kswapd deadlock) 2019-10-23 1:22 ` Darrick J. Wong 2019-10-23 2:37 ` Su Yue @ 2019-10-23 6:49 ` Dave Chinner 2019-10-23 7:12 ` Dave Chinner 1 sibling, 1 reply; 5+ messages in thread From: Dave Chinner @ 2019-10-23 6:49 UTC (permalink / raw) To: Darrick J. Wong Cc: Andrew Morton, linux-mm, linux-xfs, bugzilla-daemon, goodmirek, Hillf Danton, Dmitry Vyukov, Joel Fernandes, Tetsuo Handa On Tue, Oct 22, 2019 at 06:22:28PM -0700, Darrick J. Wong wrote: > On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote: > > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205135 > > > > > > --- Comment #7 from goodmirek@goodmirek.com --- > > > Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. > > > Not sure about other filesystems, I did not have a chance to test it elsewhere. > > > > > > This unreproduced bot crash could be related: > > > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/ > > > > Thanks. Might be core MM, might be XFS, might be Fedora. > > > > Hilf, does your patch look related? That seems to have gone quiet? > > > > Should we progress Tetsuo's patch? > > Hmm... > > Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019 > > ...istr 5.4-rc1 had some writeback bugs in it... 
> > -> #1 (fs_reclaim){+.+.}: > Oct 09 13:47:08 kernel: fs_reclaim_acquire.part.0+0x25/0x30 > Oct 09 13:47:08 kernel: __kmalloc+0x4f/0x330 > Oct 09 13:47:08 kernel: kmem_alloc+0x83/0x1a0 [xfs] > Oct 09 13:47:08 kernel: kmem_alloc_large+0x3c/0x100 [xfs] > Oct 09 13:47:08 kernel: xfs_attr_copy_value+0x5d/0xa0 [xfs] > Oct 09 13:47:08 kernel: xfs_attr_get+0xe7/0x1d0 [xfs] > Oct 09 13:47:08 kernel: xfs_get_acl+0xad/0x1e0 [xfs] > Oct 09 13:47:08 kernel: get_acl+0x81/0x110 > Oct 09 13:47:08 kernel: posix_acl_create+0x58/0x160 > Oct 09 13:47:08 kernel: xfs_generic_create+0x7e/0x2f0 [xfs] > Oct 09 13:47:08 kernel: lookup_open+0x5bd/0x820 > Oct 09 13:47:08 kernel: path_openat+0x340/0xcb0 > Oct 09 13:47:08 kernel: do_filp_open+0x91/0x100 > Oct 09 13:47:08 kernel: do_sys_open+0x184/0x220 > Oct 09 13:47:08 kernel: do_syscall_64+0x5c/0xa0 > Oct 09 13:47:08 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe > > That's XFS trying to allocate memory to load an acl off disk, only it > looks this thread does a MAYFAIL allocation. It's a GFP_FS (since we > don't set KM_NOFS) allocation so we recurse into fs reclaim, and the > ACL-getter has locked the inode (which is probably why lockdep > triggers). I wonder if that's really a deadlock vs. just super-slow > behavior, but otoh I don't think we're supposed to allow reclaim to jump > into the filesystems when the fs has locks held. > > That kmem_alloc_large should probably be changed to KM_NOFS. Dave? I suspect it's a false positive, but without the rest of the lockdep trace I don't have any context to determine if there is actually a deadlock vector there. i.e. the locked inode is referenced and we are not in a transaction context, so the only reclaim recursion that could attempt to lock it is dirty page writeback off the LRU from kswapd. i.e. direct reclaim will never see that inode, nor can I see how it would block on it. e.g. 
it's no different from doing memory allocation for BMBT metadata blocks with the XFS_ILOCK held when reading in the extent list on a data read or FIEMAP call. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
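A note on why lockdep can warn here even if no real deadlock is possible: lockdep tracks ordering between lock classes, not between individual lock instances. The toy model below is only a sketch under that assumption; all names are hypothetical and it is not lockdep's actual data structure or algorithm. It records "held A, then acquired B" edges between classes and reports when a new edge closes a cycle.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_CLASSES 8

/* Hypothetical lock class ids for this sketch. */
enum demo_lock_class {
	DEMO_CLASS_INODE_LOCK,
	DEMO_CLASS_FS_RECLAIM,
};

/* dep[a][b] is true once some task took class b while holding class a. */
struct demo_lock_graph {
	bool dep[DEMO_MAX_CLASSES][DEMO_MAX_CLASSES];
};

static inline void demo_graph_init(struct demo_lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool demo_reachable(const struct demo_lock_graph *g, int from, int to,
			   bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < DEMO_MAX_CLASSES; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    demo_reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "held -> acquired" (both ids must be < DEMO_MAX_CLASSES).
 * Returns true if the new edge closes a cycle, i.e. the opposite
 * ordering was already observed between these classes.
 */
static inline bool demo_record_dep(struct demo_lock_graph *g, int held,
				   int acquired)
{
	bool seen[DEMO_MAX_CLASSES] = { false };
	bool cycle = demo_reachable(g, acquired, held, seen);

	g->dep[held][acquired] = true;
	return cycle;
}
```

In this model the create path records inode lock -> fs_reclaim, and a reclaim path that locks an inode records fs_reclaim -> inode lock. The second edge is flagged as a cycle even when the two paths can never touch the same inode, which is the kind of report that has to be judged by hand.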
* Re: [Bug 205135] System hang up when memory swapping (kswapd deadlock) 2019-10-23 6:49 ` Dave Chinner @ 2019-10-23 7:12 ` Dave Chinner 0 siblings, 0 replies; 5+ messages in thread From: Dave Chinner @ 2019-10-23 7:12 UTC (permalink / raw) To: Darrick J. Wong Cc: Andrew Morton, linux-mm, linux-xfs, bugzilla-daemon, goodmirek, Hillf Danton, Dmitry Vyukov, Joel Fernandes, Tetsuo Handa On Wed, Oct 23, 2019 at 05:49:05PM +1100, Dave Chinner wrote: > On Tue, Oct 22, 2019 at 06:22:28PM -0700, Darrick J. Wong wrote: > > On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote: > > > > > > (switched to email. Please respond via emailed reply-to-all, not via the > > > bugzilla web interface). > > > > > > On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205135 > > > > > > > > --- Comment #7 from goodmirek@goodmirek.com --- > > > > Everyone who uses a swapfile on XFS filesystem seem affected by this hang up. > > > > Not sure about other filesystems, I did not have a chance to test it elsewhere. > > > > > > > > This unreproduced bot crash could be related: > > > > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@sina.com/ > > > > > > Thanks. Might be core MM, might be XFS, might be Fedora. > > > > > > Hilf, does your patch look related? That seems to have gone quiet? > > > > > > Should we progress Tetsuo's patch? > > > > Hmm... > > > > Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019 > > > > ...istr 5.4-rc1 had some writeback bugs in it... 
> >
> > -> #1 (fs_reclaim){+.+.}:
> > Oct 09 13:47:08 kernel: fs_reclaim_acquire.part.0+0x25/0x30
> > Oct 09 13:47:08 kernel: __kmalloc+0x4f/0x330
> > Oct 09 13:47:08 kernel: kmem_alloc+0x83/0x1a0 [xfs]
> > Oct 09 13:47:08 kernel: kmem_alloc_large+0x3c/0x100 [xfs]
> > Oct 09 13:47:08 kernel: xfs_attr_copy_value+0x5d/0xa0 [xfs]
> > Oct 09 13:47:08 kernel: xfs_attr_get+0xe7/0x1d0 [xfs]
> > Oct 09 13:47:08 kernel: xfs_get_acl+0xad/0x1e0 [xfs]
> > Oct 09 13:47:08 kernel: get_acl+0x81/0x110
> > Oct 09 13:47:08 kernel: posix_acl_create+0x58/0x160
> > Oct 09 13:47:08 kernel: xfs_generic_create+0x7e/0x2f0 [xfs]
> > Oct 09 13:47:08 kernel: lookup_open+0x5bd/0x820
> > Oct 09 13:47:08 kernel: path_openat+0x340/0xcb0
> > Oct 09 13:47:08 kernel: do_filp_open+0x91/0x100
> > Oct 09 13:47:08 kernel: do_sys_open+0x184/0x220
> > Oct 09 13:47:08 kernel: do_syscall_64+0x5c/0xa0
> > Oct 09 13:47:08 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > That's XFS trying to allocate memory to load an acl off disk, only it
> > looks this thread does a MAYFAIL allocation. It's a GFP_FS (since we
> > don't set KM_NOFS) allocation so we recurse into fs reclaim, and the
> > ACL-getter has locked the inode (which is probably why lockdep
> > triggers). I wonder if that's really a deadlock vs. just super-slow
> > behavior, but otoh I don't think we're supposed to allow reclaim to jump
> > into the filesystems when the fs has locks held.
> >
> > That kmem_alloc_large should probably be changed to KM_NOFS. Dave?
>
> I suspect it's a false positive, but without the rest of the lockdep
> trace I don't have any context to determine if there is actually a
> deadlock vector there.

Ok, I've looked at the bz now, and the rest of the trace is kswapd
locking an inode from the superblock shrinker. That means I'm pretty
certain this is a false positive and has nothing to do with whatever
hang is occurring on the user's machine.

These:

Oct 09 14:00:18 kernel: DMA-API: cacheline tracking ENOMEM, dma-debug disabled

occur when a radix_tree_insert() call fails, but I don't see a
radix_tree_preload() call anywhere around that code to ensure the
radix tree insert has memory available before locks are taken and the
insert is attempted.

Ahhhh:

static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);

Seems like that is guaranteed to fail under memory pressure as it
won't allow memory reclaim to block waiting for progress to be made.

Hence I see nothing in the bug to back up the assertions that
"Everyone who uses a swapfile on XFS filesystem seem affected by this
hang up." There's no evidence at all that even points to the subsystem
that has hung. sysrq-w, sysrq-l and sysrq-t output are the first
things we need from that machine to see if/where it is actually
hung...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 5+ messages in thread
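For anyone collecting the requested output without a working console keyboard, the same dumps can be requested by writing the key letters to /proc/sysrq-trigger as root. The helper below is a small sketch, not a tested diagnostic tool: demo_send_sysrq and DEMO_SYSRQ_TRIGGER are names invented here, and the path is a parameter so the function can be tried against a scratch file first.

```c
#include <stdio.h>

/* Real trigger file on Linux; writing to it normally requires root. */
#define DEMO_SYSRQ_TRIGGER "/proc/sysrq-trigger"

/*
 * Write each key in 'keys' to 'path', one open/write/close per key.
 * Keys of interest here: 'w' (blocked tasks), 'l' (backtrace of
 * active CPUs), 't' (all tasks).
 * Returns 0 on success, -1 on the first failure.
 */
static inline int demo_send_sysrq(const char *path, const char *keys)
{
	for (const char *k = keys; *k != '\0'; k++) {
		FILE *f = fopen(path, "w");

		if (f == NULL)
			return -1;
		if (fputc(*k, f) == EOF) {
			fclose(f);
			return -1;
		}
		if (fclose(f) != 0)
			return -1;
	}
	return 0;
}
```

A call such as demo_send_sysrq(DEMO_SYSRQ_TRIGGER, "wlt") would request all three dumps; the output lands in the kernel log, so it still has to be captured from there (for example with dmesg) and attached to the bug.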