From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Oct 2018 13:15:28 +0200
From: Martin Schwidefsky
To: Al Viro
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: dcache endless loop in d_invalidate
MIME-Version: 1.0
Message-Id: <20181016131528.6aac4876@mschwideX1>
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org

Hi Al,

I am currently looking into a customer dump and found what looks like
an issue in the dcache code. And I think the following commit of yours
has something to do with it:

commit fe91522a7ba82ca1a51b07e19954b3825e4aaa22
Author: Al Viro
Date:   Sat May 3 00:02:25 2014 -0400

    don't remove from shrink list in select_collect()

    If we find something already on a shrink list, just increment
    data->found and do nothing else.  Loops in shrink_dcache_parent() and
    check_submounts_and_drop() will do the right thing - everything we
    did put into our list will be evicted and if there had been nothing,
    but data->found got non-zero, well, we have somebody else shrinking
    those guys; just try again.

    Signed-off-by: Al Viro

The dump I got is based on kernel v4.4, but the affected dcache functions
look identical to the upstream version. Here is what I found in the dump:

A lot of "rcu_sched kthread starved for jiffies!" messages

Only one CPU, currently running process "run-crons", task 0x65a8008

It just called check_and_drop() from d_walk(), full backchain:

    PSW.addr check_and_drop at 30a0e8
    %r14 d_walk at 308202
    #0 [35b87b88] d_invalidate at 3096e8
    #1 [35b87bd8] proc_flush_task at 37190c
    #2 [35b87c58] release_task at 13f202
    #3 [35b87cc8] wait_task_zombie at 13fc36
    #4 [35b87d50] wait_consider_task at 140150
    #5 [35b87dc0] do_wait at 1403de
    #6 [35b87e18] sys_wait4 at 14181e
    #7 [35b87ea8] system_call at 659ec4

The task's runtime is

    sum_exec_runtime 26813717162347   # nsec = 26813 seconds,
    utime = 3991252                   # cputime = 974 seconds,
    stime = 99132516783832            # cputime = 24202 seconds,

Task 0x65a8008 has TIF_NEED_RESCHED set

d_walk() just called check_and_drop() via the finish() function pointer;
check_and_drop() will return and d_walk() will return as well.
This looks like an endless loop in d_invalidate().

The (struct dentry *) dentry in d_invalidate() is at 0x3cb15858
The struct detach_data data in d_invalidate() is at 0x35b87c28

The dentry tree starting @ 0x3cb15858 has two entries in d_subdirs:

    0x3cb15858 d_name.name: "11898"
      0xb940d3d8 d_name.name: "cmdline"
      0xb940dd98 d_name.name: "status"

crash> px *(struct dentry *) 0x3cb15858 | grep d_flags
  d_flags = 0x2000cc,
crash> px *(struct dentry *) 0xb940d3d8 | grep d_flags
  d_flags = 0x48048c,    # DCACHE_SHRINK_LIST is set
crash> px *(struct dentry *) 0xb940dd98 | grep d_flags
  d_flags = 0x48048c,    # DCACHE_SHRINK_LIST is set

crash> px *(struct detach_data *) 0x35b87c28
$29 = {
  select = {
    start = 0x3cb15858,
    dispose = {
      next = 0x35b87c30,
      prev = 0x35b87c30
    },
    found = 0x2
  },
  mountpoint = 0x0
}

select_collect() called from detach_and_collect() will increment
data.select.found in the struct detach_data @ 0x35b87c28 but will
not add any dentries to the dispose list. The shrink_dentry_list()
call in d_invalidate() will do nothing as the dispose list is empty.
The two dentries 0xb940d3d8 and 0xb940dd98 are still there.

After d_walk() returns, d_invalidate() finds data.mountpoint == NULL
and data.select.found == 2, so it will start the loop again without
making progress. As this is a single-CPU system without kernel
preemption, there is nobody else that will do the shrinking of those
dcache entries.

In short, this if-statement in select_collect():

    if (dentry->d_flags & DCACHE_SHRINK_LIST) {
        data->found++;
    }

with the assumption that "somebody else" will do the shrinking seems
broken. Do you agree?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
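
To illustrate the loop described above, here is a minimal user-space sketch. It is not the kernel code. The names model_dentry, model_walk, model_d_invalidate and MODEL_SHRINK_LIST are hypothetical. The flag value 0x400 is an assumption that is consistent with the d_flags values in the dump and should be checked against include/linux/dcache.h. Collecting and evicting a dentry are collapsed into a single step, and a pass limit stands in for "nobody else ever runs".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of DCACHE_SHRINK_LIST in v4.4 (consistent with the d_flags
 * values in the dump); verify against include/linux/dcache.h. */
#define MODEL_SHRINK_LIST 0x400u

/* Hypothetical stand-in for a child dentry; not the kernel's struct dentry. */
struct model_dentry {
    unsigned int d_flags;
    int evicted;            /* nonzero once this walker has disposed of it */
};

/*
 * Models one d_walk() pass over the children. Only the SHRINK_LIST branch
 * mirrors the if-statement quoted in the report; the else branch is a
 * simplification that collects and evicts in one step.
 * Returns the value data->found would have after the pass.
 */
static int model_walk(struct model_dentry *kids, size_t n)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (kids[i].evicted)
            continue;
        if (kids[i].d_flags & MODEL_SHRINK_LIST) {
            found++;                /* on a shrink list already: count only */
        } else {
            kids[i].evicted = 1;    /* put on our list and evicted by us */
            found++;
        }
    }
    return found;
}

/*
 * Models the retry loop: repeat until a pass finds nothing.
 * Returns the number of passes needed, or -1 if max_passes is reached
 * without the loop terminating (no other shrinker ever runs in this model).
 */
static int model_d_invalidate(struct model_dentry *kids, size_t n,
                              int max_passes)
{
    assert(max_passes > 0);

    for (int pass = 1; pass <= max_passes; pass++) {
        if (model_walk(kids, n) == 0)
            return pass;
    }
    return -1;
}
```

With two children carrying d_flags 0x48048c, as in the dump, every pass reports found == 2 and the model never terminates on its own. With the 0x400 bit cleared, the first pass evicts both children and the second pass finds nothing.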