Subject: Re: Possible FS race condition between iterate_dir and d_alloc_parallel
From: "zhengbin (A)"
To: Al Viro
CC: "zhangyi (F)"
Date: Wed, 4 Sep 2019 14:15:58 +0800
In-Reply-To: <20190903154114.GK1131@ZenIV.linux.org.uk>
References: <20190903154007.GJ1131@ZenIV.linux.org.uk> <20190903154114.GK1131@ZenIV.linux.org.uk>

On 2019/9/3 23:41, Al Viro wrote:
> On Tue, Sep 03, 2019 at 04:40:07PM +0100, Al Viro wrote:
>> On Tue, Sep 03, 2019 at 10:44:32PM +0800, zhengbin (A) wrote:
>>> We recently encountered an oops (the filesystem is tmpfs):
>>> crash> bt
>>> #9 [ffff0000ae77bd60] dcache_readdir at ffff0000672954bc
>>>
>>> The reason is as follows:
>>> Process 1 cats a file "test" that does not exist in directory A, and process 2 cats "test" in directory A too.
>>> Process 3 creates a new file in directory B, and process 4 runs ls on directory A.
>>
>> good grief, what screen width do you have to make the table below readable?
>>
>> What I do not understand is how the hell does your dtry2 manage to get actually
>> freed and reused without an RCU delay between its removal from parent's
>> ->d_subdirs and freeing its memory.  What should've happened in that
>> scenario is
>> 	* process 4, in next_positive(), grabs rcu_read_lock().
>> 	* it walks into your dtry2, which might very well be
>> just a chunk of memory waiting to be freed; it sure as hell is
>> not positive.  skipped is set to true, 'i' is not decremented.
>> Note that ->d_child.next points to the next non-cursor sibling
>> (if any) or to the ->d_subdirs of parent, so we can keep walking.
>> 	* we keep walking for a while; eventually we run out of
>> counter and leave the loop.
>>
>> Only after that we do rcu_read_unlock(), and only then can anything
>> observed in that loop be freed and reused.
>>
>> Confused...

You are right, I missed this.
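For reference, here is roughly the walk you are describing: a simplified sketch of
next_positive() from fs/libfs.c as of 4.19 (the i_dir_seq recheck taken on the
"skipped" path is trimmed, so treat it as illustrative rather than exact):

    static struct dentry *next_positive(struct dentry *parent,
                                        struct list_head *from, int count)
    {
            struct dentry *res = NULL;
            struct list_head *p;
            bool skipped = false;
            int i = count;

            rcu_read_lock();
            for (p = from->next; p != &parent->d_subdirs; p = p->next) {
                    struct dentry *d = list_entry(p, struct dentry, d_child);

                    if (!simple_positive(d)) {
                            /* e.g. an unhashed/negative dentry such as dtry2:
                             * mark it skipped, do not decrement i */
                            skipped = true;
                    } else if (!--i) {
                            res = d;
                            break;
                    }
            }
            rcu_read_unlock();      /* only after this may anything observed
                                     * in the loop be freed and reused */
            /* the real code rechecks parent's ->d_inode->i_dir_seq here and
             * retries the walk if 'skipped' is set and the sequence changed */
            return res;
    }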
>> OTOH, I might be misreading that table of yours -
>> it's about 30% wider than the widest xterm I can get while still
>> being able to read the font...

The table is my guess. This oops happens only occasionally (we have one vmcore; the other occurrences only left logs, but their backtraces are the same as the vmcore's, so the cause should be the same). Unfortunately, we do not know how to reproduce it.

The vmcore shows the following pattern:

1. dirA has 177 files, and its child list looks fine.
2. dirB has 25 files, and its child list looks fine too.
3. When we ls dirA, the walk emits ".", "..", then dirB's first file, second file, ... last file, and last file->next == &(dirB->d_subdirs):

crash> struct dir_context ffff0000ae77be30    --->dcache_readdir ctx
struct dir_context {
  actor = 0xffff00006727d760 ,
  pos = 27                                    --->27 = . + .. + 25 files
}

next_positive:
  for (p = from->next; p != &parent->d_subdirs; p = p->next)   --->parent is dirA, so the loop keeps going

This looks like a bug to me, and I suspect it is related to the locking, especially to commit ebaaa80e8f20 ("lockless next_positive()"). However, so far I have not found the root cause. Any suggestions? (A simplified sketch of how dcache_readdir() drives next_positive() is appended at the end of this mail.)

> Incidentally, which kernel was that on?

4.19-stable; the code of iterate_dir and d_alloc_parallel there is the same as in mainline.
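For reference, a simplified sketch of the caller side, dcache_readdir() from fs/libfs.c
as of 4.19 (error handling and cursor bookkeeping trimmed, so details may differ):

    int dcache_readdir(struct file *file, struct dir_context *ctx)
    {
            struct dentry *dentry = file->f_path.dentry;    /* dirA in our case */
            struct dentry *cursor = file->private_data;     /* per-open cursor dentry */
            struct list_head *p = &cursor->d_child;         /* position to resume from */
            struct dentry *next;
            bool moved = false;

            if (!dir_emit_dots(file, ctx))
                    return 0;
            if (ctx->pos == 2)
                    p = &dentry->d_subdirs;                 /* fresh walk: start at dirA's list head */

            while ((next = next_positive(dentry, p, 1)) != NULL) {
                    if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
                                  d_inode(next)->i_ino, dt_type(d_inode(next))))
                            break;
                    moved = true;
                    p = &next->d_child;
                    ctx->pos++;
            }
            if (moved)
                    move_cursor(cursor, p);                 /* libfs-internal helper */
            return 0;
    }

The termination check in next_positive() is always against &parent->d_subdirs of the
directory being read (dirA here). So if the list position being walked somehow ends up
threaded on dirB's child list, as the vmcore suggests, that check can never match,
which fits the crash in dcache_readdir.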