From: "zhengbin (A)" <zhengbin13@huawei.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: <jack@suse.cz>, <akpm@linux-foundation.org>,
<linux-fsdevel@vger.kernel.org>,
"zhangyi (F)" <yi.zhang@huawei.com>, <renxudong1@huawei.com>,
Hou Tao <houtao1@huawei.com>
Subject: Re: Possible FS race condition between iterate_dir and d_alloc_parallel
Date: Mon, 9 Sep 2019 22:10:00 +0800
Message-ID: <afdfa1f4-c954-486b-1eb2-efea6fcc2e65@huawei.com>
In-Reply-To: <b5876e84-853c-e1f6-4fef-83d3d45e1767@huawei.com>
On 2019/9/4 14:15, zhengbin (A) wrote:
> On 2019/9/3 23:41, Al Viro wrote:
>
>> On Tue, Sep 03, 2019 at 04:40:07PM +0100, Al Viro wrote:
>>> On Tue, Sep 03, 2019 at 10:44:32PM +0800, zhengbin (A) wrote:
>>>> We recently encountered an oops (the filesystem is tmpfs):
>>>> crash> bt
>>>> #9 [ffff0000ae77bd60] dcache_readdir at ffff0000672954bc
>>>>
>>>> The reason is as follows:
>>>> Process 1 runs cat on test, which does not exist in directory A; process 2 runs cat on test in directory A too.
>>>> Process 3 creates a new file in directory B; process 4 runs ls on directory A.
>>> good grief, what screen width do you have to make the table below readable?
>>>
>>> What I do not understand is how the hell does your dtry2 manage to get actually
>>> freed and reused without an RCU delay between its removal from parent's
>>> ->d_subdirs and freeing its memory. What should've happened in that
>>> scenario is
>>> * process 4, in next_positive() grabs rcu_read_lock().
>>> * it walks into your dtry2, which might very well be
>>> just a chunk of memory waiting to be freed; it sure as hell is
>>> not positive. skipped is set to true, 'i' is not decremented.
>>> Note that ->d_child.next points to the next non-cursor sibling
>>> (if any) or to the ->d_subdir of parent, so we can keep walking.
>>> * we keep walking for a while; eventually we run out of
>>> counter and leave the loop.
>>>
>>> Only after that we do rcu_read_unlock() and only then anything
>>> observed in that loop might be freed and reused.
> You are right, I missed this.
>>> Confused... OTOH, I might be misreading that table of yours -
>>> it's about 30% wider than the widest xterm I can get while still
>>> being able to read the font...
> The table is my guess. This oops happens occasionally.
> (We have one vmcore; for the other occurrences we only have logs, but the
> backtrace is the same as in the vmcore, so the cause should be the same.)
>
> Unfortunately, we do not know how to reproduce it. The vmcore shows the following pattern:
>
> 1. dirA has 177 files, and it is OK
> 2. dirB has 25 files, and it is OK
> 3. When we ls dirA, it begins with ".", "..", dirB's first file, second file... last file,
>    last file->next = &(dirB->d_subdirs)
>
> -------->
> crash> struct dir_context ffff0000ae77be30 --->dcache_readdir ctx
> struct dir_context {
>   actor = 0xffff00006727d760 <filldir64>,
>   pos = 27 --->27 = . + .. + 25 files
> }
>
> next_positive
>   for (p = from->next; p != &parent->d_subdirs; p = p->next) --->parent is dirA, so the loop will continue
>
> This looks like a bug; I think it is related to locking, especially to commit ebaaa80e8f20 ("lockless next_positive()").
>
> However, so far I have not found the cause. Any suggestions?
There can be a timing like the following:
1. Insert a negative dentryB1 into dirB; dentryB1->next = dirB's first positive dentry (such as fileB) (d_alloc_parallel-->d_alloc)
2. Insert a negative dentryB2 into dirB; dentryB2->next = dentryB1 (d_alloc_parallel-->d_alloc)
3. Remove dentryB1 from dirB; dentryB1->next will still be fileB (d_alloc_parallel-->dput(new))
4. Allocate dentryB1 in dirA; dirA's d_subdirs->next will be dentryB1
process 1 (ls dirA)        | process 2 (alloc dentryB1 to dirA:
                           |            d_alloc_parallel-->d_alloc)
dcache_readdir             | d_alloc
  p = &dentry->d_subdirs;  |
  next_positive            |
                           |   __d_alloc-->INIT_LIST_HEAD(&dentry->d_child)
                           |   list_add(&dentry->d_child, &parent->d_subdirs)
                           |   ---> the CPU may execute the stores out of order
                           |        and set parent->d_subdirs->next = dentryB1 first
    p = from->next         |
    ---> p will be dentryB1, and dentryB1->next will be fileB
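
To make the timing above concrete, here is a minimal userspace sketch of it. It is only a single-threaded model, not kernel code: struct node, walk_children() and simulate_reuse() are hypothetical names, only ->next is modelled, and the reordering is imitated by leaving out the store that would initialise dentryB1->next. The check against the foreign head exists only in the model, so that it can report where the walk ends; the next_positive() loop has no such check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head; only ->next is modelled. */
struct node {
    struct node *next;
};

enum walk_result { HIT_OWN_HEAD, HIT_FOREIGN_HEAD, STEP_LIMIT };

/*
 * Walk the way next_positive() does: start after `head` and stop only when
 * we are back at `head`. `foreign` is another directory's list head;
 * reaching it means the walk has left its own list.
 */
enum walk_result walk_children(struct node *head, struct node *foreign,
                               int max_steps, int *visited)
{
    struct node *p;

    assert(head != NULL && foreign != NULL && visited != NULL);
    *visited = 0;
    for (p = head->next; p != head; p = p->next) {
        if (p == foreign)
            return HIT_FOREIGN_HEAD;
        if (++*visited > max_steps)
            return STEP_LIMIT;
    }
    return HIT_OWN_HEAD;
}

/*
 * Replay steps 1-4. If init_visible is 0, the reader sees dentryB1 linked
 * into dirA before dentryB1->next has been updated (the reordered case).
 */
enum walk_result simulate_reuse(int init_visible, int *visited)
{
    struct node dirA, dirB, fileB, dentryB1, dentryB2;

    dirA.next = &dirA;            /* dirA: empty list */
    fileB.next = &dirB;           /* dirB: one positive entry, fileB */
    dirB.next = &fileB;

    dentryB1.next = dirB.next;    /* step 1: dentryB1 -> fileB */
    dirB.next = &dentryB1;
    dentryB2.next = dirB.next;    /* step 2: dentryB2 -> dentryB1 */
    dirB.next = &dentryB2;
    dentryB2.next = dentryB1.next; /* step 3: unlink dentryB1; its ->next stays fileB */

    if (init_visible)
        dentryB1.next = dirA.next; /* step 4, store to the new node */
    dirA.next = &dentryB1;         /* step 4, publishing store */

    return walk_children(&dirA, &dirB, 16, visited);
}
```

With init_visible == 0 the walk visits dentryB1 and fileB and ends at dirB's head without ever seeing dirA's head, which has the same shape as the vmcore pattern described earlier. With init_visible == 1 it visits one entry and terminates at dirA's head.
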
We can solve it in 2 ways:
1. Add an smp_wmb() between __d_alloc and list_add(&dentry->d_child, &parent->d_subdirs)
2. Revert commit ebaaa80e8f20 ("lockless next_positive()")
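
The first option can be sketched in portable C11, assuming a single writer at a time (insertions are assumed to be serialised by a lock on the parent) and a lockless reader. This is not a kernel patch: list_init(), list_add_release() and count_children() are hypothetical names, and a release store stands in for smp_wmb() followed by the publishing store. A write-side barrier only helps if the reader's loads are ordered as well; the sketch uses acquire loads for that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked circular list; only ->next is modelled. */
struct node {
    _Atomic(struct node *) next;
};

void list_init(struct node *head)
{
    atomic_store_explicit(&head->next, head, memory_order_relaxed);
}

/* Insert new_node right after head. The caller must serialise writers. */
void list_add_release(struct node *new_node, struct node *head)
{
    struct node *first;

    assert(new_node != head);
    first = atomic_load_explicit(&head->next, memory_order_relaxed);

    /* Initialise the new node first ... */
    atomic_store_explicit(&new_node->next, first, memory_order_relaxed);

    /*
     * ... then publish it. The release store keeps the initialisation
     * above from becoming visible after the node itself.
     */
    atomic_store_explicit(&head->next, new_node, memory_order_release);
}

/* Lockless reader: counts entries until the walk is back at head. */
int count_children(struct node *head)
{
    int n = 0;
    struct node *p = atomic_load_explicit(&head->next, memory_order_acquire);

    while (p != head) {
        n++;
        p = atomic_load_explicit(&p->next, memory_order_acquire);
    }
    return n;
}
```

A reader that observes new_node through head->next is then guaranteed to also observe new_node->next pointing into the same list, so the walk cannot wander into a stale ->next left over from a previous parent.
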
>> Incidentally, which kernel was that on?
> 4.19-stable, the code of iterate_dir and d_alloc_parallel is same with master