From: "zhengbin (A)" <zhengbin13@huawei.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: <jack@suse.cz>, <akpm@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>,
	"zhangyi (F)" <yi.zhang@huawei.com>, <renxudong1@huawei.com>,
	Hou Tao <houtao1@huawei.com>
Subject: Re: Possible FS race condition between iterate_dir and d_alloc_parallel
Date: Mon, 9 Sep 2019 22:10:00 +0800	[thread overview]
Message-ID: <afdfa1f4-c954-486b-1eb2-efea6fcc2e65@huawei.com> (raw)
In-Reply-To: <b5876e84-853c-e1f6-4fef-83d3d45e1767@huawei.com>


On 2019/9/4 14:15, zhengbin (A) wrote:
> On 2019/9/3 23:41, Al Viro wrote:
>
>> On Tue, Sep 03, 2019 at 04:40:07PM +0100, Al Viro wrote:
>>> On Tue, Sep 03, 2019 at 10:44:32PM +0800, zhengbin (A) wrote:
>>>> We recently encountered an oops (the filesystem is tmpfs):
>>>> crash> bt
>>>>  #9 [ffff0000ae77bd60] dcache_readdir at ffff0000672954bc
>>>>
>>>> The reason is as follows:
>>>> Process 1 runs "cat test" in directory A, where "test" does not exist; process 2 runs "cat test" in directory A too.
>>>> Process 3 creates a new file in directory B; process 4 runs "ls" on directory A.
>>> good grief, what screen width do you have to make the table below readable?
>>>
>>> What I do not understand is how the hell does your dtry2 manage to get actually
>>> freed and reused without an RCU delay between its removal from parent's
>>> ->d_subdirs and freeing its memory.  What should've happened in that
>>> scenario is
>>> 	* process 4, in next_positive() grabs rcu_read_lock().
>>> 	* it walks into your dtry2, which might very well be
>>> just a chunk of memory waiting to be freed; it sure as hell is
>>> not positive.  skipped is set to true, 'i' is not decremented.
>>> Note that ->d_child.next points to the next non-cursor sibling
>>> (if any) or to the ->d_subdir of parent, so we can keep walking.
>>> 	* we keep walking for a while; eventually we run out of
>>> counter and leave the loop.
>>>
>>> Only after that we do rcu_read_unlock() and only then anything
>>> observed in that loop might be freed and reused.
> You are right, I missed this.
>>> Confused...  OTOH, I might be misreading that table of yours -
>>> it's about 30% wider than the widest xterm I can get while still
>>> being able to read the font...
> The table is my guess; this oops only happens occasionally.
>
> (We have one vmcore; the others only have logs, but their backtrace matches the vmcore's, so the cause should be the same.)
>
> Unfortunately, we do not know how to reproduce it. The vmcore shows the following pattern:
>
> 1. dirA has 177 files, and it looks fine.
>
> 2. dirB has 25 files, and it looks fine.
>
> 3. When we ls dirA, the listing begins with ".", "..", then dirB's first file, second file, ... last file, and last file->next == &dirB->d_subdirs.
>
> -------->
>
> crash> struct dir_context ffff0000ae77be30   ---> dcache_readdir's ctx
>
> struct dir_context {
>     actor = 0xffff00006727d760 <filldir64>,
>     pos = 27   ---> 27 = . + .. + dirB's 25 files
> }
>
>
> next_positive:
>
>     for (p = from->next; p != &parent->d_subdirs; p = p->next)   ---> parent is dirA, so the loop keeps going
>
>
> This looks like a bug; I think it is related to locking, especially to commit ebaaa80e8f20 ("lockless next_positive()").
>
> However, so far I have not found the root cause. Any suggestions?

There can be a timing sequence such as the following:

1. Insert a negative dentryB1 into dirB; dentryB1->next = dirB's first positive dentry (say fileB).      (d_alloc_parallel --> d_alloc)

2. Insert a negative dentryB2 into dirB; dentryB2->next = dentryB1.      (d_alloc_parallel --> d_alloc)

3. Remove dentryB1 from dirB; dentryB1->next becomes fileB again.      (d_alloc_parallel --> dput(new))

4. Reallocate dentryB1 under dirA; dirA's d_subdirs->next becomes dentryB1.


process 1 (ls dirA)               | process 2 (alloc dentryB1 under dirA: d_alloc_parallel --> d_alloc)
----------------------------------+---------------------------------------------------------------------
dcache_readdir                    | d_alloc
    p = &dentry->d_subdirs;       |
    next_positive                 |     __d_alloc --> INIT_LIST_HEAD(&dentry->d_child)
                                  |     list_add(&dentry->d_child, &parent->d_subdirs)
                                  |       ---> the CPU may reorder the stores, making
                                  |            parent->d_subdirs->next = dentryB1 visible first
        p = from->next            |
          ---> p is dentryB1, and dentryB1->next still points at fileB


We can fix it in two ways:

1. Add an smp_wmb() between the initialization in __d_alloc() and list_add(&dentry->d_child, &parent->d_subdirs).

2. Revert commit ebaaa80e8f20 ("lockless next_positive()").

>> Incidentally, which kernel was that on?
> 4.19-stable; the code of iterate_dir and d_alloc_parallel is the same as in master.
>> .
>>



Thread overview: 49+ messages
2019-09-03 14:44 Possible FS race condition between iterate_dir and d_alloc_parallel zhengbin (A)
2019-09-03 15:40 ` Al Viro
2019-09-03 15:41   ` Al Viro
2019-09-04  6:15     ` zhengbin (A)
2019-09-05 17:47       ` Al Viro
2019-09-06  0:55         ` Jun Li
2019-09-06  2:00           ` Al Viro
2019-09-06  2:32         ` zhengbin (A)
2019-09-09 14:10       ` zhengbin (A) [this message]
2019-09-09 14:59         ` Al Viro
2019-09-09 15:10           ` zhengbin (A)
     [not found]             ` <7e32cda5-dc89-719d-9651-cf2bd06ae728@huawei.com>
2019-09-10 21:53               ` Al Viro
2019-09-10 22:17                 ` Al Viro
2019-09-14 16:16                 ` [PATCH] " Al Viro
2019-09-14 16:49                   ` Linus Torvalds
2019-09-14 17:01                     ` Al Viro
2019-09-14 17:15                       ` Linus Torvalds
2019-09-14 20:04                         ` Al Viro
2019-09-14 22:57                           ` Linus Torvalds
2019-09-15  0:50                             ` Al Viro
2019-09-15  1:41                               ` Linus Torvalds
2019-09-15 16:02                                 ` Al Viro
2019-09-15 17:58                                   ` Linus Torvalds
2019-09-21 14:07                                     ` Al Viro
2019-09-21 16:21                                       ` Linus Torvalds
2019-09-21 17:18                                         ` Al Viro
2019-09-21 17:38                                           ` Linus Torvalds
2019-09-24  2:52                                       ` Al Viro
2019-09-24 13:30                                         ` Josef Bacik
2019-09-24 14:51                                           ` Al Viro
2019-09-24 15:01                                             ` Josef Bacik
2019-09-24 15:11                                               ` Al Viro
2019-09-24 15:26                                                 ` Josef Bacik
2019-09-24 16:33                                                   ` Al Viro
     [not found]                                         ` <CAHk-=wiJ1eY7y6r_cFNRPCqD+BJZS7eJeQFO6OrXxRFjDAipsQ@mail.gmail.com>
2019-09-29  5:29                                           ` Al Viro
2019-09-25 11:59                                       ` Amir Goldstein
2019-09-25 12:22                                         ` Al Viro
2019-09-25 12:34                                           ` Amir Goldstein
2019-09-22 21:29                     ` Al Viro
2019-09-23  3:32                       ` zhengbin (A)
2019-09-23  5:08                         ` Al Viro
     [not found]                   ` <20190916020434.tutzwipgs4f6o3di@inn2.lkp.intel.com>
2019-09-16  2:58                     ` 266a9a8b41: WARNING:possible_recursive_locking_detected Al Viro
2019-09-16  3:03                       ` Al Viro
2019-09-16  3:44                         ` Linus Torvalds
2019-09-16 17:16                           ` Al Viro
2019-09-16 17:29                             ` Al Viro
     [not found]                             ` <bd707e64-9650-e9ed-a820-e2cabd02eaf8@huawei.com>
2019-09-17 12:01                               ` Al Viro
2019-09-19  3:36                                 ` zhengbin (A)
2019-09-19  3:55                                   ` Al Viro
