linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "zhengbin (A)" <zhengbin13@huawei.com>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	renxudong1@huawei.com, Hou Tao <houtao1@huawei.com>
Subject: Re: [PATCH] Re: Possible FS race condition between iterate_dir and d_alloc_parallel
Date: Sat, 14 Sep 2019 15:57:15 -0700	[thread overview]
Message-ID: <CAHk-=whpoQ_hX2KeqjQs3DeX6Wb4Tmb8BkHa5zr-Xu=S55+ORg@mail.gmail.com> (raw)
In-Reply-To: <20190914200412.GU1131@ZenIV.linux.org.uk>

On Sat, Sep 14, 2019 at 1:04 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> An obvious approach would be to have them sit in the lists hanging off
> dentries, with pointer to dentry in the cursor itself.  It's not hard
> to do, with "move" costing O(#Cursors(dentry1)) and everything else
> being O(1), but I hate adding a pointer to each struct dentry, when
> it's completely useless for most of the filesystems *and* NULL for
> most of dentries on dcache_readdir()-using one.

Yeah, no, I think we should just do the straightforward proper locking
for now, and see if anything even cares.

Last time I think it was a microbenchmark that showed a regression,
not necessarily even a real load (reaim), and there wasn't a _ton_ of
debugging on exactly what triggered the higher system time. It was
also on a fairly unusual system that would show the lock contention
much more than 99+% of anything else.

So I suspect we might have other avenues of improvement than just the
cursor thing. Yes, it would be absolutely lovely to not have the
cursor, and avoid the resulting growth of the dentry child list, which
then makes everything else much more expensive.

But it is also possible that we could avoid some of that O(n**2)
behavior by simply not adding the corsor to the end of the dentry
child list at all. Right now your patch *always* sets the cursor at a
valid point - even if it's at the end of the directory. But we could
skip the "end of the directory" case entirely and just set a flag in
the file for "at eof" instead.

That way the cursor at least wouldn't exist for the common cases when
we return to user space (at the beginning of the readdir and at the
end). Which might make the cursors simply not be quite as common, even
when you have a _lot_ of concurrent readdir() users.

There may be other similar things we could do to minimize the pressure
on the parent dentry lock. For example, maybe we could insert a cursor
only _between_ readdir() calls, but then in the readdir() loop itself,
we know we hold the inode lock shared for the directory, so during
_that_ loop we can use the existing positive dentries that we keep a
refcount to as cursors.

Because if the only reason to not use existing dentry pointers as
cursors is the concurrent rename() issue, then _that_ is certainly
something that the parent inode shared lock should protect against,
even if the child dentry list can change in other ways).

So if the main 'reaim' problem was that the dentry child list itself
grew due to the cursor pressure (and that is likely) and that in turn
then caused the O(n**2) bad locking behavior due to having to walk
much longer dentry child chains, then I suspect that we can do a few
tweaks to simply not make that happen in practice.

Yes, I realize that I'm handwaving a bit on the two above suggestions,
but don't you think that sounds doable?

            Linus

  reply	other threads:[~2019-09-14 23:08 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 14:44 Possible FS race condition between iterate_dir and d_alloc_parallel zhengbin (A)
2019-09-03 15:40 ` Al Viro
2019-09-03 15:41   ` Al Viro
2019-09-04  6:15     ` zhengbin (A)
2019-09-05 17:47       ` Al Viro
2019-09-06  0:55         ` Jun Li
2019-09-06  2:00           ` Al Viro
2019-09-06  2:32         ` zhengbin (A)
2019-09-09 14:10       ` zhengbin (A)
2019-09-09 14:59         ` Al Viro
2019-09-09 15:10           ` zhengbin (A)
     [not found]             ` <7e32cda5-dc89-719d-9651-cf2bd06ae728@huawei.com>
2019-09-10 21:53               ` Al Viro
2019-09-10 22:17                 ` Al Viro
2019-09-14 16:16                 ` [PATCH] " Al Viro
2019-09-14 16:49                   ` Linus Torvalds
2019-09-14 17:01                     ` Al Viro
2019-09-14 17:15                       ` Linus Torvalds
2019-09-14 20:04                         ` Al Viro
2019-09-14 22:57                           ` Linus Torvalds [this message]
2019-09-15  0:50                             ` Al Viro
2019-09-15  1:41                               ` Linus Torvalds
2019-09-15 16:02                                 ` Al Viro
2019-09-15 17:58                                   ` Linus Torvalds
2019-09-21 14:07                                     ` Al Viro
2019-09-21 16:21                                       ` Linus Torvalds
2019-09-21 17:18                                         ` Al Viro
2019-09-21 17:38                                           ` Linus Torvalds
2019-09-24  2:52                                       ` Al Viro
2019-09-24 13:30                                         ` Josef Bacik
2019-09-24 14:51                                           ` Al Viro
2019-09-24 15:01                                             ` Josef Bacik
2019-09-24 15:11                                               ` Al Viro
2019-09-24 15:26                                                 ` Josef Bacik
2019-09-24 16:33                                                   ` Al Viro
     [not found]                                         ` <CAHk-=wiJ1eY7y6r_cFNRPCqD+BJZS7eJeQFO6OrXxRFjDAipsQ@mail.gmail.com>
2019-09-29  5:29                                           ` Al Viro
2019-09-25 11:59                                       ` Amir Goldstein
2019-09-25 12:22                                         ` Al Viro
2019-09-25 12:34                                           ` Amir Goldstein
2019-09-22 21:29                     ` Al Viro
2019-09-23  3:32                       ` zhengbin (A)
2019-09-23  5:08                         ` Al Viro
     [not found]                   ` <20190916020434.tutzwipgs4f6o3di@inn2.lkp.intel.com>
2019-09-16  2:58                     ` 266a9a8b41: WARNING:possible_recursive_locking_detected Al Viro
2019-09-16  3:03                       ` Al Viro
2019-09-16  3:44                         ` Linus Torvalds
2019-09-16 17:16                           ` Al Viro
2019-09-16 17:29                             ` Al Viro
     [not found]                             ` <bd707e64-9650-e9ed-a820-e2cabd02eaf8@huawei.com>
2019-09-17 12:01                               ` Al Viro
2019-09-19  3:36                                 ` zhengbin (A)
2019-09-19  3:55                                   ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whpoQ_hX2KeqjQs3DeX6Wb4Tmb8BkHa5zr-Xu=S55+ORg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=houtao1@huawei.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=renxudong1@huawei.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yi.zhang@huawei.com \
    --cc=zhengbin13@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).