From: Linus Torvalds <torvalds@linux-foundation.org>
To: Christian Brauner <brauner@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	 linux-fsdevel@vger.kernel.org,
	Seth Forshee <sforshee@kernel.org>,
	 Tycho Andersen <tycho@tycho.pizza>
Subject: Re: [PATCH 2/2] pidfd: add pidfdfs
Date: Sun, 18 Feb 2024 10:57:19 -0800
Message-ID: <CAHk-=wgtLF5Z5=15-LKAczWm=-tUjHO+Bpf7WjBG+UU3s=fEQw@mail.gmail.com>
In-Reply-To: <CAHk-=wgSjKuYHXd56nJNmcW3ECQR4=a5_14jQiUswuZje+XF_Q@mail.gmail.com>

On Sun, 18 Feb 2024 at 10:08, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The only ugliness I see is the one that comes from the original code -
> I'm not thrilled about the "return -EAGAIN" part, and I think that if
> we found a previously stashed entry after all, we should loop.
>
> But I think that whole horror comes from a fear of an endless loop
> when the dentry is marked dead, and another CPU still sees the old
> value (so you can't re-use it, and yet it's not NULL).

Actually, I think this is fairly easily fixable, but let's fix it
*after* you've done your cleanups.

The eventual fix is fairly simple: allow installing a new dentry not
just as a replacement for a previous NULL dentry, but also replacing a
previous dead dentry.

That requires just two simple changes:

 - the ->d_prune() callback should no longer just blindly set the
stashed value to NULL, it would do

        // Somebody could have re-used our stash as the dentry
        // died, so we only NULL it out if it matches our pruned one
        cmpxchg(&stashed, dentry, NULL);

 - when installing, instead of doing

        if (cmpxchg(&stashed, NULL, dentry) .. FAIL ..

   we'd loop with something like this:

        guard(rcu)();
        for (;;) {
                struct dentry *old;

                // Assume any old dentry was cleared out
                old = cmpxchg(&stashed, NULL, dentry);
                if (likely(!old))
                        break;

                // Maybe somebody else installed a good dentry
                // .. so release ours and use the new one
                if (lockref_get_not_dead(&old->d_lockref)) {
                        d_delete(dentry);
                        dput(dentry);
                        return old;
                }

                // There's an old dead dentry there, try to take it over
                // (try_cmpxchg() takes a pointer to the expected old value)
                if (likely(try_cmpxchg(&stashed, &old, dentry)))
                        break;

                // Oops, that failed, go back and try again
                cpu_relax();
        }

        // We successfully installed our dentry
        // as the new stashed one
        return dentry;

which really doesn't look that complicated (note the RCU guard
as a way to make sure this all runs RCU-locked without needing to
worry about the unlock sequence).
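
For illustration only, the prune and install logic above can be modeled
in user space with C11 atomics. This is a sketch under stated
assumptions, not kernel code: struct node, stash_t, node_get_not_dead(),
stash_prune() and stash_install() are made-up names, a plain "dead" flag
plus a counter stands in for the lockref, and there is no RCU, so it
shows only the control flow and none of the lifetime rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: a refcount plus a "dead" flag. */
struct node {
    atomic_int refs;
    atomic_bool dead;
};

/* The stash slot: an atomic pointer to the currently stashed node. */
typedef _Atomic(struct node *) stash_t;

static void node_init(struct node *n)
{
    atomic_init(&n->refs, 1);
    atomic_init(&n->dead, false);
}

/*
 * Simplified stand-in for lockref_get_not_dead(). The real helper checks
 * and increments under one lock/cmpxchg; this model does not.
 */
static bool node_get_not_dead(struct node *n)
{
    if (atomic_load(&n->dead))
        return false;
    atomic_fetch_add(&n->refs, 1);
    return true;
}

/* Prune side: clear the stash only if it still points at the dying node. */
static void stash_prune(stash_t *stashed, struct node *n)
{
    struct node *expected = n;
    struct node *none = NULL;

    atomic_store(&n->dead, true);
    atomic_compare_exchange_strong(stashed, &expected, none);
}

/* Install side: returns the node that ends up being used. */
static struct node *stash_install(stash_t *stashed, struct node *mine)
{
    assert(mine != NULL);
    for (;;) {
        struct node *old = NULL;

        /* Assume any old node was cleared out. */
        if (atomic_compare_exchange_strong(stashed, &old, mine))
            return mine;

        /* Somebody else installed a live node: use theirs instead. */
        if (node_get_not_dead(old))
            return old; /* the caller would release 'mine' here */

        /* A dead node is still stashed: try to take it over. */
        if (atomic_compare_exchange_strong(stashed, &old, mine))
            return mine;

        /* Lost a race with another installer, go around again. */
    }
}
```

In this model a caller that gets back something other than the node it
passed in is expected to drop its own node, mirroring the
d_delete()/dput() pair in the loop above.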

Note: your initial optimistic "get_stashed_dentry()" stays exactly as
it is. The above loop is just for the "oh, we didn't trivially re-use
an old dentry, so now we need to allocate a new one and install it"
case.

Anyway, the above is written just in the MUA, there's no testing of
the above, and again - I think this should be done *after* you've done
the cleanups of the current code. But I think it would clarify the odd
race condition with an old dentry dying just as a new one is created,
and make sure there isn't some -EAGAIN case that people need to worry
about.

Because either we can re-use the old one, or there isn't an old one,
or we find a dead one that can't be reused but can just be replaced.

Fairly straightforward, no?

               Linus
