From: Oleg Nesterov <oleg@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Felipe Balbi <balbi@kernel.org>
Subject: Re: [RFC PATCH] sched/wait: Make interruptible exclusive waitqueue wakeups reliable
Date: Mon, 9 Dec 2019 13:08:53 +0100
Message-ID: <20191209120852.GA5388@redhat.com>
In-Reply-To: <20191209091813.GA41320@gmail.com>

On 12/09, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > The reason it is buggy is that wait_event_interruptible_exclusive()
> > does this (inside the __wait_event() macro that it expands to):
> >
> >                 long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);
> >
> >                 if (condition)
> >                         break;
> >                 if (___wait_is_interruptible(state) && __int) {
> >                         __ret = __int;
> >                         goto __out;
> >
> > and the thing is, if it does that "__ret = __int" case and returns
> > -ERESTARTSYS,

But note that it checks "condition" after prepare_to_wait_event(); if it is
true, then ___wait_is_interruptible() won't even be called.
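
To spell that out, the loop that __wait_event() expands to looks roughly
like this (paraphrased from include/linux/wait.h from memory, so take it
as a sketch rather than the exact macro): the break on a true condition
comes first, while __ret is still 0, and only then is __int consulted.

	for (;;) {
		long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);

		if (condition)			/* success, __int is ignored */
			break;

		if (___wait_is_interruptible(state) && __int) {
			__ret = __int;		/* typically -ERESTARTSYS */
			goto __out;
		}

		cmd;				/* typically schedule() */
	}
	finish_wait(&wq_head, &__wq_entry);
__out:	__ret;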

> > it's possible that the wakeup event has already been
> > consumed, because we've added ourselves as an exclusive writer to the
> > queue. So it _says_ it was interrupted, not woken up, and the wait got
> > cancelled, but because we were an exclusive waiter, we might be the
> > _only_ thing that got woken up, and the wakeup basically got forgotten
> > - all the other exclusive waiters will remain waiting.
>
> So the place that detects interruption is prepare_to_wait_event():

Yes,

> long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
> {
>         unsigned long flags;
>         long ret = 0;
>
>         spin_lock_irqsave(&wq_head->lock, flags);
>         if (signal_pending_state(state, current)) {
>                 /*
>                  * Exclusive waiter must not fail if it was selected by wakeup,
>                  * it should "consume" the condition we were waiting for.
>                  *
>                  * The caller will recheck the condition and return success if
>                  * we were already woken up, we can not miss the event because
>                  * wakeup locks/unlocks the same wq_head->lock.
>                  *
>                  * But we need to ensure that set-condition + wakeup after that
>                  * can't see us, it should wake up another exclusive waiter if
>                  * we fail.
>                  */
>                 list_del_init(&wq_entry->entry);
>                 ret = -ERESTARTSYS;

...

> I think we can indeed lose an exclusive event here, despite the comment
> that argues that we shouldn't: if we were already removed from the list

If we were already removed from the list and the condition is true, we can't
miss it: ret = -ERESTARTSYS won't be used. This is what this part of the
comment above

	 * The caller will recheck the condition and return success if
	 * we were already woken up, we can not miss the event because
	 * wakeup locks/unlocks the same wq_head->lock.

tries to explain.

> then list_del_init() does nothing and loses the exclusive event AFAICS.

list_del_init() ensures that wake_up() can't pick this task after
prepare_to_wait_event() returns.
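
To see why: __wake_up_common() walks wq_head->head under wq_head->lock and
stops after the first exclusive waiter it successfully wakes. Roughly, from
memory and with bookmark handling and other details left out:

	list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
		unsigned flags = curr->flags;
		int ret = curr->func(curr, mode, wake_flags, key);

		if (ret < 0)
			break;
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}

An entry that list_del_init() already took off ->head is simply never
visited, so the loop wakes the next exclusive waiter instead, if there is one.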

IOW, suppose that ___wait_event() races with

	condition = true;
	wake_up();

If wake_up() happens before prepare_to_wait_event(), __wait_event() will
see condition == true, and the -ERESTARTSYS returned by prepare_to_wait_event()
has no effect.

If wake_up() comes after prepare_to_wait_event(), the task has already been
removed from the list, so another exclusive waiter (if any) will be woken up
instead. In this case __wait_event() can return success or -ERESTARTSYS; both
are correct.
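
To make the serialization explicit, on the waker side "condition = true;
wake_up();" typically looks something like this (hypothetical dev /
data_ready names, just a sketch):

	/* set the condition first */
	dev->data_ready = true;

	/*
	 * Takes dev->wq.lock internally (the same lock that
	 * prepare_to_wait_event() holds around the signal_pending_state()
	 * check and list_del_init()) and wakes at most one exclusive
	 * waiter, so the two orderings above are the only possible ones.
	 */
	wake_up_interruptible(&dev->wq);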

No?

Oleg.

