All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal()
Date: Fri, 23 Jul 2021 09:59:55 +0200	[thread overview]
Message-ID: <f4ee6cdd-85a0-5f24-b028-0e3736027e7a@redhat.com> (raw)
In-Reply-To: <20210723022356.1301-1-hdanton@sina.com>

On 23/07/21 04:23, Hillf Danton wrote:
> Detect concurrent reader and writer by reading event counter before and
> after poll_wait(), and determine feedback with the case of unstable
> counter taken into account.
> 
> Cut the big comment as the added barriers speak for themselves.

First and foremost, I'm not sure what you are trying to fix.

Second, the patch is wrong even without taking into account the lockless
accesses, because the condition for returning EPOLLOUT is certainly wrong.

Third, barriers very rarely speak for themselves.  In particular what
do they pair with?  It seems to me that you are basically reintroducing
the same mistake that commit a484c3dd9426 ("eventfd: document lockless
access in eventfd_poll", 2016-03-22) fixed, at the time where the big
comment was introduced:

     Things aren't as simple as the read barrier in eventfd_poll
     would suggest.  In fact, the read barrier, besides lacking a
     comment, is not paired in any obvious manner with another read
     barrier, and it is pointless because it is sitting between a write
     (deep in poll_wait) and the read of ctx->count.

Paolo


> +++ x/fs/eventfd.c
> @@ -131,49 +131,20 @@ static __poll_t eventfd_poll(struct file
>   {
>   	struct eventfd_ctx *ctx = file->private_data;
>   	__poll_t events = 0;
> -	u64 count;
> +	u64 c0, count;
> +
> +	c0 = ctx->count;
> +	smp_rmb();
>   
>   	poll_wait(file, &ctx->wqh, wait);
>   
> -	/*
> -	 * All writes to ctx->count occur within ctx->wqh.lock.  This read
> -	 * can be done outside ctx->wqh.lock because we know that poll_wait
> -	 * takes that lock (through add_wait_queue) if our caller will sleep.
> -	 *
> -	 * The read _can_ therefore seep into add_wait_queue's critical
> -	 * section, but cannot move above it!  add_wait_queue's spin_lock acts
> -	 * as an acquire barrier and ensures that the read be ordered properly
> -	 * against the writes.  The following CAN happen and is safe:
> -	 *
> -	 *     poll                               write
> -	 *     -----------------                  ------------
> -	 *     lock ctx->wqh.lock (in poll_wait)
> -	 *     count = ctx->count
> -	 *     __add_wait_queue
> -	 *     unlock ctx->wqh.lock
> -	 *                                        lock ctx->qwh.lock
> -	 *                                        ctx->count += n
> -	 *                                        if (waitqueue_active)
> -	 *                                          wake_up_locked_poll
> -	 *                                        unlock ctx->qwh.lock
> -	 *     eventfd_poll returns 0
> -	 *
> -	 * but the following, which would miss a wakeup, cannot happen:
> -	 *
> -	 *     poll                               write
> -	 *     -----------------                  ------------
> -	 *     count = ctx->count (INVALID!)
> -	 *                                        lock ctx->qwh.lock
> -	 *                                        ctx->count += n
> -	 *                                        **waitqueue_active is false**
> -	 *                                        **no wake_up_locked_poll!**
> -	 *                                        unlock ctx->qwh.lock
> -	 *     lock ctx->wqh.lock (in poll_wait)
> -	 *     __add_wait_queue
> -	 *     unlock ctx->wqh.lock
> -	 *     eventfd_poll returns 0
> -	 */
> -	count = READ_ONCE(ctx->count);
> +	smp_rmb();
> +	count = ctx->count;
> +
> +	if (c0 < count)
> +		return EPOLLIN;
> +	if (c0 > count)
> +		return EPOLLOUT;
>   
>   	if (count > 0)
>   		events |= EPOLLIN;
> 


  reply	other threads:[~2021-07-23  8:00 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14  8:01 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal() Daniel Bristot de Oliveira
2021-07-14  8:10 ` Paolo Bonzini
2021-07-14  9:23   ` Jason Wang
2021-07-14 10:35     ` Paolo Bonzini
2021-07-14 10:41       ` Michael S. Tsirkin
2021-07-14 10:44         ` Paolo Bonzini
2021-07-14 12:20       ` Daniel Bristot de Oliveira
2021-07-15  4:14       ` Jason Wang
2021-07-15  5:58         ` Paolo Bonzini
2021-07-15  6:45           ` Jason Wang
2021-07-15  8:22       ` Daniel Bristot de Oliveira
2021-07-15  8:44         ` He Zhe
2021-07-15  9:51           ` Paolo Bonzini
2021-07-15 10:10             ` He Zhe
2021-07-15 11:05               ` Paolo Bonzini
2021-07-16  2:26                 ` Jason Wang
2021-07-16  2:43                   ` He Zhe
2021-07-16  2:46                     ` Jason Wang
2021-07-15  9:46         ` Paolo Bonzini
2021-07-15 12:34           ` Daniel Bristot de Oliveira
     [not found]       ` <20210715102249.2205-1-hdanton@sina.com>
2021-07-15 12:31         ` Daniel Bristot de Oliveira
     [not found]         ` <20210716020611.2288-1-hdanton@sina.com>
2021-07-16  6:54           ` Paolo Bonzini
     [not found]           ` <20210716075539.2376-1-hdanton@sina.com>
2021-07-16  7:59             ` Paolo Bonzini
     [not found]             ` <20210716093725.2438-1-hdanton@sina.com>
2021-07-16 11:55               ` Paolo Bonzini
2021-07-18 12:42                 ` Hillf Danton
2021-07-19 15:38                   ` Paolo Bonzini
2021-07-21  7:04                     ` Hillf Danton
2021-07-21  7:25                       ` Thomas Gleixner
2021-07-21 10:11                         ` Hillf Danton
2021-07-21 10:59                           ` Paolo Bonzini
2021-07-22  5:58                             ` Hillf Danton
2021-07-23  2:23                             ` Hillf Danton
2021-07-23  7:59                               ` Paolo Bonzini [this message]
2021-07-23  9:48                                 ` Hillf Danton
2021-07-23 10:56                                   ` Paolo Bonzini
2021-07-24  4:33                                     ` Hillf Danton
2021-07-26 11:03                                       ` Paolo Bonzini
2021-07-28  8:06       ` Thomas Gleixner
2021-07-28 10:21         ` Paolo Bonzini
2021-07-28 19:07           ` Thomas Gleixner
2021-07-29 11:01             ` [PATCH] eventfd: Make signal recursion protection a task bit Thomas Gleixner
2021-07-29 14:32               ` Daniel Bristot de Oliveira
2021-07-29 19:23               ` Daniel Bristot de Oliveira
2021-08-26  7:03               ` Jason Wang
2021-08-27 23:41               ` [tip: sched/core] " tip-bot2 for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f4ee6cdd-85a0-5f24-b028-0e3736027e7a@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=bigeasy@linutronix.de \
    --cc=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mst@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.