From: Daniel Bristot de Oliveira <bristot@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Juri Lelli <jlelli@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>, He Zhe <zhe.he@windriver.com>
Subject: Re: 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal()
Date: Thu, 15 Jul 2021 10:22:46 +0200	[thread overview]
Message-ID: <a56ddd50-2cd1-f16e-5180-5232c449fbd0@redhat.com> (raw)
In-Reply-To: <475f84e2-78ee-1a24-ef57-b16c1f2651ed@redhat.com>

On 7/14/21 12:35 PM, Paolo Bonzini wrote:
> On 14/07/21 11:23, Jason Wang wrote:
>>> This was added in 2020, so it's unlikely to be the direct cause of the
>>> change.  What is a known-good version for the host?
>>>
>>> Since it is not KVM stuff, I'm CCing Michael and Jason.
>>
>> I think this can be probably fixed here:
>>
>> https://lore.kernel.org/lkml/20210618084412.18257-1-zhe.he@windriver.com/
> 
> That seems wrong; in particular it wouldn't protect against AB/BA deadlocks.
> In fact, the bug is with the locking; the code assumes that
> spin_lock_irqsave/spin_unlock_irqrestore is non-preemptable and therefore
> increments and decrements the percpu variable inside the critical section.
> 
> This obviously does not fly with PREEMPT_RT; the right fix should be
> using a local_lock.  Something like this (untested!!):

the lock needs to be per-CPU... but so far, so good. I will keep using the
system over the next few days to see if it blows up in some other way.
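
For anyone not familiar with local locks: roughly speaking (simplified from
include/linux/local_lock_internal.h, details may differ between kernel
versions), local_lock() expands to something like this:

	/*
	 * !PREEMPT_RT: a local_lock is essentially a preemption-disabled
	 * section (plus lockdep annotations), so the percpu counter keeps
	 * the old "cannot be preempted here" guarantee at no extra cost.
	 */
	#define local_lock(lock)			\
		do {					\
			preempt_disable();		\
		} while (0)

	/*
	 * PREEMPT_RT: the local_lock_t is a real per-CPU spinlock (a
	 * sleeping lock), so two tasks on the same CPU serialize on it
	 * instead of relying on preemption being disabled.
	 */
	#define local_lock(lock)			\
		do {					\
			migrate_disable();		\
			spin_lock(this_cpu_ptr(lock));	\
		} while (0)

That is also why a bare this_cpu_inc()/this_cpu_dec() around the wakeup is
not enough on RT: a task can be preempted with the counter elevated, and
the next eventfd_signal() caller on that CPU hits the WARN even though
there is no real recursion.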

The patch looks like this now:

------------------------- 8< ---------------------
Subject: [PATCH] eventfd: protect eventfd_wake_count with a local_lock

eventfd_signal assumes that spin_lock_irqsave/spin_unlock_irqrestore is
non-preemptable and therefore increments and decrements the percpu
variable inside the critical section.

This does not hold on PREEMPT_RT, where the critical section is
preemptible. If a task is preempted inside eventfd_signal() while the
counter is elevated and an unrelated task on the same CPU then calls
eventfd_signal(), the latter sees a non-zero counter and triggers a
spurious WARN. To avoid this, protect the percpu counter with a
local_lock.

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")
Cc: stable@vger.kernel.org
Cc: He Zhe <zhe.he@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Co-developed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
---
 fs/eventfd.c            | 27 ++++++++++++++++++++++-----
 include/linux/eventfd.h |  7 +------
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..9754fcd38690 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -12,6 +12,7 @@
 #include <linux/fs.h>
 #include <linux/sched/signal.h>
 #include <linux/kernel.h>
+#include <linux/local_lock.h>
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
@@ -25,8 +26,6 @@
 #include <linux/idr.h>
 #include <linux/uio.h>

-DEFINE_PER_CPU(int, eventfd_wake_count);
-
 static DEFINE_IDA(eventfd_ida);

 struct eventfd_ctx {
@@ -45,6 +44,20 @@ struct eventfd_ctx {
 	int id;
 };

+struct event_fd_recursion {
+	local_lock_t lock;
+	int count;
+};
+
+static DEFINE_PER_CPU(struct event_fd_recursion, event_fd_recursion) = {
+	.lock = INIT_LOCAL_LOCK(lock),
+};
+
+bool eventfd_signal_count(void)
+{
+	return this_cpu_read(event_fd_recursion.count);
+}
+
 /**
  * eventfd_signal - Adds @n to the eventfd counter.
  * @ctx: [in] Pointer to the eventfd context.
@@ -71,18 +84,22 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 	 * it returns true, the eventfd_signal() call should be deferred to a
 	 * safe context.
 	 */
-	if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
+	local_lock(&event_fd_recursion.lock);
+	if (WARN_ON_ONCE(this_cpu_read(event_fd_recursion.count))) {
+		local_unlock(&event_fd_recursion.lock);
 		return 0;
+	}

 	spin_lock_irqsave(&ctx->wqh.lock, flags);
-	this_cpu_inc(eventfd_wake_count);
+	this_cpu_inc(event_fd_recursion.count);
 	if (ULLONG_MAX - ctx->count < n)
 		n = ULLONG_MAX - ctx->count;
 	ctx->count += n;
 	if (waitqueue_active(&ctx->wqh))
 		wake_up_locked_poll(&ctx->wqh, EPOLLIN);
-	this_cpu_dec(eventfd_wake_count);
+	this_cpu_dec(event_fd_recursion.count);
 	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+	local_unlock(&event_fd_recursion.lock);

 	return n;
 }
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index fa0a524baed0..ca89d6c409c1 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -43,12 +43,7 @@ int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *w
 				  __u64 *cnt);
 void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);

-DECLARE_PER_CPU(int, eventfd_wake_count);
-
-static inline bool eventfd_signal_count(void)
-{
-	return this_cpu_read(eventfd_wake_count);
-}
+bool eventfd_signal_count(void);

 #else /* CONFIG_EVENTFD */

-- 



Thread overview: 45+ messages
2021-07-14  8:01 5.13-rt1 + KVM = WARNING: at fs/eventfd.c:74 eventfd_signal() Daniel Bristot de Oliveira
2021-07-14  8:10 ` Paolo Bonzini
2021-07-14  9:23   ` Jason Wang
2021-07-14 10:35     ` Paolo Bonzini
2021-07-14 10:41       ` Michael S. Tsirkin
2021-07-14 10:44         ` Paolo Bonzini
2021-07-14 12:20       ` Daniel Bristot de Oliveira
2021-07-15  4:14       ` Jason Wang
2021-07-15  5:58         ` Paolo Bonzini
2021-07-15  6:45           ` Jason Wang
2021-07-15  8:22       ` Daniel Bristot de Oliveira [this message]
2021-07-15  8:44         ` He Zhe
2021-07-15  9:51           ` Paolo Bonzini
2021-07-15 10:10             ` He Zhe
2021-07-15 11:05               ` Paolo Bonzini
2021-07-16  2:26                 ` Jason Wang
2021-07-16  2:43                   ` He Zhe
2021-07-16  2:46                     ` Jason Wang
2021-07-15  9:46         ` Paolo Bonzini
2021-07-15 12:34           ` Daniel Bristot de Oliveira
     [not found]       ` <20210715102249.2205-1-hdanton@sina.com>
2021-07-15 12:31         ` Daniel Bristot de Oliveira
     [not found]         ` <20210716020611.2288-1-hdanton@sina.com>
2021-07-16  6:54           ` Paolo Bonzini
     [not found]           ` <20210716075539.2376-1-hdanton@sina.com>
2021-07-16  7:59             ` Paolo Bonzini
     [not found]             ` <20210716093725.2438-1-hdanton@sina.com>
2021-07-16 11:55               ` Paolo Bonzini
2021-07-18 12:42                 ` Hillf Danton
2021-07-19 15:38                   ` Paolo Bonzini
2021-07-21  7:04                     ` Hillf Danton
2021-07-21  7:25                       ` Thomas Gleixner
2021-07-21 10:11                         ` Hillf Danton
2021-07-21 10:59                           ` Paolo Bonzini
2021-07-22  5:58                             ` Hillf Danton
2021-07-23  2:23                             ` Hillf Danton
2021-07-23  7:59                               ` Paolo Bonzini
2021-07-23  9:48                                 ` Hillf Danton
2021-07-23 10:56                                   ` Paolo Bonzini
2021-07-24  4:33                                     ` Hillf Danton
2021-07-26 11:03                                       ` Paolo Bonzini
2021-07-28  8:06       ` Thomas Gleixner
2021-07-28 10:21         ` Paolo Bonzini
2021-07-28 19:07           ` Thomas Gleixner
2021-07-29 11:01             ` [PATCH] eventfd: Make signal recursion protection a task bit Thomas Gleixner
2021-07-29 14:32               ` Daniel Bristot de Oliveira
2021-07-29 19:23               ` Daniel Bristot de Oliveira
2021-08-26  7:03               ` Jason Wang
2021-08-27 23:41               ` [tip: sched/core] " tip-bot2 for Thomas Gleixner
