All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [RFC PATCH 1/2] sched/wait: Add add_wait_queue_priority()
Date: Mon, 26 Oct 2020 17:53:24 +0000	[thread overview]
Message-ID: <20201026175325.585623-1-dwmw2@infradead.org> (raw)

From: David Woodhouse <dwmw@amazon.co.uk>

This allows an exclusive wait_queue_entry to be added at the head of the
queue, instead of the tail as normal. Thus, it gets to consume events
first.

The problem I'm trying to solve here is interrupt remapping invalidation
vs. MSI interrupts from VFIO. I'd really like KVM IRQFD to be able to
consume events before (and indeed instead of) userspace.

When the remapped MSI target in the KVM routing table is invalidated,
the VMM needs to *deassociate* the IRQFD and fall back to handling the
next IRQ in userspace, so it can be retranslated and a fault reported
if appropriate.

It's possible to do that by constantly registering and deregistering the
fd in the userspace poll loop, but it gets ugly especially because the
fallback handler isn't really local to the core MSI handling.

It's much nicer if the userspace handler can just remain registered all
the time, and it just doesn't get any events when KVM steals them first.
Which is precisely what happens with posted interrupts, and this makes
it consistent. (Unless I'm missing something that prevents posted
interrupts from working when there's another listener on the eventfd?)

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 include/linux/wait.h | 12 +++++++++++-
 kernel/sched/wait.c  | 11 +++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 27fb99cfeb02..fe10e8570a52 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -22,6 +22,7 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
 #define WQ_FLAG_BOOKMARK	0x04
 #define WQ_FLAG_CUSTOM		0x08
 #define WQ_FLAG_DONE		0x10
+#define WQ_FLAG_PRIORITY	0x20
 
 /*
  * A single wait-queue entry structure:
@@ -164,11 +165,20 @@ static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
 
 extern void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
+extern void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 
 static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
 {
-	list_add(&wq_entry->entry, &wq_head->head);
+	struct list_head *head = &wq_head->head;
+	struct wait_queue_entry *wq;
+
+	list_for_each_entry(wq, &wq_head->head, entry) {
+		if (!(wq->flags & WQ_FLAG_PRIORITY))
+			break;
+		head = &wq->entry;
+	}
+	list_add(&wq_entry->entry, head);
 }
 
 /*
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 01f5d3020589..d2a84c8e88bf 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -37,6 +37,17 @@ void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue
 }
 EXPORT_SYMBOL(add_wait_queue_exclusive);
 
+void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
+{
+	unsigned long flags;
+
+	wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;
+	spin_lock_irqsave(&wq_head->lock, flags);
+	__add_wait_queue(wq_head, wq_entry);
+	spin_unlock_irqrestore(&wq_head->lock, flags);
+}
+EXPORT_SYMBOL_GPL(add_wait_queue_priority);
+
 void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
 {
 	unsigned long flags;
-- 
2.26.2


             reply	other threads:[~2020-10-26 17:53 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-26 17:53 David Woodhouse [this message]
2020-10-26 17:53 ` [RFC PATCH 2/2] kvm/eventfd: Use priority waitqueue to catch events before userspace David Woodhouse
2020-10-27  8:01   ` Paolo Bonzini
2020-10-27 10:15     ` David Woodhouse
2020-10-27 13:55     ` [PATCH 0/3] Allow in-kernel consumers to drain events from eventfd David Woodhouse
2020-10-27 13:55       ` [PATCH 1/3] eventfd: Export eventfd_ctx_do_read() David Woodhouse
2020-10-27 13:55       ` [PATCH 2/3] vfio/virqfd: Drain events from eventfd in virqfd_wakeup() David Woodhouse
2020-11-06 23:29         ` Alex Williamson
2020-11-08  9:17           ` Paolo Bonzini
2020-10-27 13:55       ` [PATCH 3/3] kvm/eventfd: Drain events from eventfd in irqfd_wakeup() David Woodhouse
2020-10-27 18:41         ` kernel test robot
2020-10-27 18:41           ` kernel test robot
2020-10-27 21:42         ` kernel test robot
2020-10-27 21:42           ` kernel test robot
2020-10-27 23:13         ` kernel test robot
2020-10-27 23:13           ` kernel test robot
2020-10-27 14:39 ` [PATCH v2 0/2] Allow KVM IRQFD to consistently intercept events David Woodhouse
2020-10-27 14:39   ` [PATCH v2 1/2] sched/wait: Add add_wait_queue_priority() David Woodhouse
2020-10-27 19:09     ` Peter Zijlstra
2020-10-27 19:27       ` David Woodhouse
2020-10-27 20:30         ` Peter Zijlstra
2020-10-27 20:49           ` David Woodhouse
2020-10-27 21:32           ` David Woodhouse
2020-10-28 14:20             ` Peter Zijlstra
2020-10-28 14:44               ` Paolo Bonzini
2020-10-28 14:35     ` Peter Zijlstra
2020-11-04  9:35       ` David Woodhouse
2020-11-04 11:25         ` Paolo Bonzini
2020-11-06 10:17         ` Paolo Bonzini
2020-11-06 16:32           ` Alex Williamson
2020-11-06 17:18             ` David Woodhouse
2020-10-27 14:39   ` [PATCH v2 2/2] kvm/eventfd: Use priority waitqueue to catch events before userspace David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201026175325.585623-1-dwmw2@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.