* [patch V2 00/15] Lock ordering documentation and annotation for lockdep
@ 2020-03-18 20:43 Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
                   ` (12 more replies)
  0 siblings, 13 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

This is the second version of this work. The first one can be found here:

   https://lore.kernel.org/r/20200313174701.148376-1-bigeasy@linutronix.de

Changes since V1:

  - Split the PCI/switchtec patch (picked up the fix from Logan) and
    reworked the change log.

  - Addressed Linus' feedback on completions.

    Most of the places which had open coded completion variants have been
    analysed and fixed up to use the regular interfaces.

    The PS3 one got converted by Peter Zijlstra to rcuwait.

    Added an explanation to the change log of why swait actually fits the
    completion semantics.

  - Addressed Randy's feedback on the documentation

Thanks,

	tglx



* [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait()
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 21:25   ` Bjorn Helgaas
  2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

From: Logan Gunthorpe <logang@deltatee.com>

The call to init_completion() in mrpc_queue_cmd() can theoretically
race with the call to poll_wait() in switchtec_dev_poll().

  poll()			write()
    switchtec_dev_poll()   	  switchtec_dev_write()
      poll_wait(&s->comp.wait);      mrpc_queue_cmd()
			               init_completion(&s->comp)
				         init_waitqueue_head(&s->comp.wait)

To my knowledge, no one has hit this bug.

Fix this by using reinit_completion() instead of init_completion() in
mrpc_queue_cmd().
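
For reference, the two interfaces differ exactly in the part that races.
A sketch of the <linux/completion.h> helpers of that era (not the
verbatim source):

  static inline void init_completion(struct completion *x)
  {
          x->done = 0;
          init_waitqueue_head(&x->wait);  /* re-creates the waiter list head */
  }

  static inline void reinit_completion(struct completion *x)
  {
          x->done = 0;                    /* waiter list head stays intact */
  }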

Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com

---
 drivers/pci/switch/switchtec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index a823b4b8ef8a..81dc7ac01381 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	init_completion(&stuser->comp);
+	reinit_completion(&stuser->comp);
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
-- 
2.20.1




* [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 21:26   ` Bjorn Helgaas
  2020-03-18 22:11   ` Logan Gunthorpe
  2020-03-18 20:43 ` [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Kurt Schwemmer, Logan Gunthorpe,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The poll callback is using the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.

This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.

This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts constraints on
changing the internal implementation of completions.

Replace it with a regular wait queue and store the state in struct
switchtec_user.
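
In sketch form, the resulting pattern, using the names introduced by the
patch below:

  /* waker */
  stuser->cmd_done = true;
  wake_up_interruptible(&stuser->cmd_comp);

  /* sleeper */
  rc = wait_event_interruptible(stuser->cmd_comp, stuser->cmd_done);

  /* poll callback */
  poll_wait(filp, &stuser->cmd_comp, wait);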

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
V2: Reworded changelog.
---
 drivers/pci/switch/switchtec.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {
 
 	enum mrpc_state state;
 
-	struct completion comp;
+	wait_queue_head_t cmd_comp;
 	struct kref kref;
 	struct list_head list;
 
+	bool cmd_done;
 	u32 cmd;
 	u32 status;
 	u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
 	stuser->stdev = stdev;
 	kref_init(&stuser->kref);
 	INIT_LIST_HEAD(&stuser->list);
-	init_completion(&stuser->comp);
+	init_waitqueue_head(&stuser->cmd_comp);
 	stuser->event_cnt = atomic_read(&stdev->event_cnt);
 
 	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	reinit_completion(&stuser->comp);
+	stuser->cmd_done = false;
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
 		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
 			      stuser->read_len);
 out:
-	complete_all(&stuser->comp);
+	stuser->cmd_done = true;
+	wake_up_interruptible(&stuser->cmd_comp);
 	list_del_init(&stuser->list);
 	stuser_put(stuser);
 	stdev->mrpc_busy = 0;
@@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
 	mutex_unlock(&stdev->mrpc_mutex);
 
 	if (filp->f_flags & O_NONBLOCK) {
-		if (!try_wait_for_completion(&stuser->comp))
+		if (!stuser->cmd_done)
 			return -EAGAIN;
 	} else {
-		rc = wait_for_completion_interruptible(&stuser->comp);
+		rc = wait_event_interruptible(stuser->cmd_comp,
+					      stuser->cmd_done);
 		if (rc < 0)
 			return rc;
 	}
@@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
 	struct switchtec_dev *stdev = stuser->stdev;
 	__poll_t ret = 0;
 
-	poll_wait(filp, &stuser->comp.wait, wait);
+	poll_wait(filp, &stuser->cmd_comp, wait);
 	poll_wait(filp, &stdev->event_wq, wait);
 
 	if (lock_mutex_and_test_alive(stdev))
@@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
 
 	mutex_unlock(&stdev->mrpc_mutex);
 
-	if (try_wait_for_completion(&stuser->comp))
+	if (stuser->cmd_done)
 		ret |= EPOLLIN | EPOLLRDNORM;
 
 	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
 
 	/* Wake up and kill any users waiting on an MRPC request */
 	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-		complete_all(&stuser->comp);
+		stuser->cmd_done = true;
+		wake_up_interruptible(&stuser->cmd_comp);
 		list_del_init(&stuser->list);
 		stuser_put(stuser);
 	}



* [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-19  8:41   ` Greg Kroah-Hartman
  2020-03-18 20:43 ` [patch V2 04/15] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Logan Gunthorpe, Kurt Schwemmer, Bjorn Helgaas,
	linux-pci, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

ep_io() uses a completion on the stack and open codes the waiting with:

  wait_event_interruptible (done.wait, done.done);
and
  wait_event (done.wait, done.done);

This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exclusive waiter.

Replace the open coded implementation with the corresponding
wait_for_completion*() functions.

No functional change.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org
---
V2: New patch to avoid the conversion to swait interfaces later
---
 drivers/usb/gadget/legacy/inode.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf
 	spin_unlock_irq (&epdata->dev->lock);
 
 	if (likely (value == 0)) {
-		value = wait_event_interruptible (done.wait, done.done);
+		value = wait_for_completion_interruptible(&done);
 		if (value != 0) {
 			spin_lock_irq (&epdata->dev->lock);
 			if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf
 				usb_ep_dequeue (epdata->ep, epdata->req);
 				spin_unlock_irq (&epdata->dev->lock);
 
-				wait_event (done.wait, done.done);
+				wait_for_completion(&done);
 				if (epdata->status == -ECONNRESET)
 					epdata->status = -EINTR;
 			} else {



* [patch V2 04/15] orinoco_usb: Use the regular completion interfaces
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-19  8:40   ` Greg Kroah-Hartman
  2020-03-18 20:43 ` [patch V2 05/15] acpi: Remove header dependency Thomas Gleixner
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, linuxppc-dev

From: Thomas Gleixner <tglx@linutronix.de>

The completion usage in this driver is interesting:

  - it uses a magic complete function which, according to the comment, was
    implemented by invoking complete() four times in a row because
    complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion::done. Only one
    wait side (device removal) uses the regular wait_for_completion()
    interface.

The rationale behind this is to prevent wait_for_completion() from consuming
completion::done, which would prevent all waiters from being woken. This is
not necessary with complete_all() as that sets completion::done to UINT_MAX,
which is left unmodified by the woken waiters.

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.
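
For reference, the implementation of complete_all() at this point (quoted
from kernel/sched/completion.c, see also patch 11 of this series) shows why
completion::done survives the wakeups:

  void complete_all(struct completion *x)
  {
          unsigned long flags;

          spin_lock_irqsave(&x->wait.lock, flags);
          x->done = UINT_MAX;     /* left unmodified by the woken waiters */
          __wake_up_locked(&x->wait, TASK_NORMAL, 0);
          spin_unlock_irqrestore(&x->wait.lock, flags);
  }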

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
---
V2: New patch to avoid conversion to swait functions later.
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_all
 	return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 
 			netif_wake_queue(dev);
 		}
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
 			/* This is normal, as all request contexts get flushed
 			 * when the device is disconnected */
 			err("Called, CTX not terminating, but device gone");
-			ezusb_complete_all(&ctx->done);
+			complete_all(&ctx->done);
 			ezusb_request_context_put(ctx);
 			break;
 		}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 			 * get the chance to run themselves. So we make sure
 			 * that we don't sleep for ever */
 			int msecs = DEF_TIMEOUT * (1000 / HZ);
-			while (!ctx->done.done && msecs--)
+
+			while (!try_wait_for_completion(&ctx->done) && msecs--)
 				udelay(1000);
 		} else {
-			wait_event_interruptible(ctx->done.wait,
-						 ctx->done.done);
+			wait_for_completion(&ctx->done);
 		}
 		break;
 	default:



* [patch V2 05/15] acpi: Remove header dependency
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 04/15] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

In order to avoid future header hell, remove the inclusion of
proc_fs.h from acpi_bus.h. All it needs is a forward declaration of
struct proc_dir_entry.
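
A forward declaration suffices because acpi_bus.h only deals in pointers
to that struct, as the hunk below shows:

  struct proc_dir_entry;                          /* forward declaration */

  extern struct proc_dir_entry *acpi_root_dir;    /* pointer use only */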

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/platform/x86/dell-smo8800.c                      |    1 +
 drivers/platform/x86/wmi.c                               |    1 +
 drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |    1 +
 include/acpi/acpi_bus.h                                  |    2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/platform/x86/dell-smo8800.c
+++ b/drivers/platform/x86/dell-smo8800.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/miscdevice.h>
 #include <linux/uaccess.h>
+#include <linux/fs.h>
 
 struct smo8800_device {
 	u32 irq;                     /* acpi device irq */
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -29,6 +29,7 @@
 #include <linux/uaccess.h>
 #include <linux/uuid.h>
 #include <linux/wmi.h>
+#include <linux/fs.h>
 #include <uapi/linux/wmi.h>
 
 ACPI_MODULE_NAME("wmi");
--- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
+++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
@@ -19,6 +19,7 @@
 #include <linux/acpi.h>
 #include <linux/uaccess.h>
 #include <linux/miscdevice.h>
+#include <linux/fs.h>
 #include "acpi_thermal_rel.h"
 
 static acpi_handle acpi_thermal_rel_handle;
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
 
 #ifdef CONFIG_ACPI
 
-#include <linux/proc_fs.h>
+struct proc_dir_entry;
 
 #define ACPI_BUS_FILE_ROOT	"acpi"
 extern struct proc_dir_entry *acpi_root_dir;



* [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (4 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 05/15] acpi: Remove header dependency Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-20  5:36   ` Davidlohr Bueso
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
  2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Oleg Nesterov, Davidlohr Bueso, Sebastian Andrzej Siewior,
	Logan Gunthorpe, Kurt Schwemmer, Bjorn Helgaas, linux-pci,
	Felipe Balbi, Greg Kroah-Hartman, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Michael Ellerman,
	Arnd Bergmann, linuxppc-dev

Extend rcuwait_wait_event() with a @state argument so that it is not
restricted to TASK_UNINTERRUPTIBLE waits.
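
With the @state argument in place, a caller can sketch an interruptible
wait like this (the macro below returns -EINTR when a pending signal
terminates the wait):

  ret = rcuwait_wait_event(&w, condition, TASK_INTERRUPTIBLE);
  if (ret == -EINTR)
          return ret;     /* woken by a signal, not by the condition */

Existing callers pass TASK_UNINTERRUPTIBLE and keep the old behaviour.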

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 include/linux/rcuwait.h       |   12 ++++++++++--
 kernel/locking/percpu-rwsem.c |    2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 	 */
 
 	/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 



* [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (5 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-19  9:00   ` Sebastian Andrzej Siewior
                     ` (2 more replies)
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Michael Ellerman, Arnd Bergmann, linuxppc-dev,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
usage is abuse anyway, replace it with a simple rcuwait, since there is only
ever the one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look any
different. Use TASK_IDLE instead.
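
In sketch form, the resulting communication pattern (see the hunks below):

  /* waker: notification interrupt */
  dev->done = true;
  rcuwait_wake_up(&dev->wait);

  /* waiter: kthread */
  rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(),
                     TASK_IDLE);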

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
---
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,



* [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (6 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 22:31   ` Paul E. McKenney
                     ` (3 more replies)
  2020-03-18 20:43 ` [patch V2 09/15] timekeeping: Split jiffies seqlock Thomas Gleixner
                   ` (4 subsequent siblings)
  12 siblings, 4 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

From: Thomas Gleixner <tglx@linutronix.de>

The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels are not
documented anywhere.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst     |    1 
 Documentation/locking/locktypes.rst |  298 ++++++++++++++++++++++++++++++++++++
 2 files changed, 299 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,298 @@
+.. _kernel_hacking_locktypes:
+
+==========================
+Lock types and their rules
+==========================
+
+Introduction
+============
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document describes the lock types at the conceptual level and provides
+rules for nesting them, including the rules which apply under PREEMPT_RT.
+
+Lock categories
+===============
+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Some of the implementations allow try_lock() attempts from other contexts,
+but that has to be evaluated really carefully, including the question
+whether the unlock can be done safely from that context as well.
+
+Note that some lock types change their implementation details when
+debugging is enabled, so such usage should really only be considered if
+there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On a PREEMPT_RT enabled kernel the following lock types are converted to
+sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On a non PREEMPT_RT enabled kernel the following lock types are spinning
+locks as well:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
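+For example, the spinlock_t variants::
+
+   spin_lock_bh(&lock);                /* lock and disable soft interrupts */
+   spin_lock_irq(&lock);               /* lock and disable interrupts */
+   spin_lock_irqsave(&lock, flags);    /* lock, save and disable irq state */
+
+with the matching spin_unlock_bh(), spin_unlock_irq() and
+spin_unlock_irqrestore() counterparts.
+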
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+On a PREEMPT_RT enabled kernel most of these sections are fully
+preemptible. This is possible because PREEMPT_RT forces most executions
+into task context, especially interrupt handlers and soft interrupts, which
+allows spinlock_t and rwlock_t to be substituted with RT-mutex based
+implementations.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation regardless of the
+kernel configuration including PREEMPT_RT enabled kernels.
+
+raw_spinlock_t is to be used only in real critical core code, low level
+interrupt handling and places where protecting (hardware) state is required
+to be safe against preemption and possibly interrupts.
+
+Another reason to use raw_spinlock_t is when the critical section is tiny,
+to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
+contended case.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables the
+   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
+   kernel, which utilizes the preemption count, this is achieved by a per
+   CPU bottom half locking mechanism.
+
+All other semantics of spinlock_t are preserved:
+
+ - Migration of tasks which hold a spinlock_t is prevented. On a non
+   PREEMPT_RT enabled kernel this is implicit because preemption is
+   disabled.
+   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
+   pointers to per CPU variables stay valid even if the task is preempted.
+
+ - Task state preservation. The task state is not affected when a lock is
+   contended and the task has to schedule out and wait for the lock to
+   become available. The lock wake up restores the task state unless there
+   was a regular (not lock related) wake up on the task. This ensures that
+   the task state rules are always correct independent of the kernel
+   configuration.
+
+rwlock_t
+========
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
+lock and the suffix rules of spinlock_t apply accordingly. The
+implementation is fair and prevents writer starvation.
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Same changes as for spinlock_t
+
+ - The implementation is not fair and can cause writer starvation under
+   certain circumstances. The reason for this is that a writer cannot grant
+   its priority to multiple readers. Readers which are blocked on a writer
+   fully support the priority inheritance protocol.
+
+
+PREEMPT_RT caveats
+==================
+
+spinlock_t and rwlock_t
+-----------------------
+
+The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
+with RT-mutex based implementations has a few implications.
+
+On a non PREEMPT_RT enabled kernel the following code construct is
+perfectly fine::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+Same applies to rwlock_t and the _irqsave() suffix variant.
+
+On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
+substitution expects a fully preemptible context.
+
+The preferred solution is to use :c:func:`spin_lock_irq()` or
+:c:func:`spin_lock_irqsave()` and their unlock counterparts.
+
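+In sketch form::
+
+   /* local_irq_disable(); spin_lock(&lock); -- breaks on PREEMPT_RT */
+   spin_lock_irq(&lock);       /* correct on all configurations */
+   /* critical section */
+   spin_unlock_irq(&lock);
+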
+PREEMPT_RT also offers a local_lock mechanism to substitute the
+local_irq_disable/save() constructs in cases where a separation of the
+interrupt disabling and the locking is really unavoidable. This should be
+restricted to very rare cases.
+
+
+raw_spinlock_t
+--------------
+
+Locking of a raw_spinlock_t disables preemption and possibly also interrupts.
+Therefore code inside the critical region has to be careful to avoid calls
+into code which takes a regular spinlock_t or rwlock_t. A prime example is
+memory allocation.
+
+On a non PREEMPT_RT enabled kernel the following code construct is
+perfectly fine code::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+On a PREEMPT_RT enabled kernel this breaks because the memory allocator is
+fully preemptible and therefore does not support allocations from truly
+atomic contexts.
+
+Contrary to that the following code construct is perfectly fine on
+PREEMPT_RT as spin_lock() does not disable preemption::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
+execution is forced into thread context and the lock substitution ensures
+preemptibility.
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation: a single bit offers no
+storage for the owner and waiter state which an RT-mutex requires.
+
+The semantics of bit spinlocks are preserved on a PREEMPT_RT enabled kernel
+and the caveats vs. raw_spinlock_t apply.
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT,
+but this requires conditional (#ifdef'ed) code changes at the usage side.
+The spinlock_t substitution, in contrast, is done entirely by the compiler:
+the conditionals are restricted to header files and the core implementation
+of the locking primitives, and the usage sites do not require any changes.
+
+
+Lock type nesting rules
+=======================
+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
+spinning to sleeping, this obviously restricts how they can nest with
+raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
+Lockdep is aware of these constraints to ensure that they are respected.
+
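+A sketch of a nesting which is valid on all kernel configurations::
+
+   mutex_lock(&m);         /* 1) sleeping lock */
+   spin_lock(&s);          /* 2) spinlock_t nests inside sleeping locks */
+   raw_spin_lock(&r);      /* 3) raw_spinlock_t nests inside spinlock_t */
+   raw_spin_unlock(&r);
+   spin_unlock(&s);
+   mutex_unlock(&m);
+
+The reverse, e.g. acquiring a spinlock_t inside a raw_spinlock_t held
+section, is invalid on PREEMPT_RT because it would acquire a sleeping lock
+in a non-preemptible context.
+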
+
+Owner semantics
+===============
+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no strict owner semantics for historical reasons. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms.
+
+rwsems have grown interfaces which allow non owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
+For now the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In some cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.



* [patch V2 09/15] timekeeping: Split jiffies seqlock
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (7 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 10/15] sched/swait: Prepare usage in completions Thomas Gleixner
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

From: Thomas Gleixner <tglx@linutronix.de>

seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels, which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt context, and therefore
in non-preemptible context, even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.
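
In sketch form, the write side then becomes (at all write locations, see
the hunks below):

  raw_spin_lock(&jiffies_lock);
  write_seqcount_begin(&jiffies_seq);
  do_timer(ticks);
  write_seqcount_end(&jiffies_seq);
  raw_spin_unlock(&jiffies_lock);

and readers retry on the bare sequence count:

  do {
          seq = read_seqcount_begin(&jiffies_seq);
          ret = jiffies_64;
  } while (read_seqcount_retry(&jiffies_seq, seq));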

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/jiffies.c     |    7 ++++---
 kernel/time/tick-common.c |   10 ++++++----
 kernel/time/tick-sched.c  |   19 ++++++++++++-------
 kernel/time/timekeeping.c |    6 ++++--
 kernel/time/timekeeping.h |    3 ++-
 5 files changed, 28 insertions(+), 17 deletions(-)

--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -58,7 +58,8 @@ static struct clocksource clocksource_ji
 	.max_cycles	= 10,
 };
 
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
+__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
+__cacheline_aligned_in_smp seqcount_t jiffies_seq;
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
@@ -67,9 +68,9 @@ u64 get_jiffies_64(void)
 	u64 ret;
 
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		ret = jiffies_64;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	return ret;
 }
 EXPORT_SYMBOL(get_jiffies_64);
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -84,13 +84,15 @@ int tick_is_oneshot_available(void)
 static void tick_periodic(int cpu)
 {
 	if (tick_do_timer_cpu == cpu) {
-		write_seqlock(&jiffies_lock);
+		raw_spin_lock(&jiffies_lock);
+		write_seqcount_begin(&jiffies_seq);
 
 		/* Keep track of the next tick event */
 		tick_next_period = ktime_add(tick_next_period, tick_period);
 
 		do_timer(1);
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		update_wall_time();
 	}
 
@@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_ev
 		ktime_t next;
 
 		do {
-			seq = read_seqbegin(&jiffies_lock);
+			seq = read_seqcount_begin(&jiffies_seq);
 			next = tick_next_period;
-		} while (read_seqretry(&jiffies_lock, seq));
+		} while (read_seqcount_retry(&jiffies_seq, seq));
 
 		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(kti
 		return;
 
 	/* Reevaluate with jiffies_lock held */
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 
 	delta = ktime_sub(now, last_jiffies_update);
 	if (delta >= tick_period) {
@@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(kti
 		/* Keep the tick_next_period variable up to date */
 		tick_next_period = ktime_add(last_jiffies_update, tick_period);
 	} else {
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		return;
 	}
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
 
@@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(vo
 {
 	ktime_t period;
 
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
 	if (last_jiffies_update == 0)
 		last_jiffies_update = tick_next_period;
 	period = last_jiffies_update;
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	return period;
 }
 
@@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(stru
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		basemono = last_jiffies_update;
 		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	ts->last_jiffies = basejiff;
 	ts->timer_expires_base = basemono;
 
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps);
  */
 void xtime_update(unsigned long ticks)
 {
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	do_timer(ticks);
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
--- a/kernel/time/timekeeping.h
+++ b/kernel/time/timekeeping.h
@@ -25,7 +25,8 @@ static inline void sched_clock_resume(vo
 extern void do_timer(unsigned long ticks);
 extern void update_wall_time(void);
 
-extern seqlock_t jiffies_lock;
+extern raw_spinlock_t jiffies_lock;
+extern seqcount_t jiffies_seq;
 
 #define CS_NAME_LEN	32
 



* [patch V2 10/15] sched/swait: Prepare usage in completions
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (8 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 09/15] timekeeping: Split jiffies seqlock Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

From: Thomas Gleixner <tglx@linutronix.de>

As a preparation to use simple wait queues for completions:

  - Provide swake_up_all_locked() to support complete_all()
  - Make __prepare_to_swait() public available

This is done to enable the usage of complete() within truly atomic contexts
on a PREEMPT_RT enabled kernel.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Add comment to swake_up_all_locked()
---
 kernel/sched/sched.h |    3 +++
 kernel/sched/swait.c |   15 ++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(st
 	return true;
 }
 #endif
+
+void swake_up_all_locked(struct swait_queue_head *q);
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_
 }
 EXPORT_SYMBOL(swake_up_locked);
 
+/*
+ * Wake up all waiters. This is an interface which is solely exposed for
+ * completions and not for general usage.
+ *
+ * It is intentionally different from swake_up_all() to allow usage from
+ * hard interrupt context and interrupt disabled regions.
+ */
+void swake_up_all_locked(struct swait_queue_head *q)
+{
+	while (!list_empty(&q->task_list))
+		swake_up_locked(q);
+}
+
 void swake_up_one(struct swait_queue_head *q)
 {
 	unsigned long flags;
@@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_hea
 }
 EXPORT_SYMBOL(swake_up_all);
 
-static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
 	wait->task = current;
 	if (list_empty(&wait->task_list))



* [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (9 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 10/15] sched/swait: Prepare usage in completions Thomas Gleixner
@ 2020-03-18 20:43 ` Thomas Gleixner
  2020-03-18 22:28   ` Logan Gunthorpe
                     ` (5 more replies)
  2020-03-20  8:50 ` [patch V2 00/15] Lock ordering documentation and annotation for lockdep Davidlohr Bueso
  2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
  12 siblings, 6 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-18 20:43 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Oleg Nesterov, Davidlohr Bueso,
	Michael Ellerman, linuxppc-dev

From: Thomas Gleixner <tglx@linutronix.de>

completion uses a wait_queue_head_t to enqueue waiters.

wait_queue_head_t contains a spinlock_t to protect the list of waiters
which excludes it from being used in truly atomic context on a PREEMPT_RT
enabled kernel.

The spinlock in the wait queue head cannot be replaced by a raw_spinlock
because:

  - wait queues can have custom wakeup callbacks, which acquire other
    spinlock_t locks and have potentially long execution times

  - wake_up() walks an unbounded number of list entries during the wake up
    and may wake an unbounded number of waiters.

For simplicity and performance reasons complete() should be usable on
PREEMPT_RT enabled kernels.

completions do not use custom wakeup callbacks and usually have a single
waiter, except for a few corner cases.

Replace the wait queue in the completion with a simple wait queue (swait),
which uses a raw_spinlock_t for protecting the waiter list and therefore is
safe to use inside truly atomic regions on PREEMPT_RT.

There is no semantic or functional change:

  - completions use the exclusive wait mode which is what swait provides

  - complete() wakes one exclusive waiter

  - complete_all() wakes all waiters while holding the lock which protects
    the wait queue against newly incoming waiters. The conversion to swait
    preserves this behaviour.

complete_all() might cause unbounded latencies with a large number of waiters
being woken at once, but most complete_all() usage sites are either in
testing or initialization code or have only a really small number of
concurrent waiters, which for now does not cause a latency problem. Keep it
simple for now.

The fixup of the warning check in the USB gadget driver is just a
straightforward conversion of the lockless waiter check from one waitqueue
type to the other.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
---
V2: Split out the orinoco and usb gadget parts and amended change log
---
 drivers/usb/gadget/function/f_fs.c |    2 +-
 include/linux/completion.h         |    8 ++++----
 kernel/sched/completion.c          |   36 +++++++++++++++++++-----------------
 3 files changed, 24 insertions(+), 22 deletions(-)

--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
 		pr_info("%s(): freeing\n", __func__);
 		ffs_data_clear(ffs);
 		BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-		       waitqueue_active(&ffs->ep0req_completion.wait) ||
+		       swait_active(&ffs->ep0req_completion.wait) ||
 		       waitqueue_active(&ffs->wait));
 		destroy_workqueue(ffs->io_completion_wq);
 		kfree(ffs->dev_name);
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,7 +9,7 @@
  * See kernel/sched/completion.c for details.
  */
 
-#include <linux/wait.h>
+#include <linux/swait.h>
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,7 +25,7 @@
  */
 struct completion {
 	unsigned int done;
-	wait_queue_head_t wait;
+	struct swait_queue_head wait;
 };
 
 #define init_completion_map(x, m) __init_completion(x)
@@ -34,7 +34,7 @@ static inline void complete_acquire(stru
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -85,7 +85,7 @@ static inline void complete_release(stru
 static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
-	init_waitqueue_head(&x->wait);
+	init_swait_queue_head(&x->wait);
 }
 
 /**
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,12 +29,12 @@ void complete(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
 		x->done++;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete);
 
@@ -58,10 +58,12 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	WARN_ON(irqs_disabled());
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_all_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete_all);
 
@@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
 	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
+		DECLARE_SWAITQUEUE(wait);
 
-		__add_wait_queue_entry_tail_exclusive(&x->wait, &wait);
 		do {
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
 			}
+			__prepare_to_swait(&x->wait, &wait);
 			__set_current_state(state);
-			spin_unlock_irq(&x->wait.lock);
+			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
-			spin_lock_irq(&x->wait.lock);
+			raw_spin_lock_irq(&x->wait.lock);
 		} while (!x->done && timeout);
-		__remove_wait_queue(&x->wait, &wait);
+		__finish_swait(&x->wait, &wait);
 		if (!x->done)
 			return timeout;
 	}
@@ -100,9 +102,9 @@ static inline long __sched
 
 	complete_acquire(x);
 
-	spin_lock_irq(&x->wait.lock);
+	raw_spin_lock_irq(&x->wait.lock);
 	timeout = do_wait_for_common(x, action, timeout, state);
-	spin_unlock_irq(&x->wait.lock);
+	raw_spin_unlock_irq(&x->wait.lock);
 
 	complete_release(x);
 
@@ -291,12 +293,12 @@ bool try_wait_for_completion(struct comp
 	if (!READ_ONCE(x->done))
 		return false;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = false;
 	else if (x->done != UINT_MAX)
 		x->done--;
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(try_wait_for_completion);
@@ -322,8 +324,8 @@ bool completion_done(struct completion *
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
 	 */
-	spin_lock_irqsave(&x->wait.lock, flags);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);



* Re: [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait()
  2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
@ 2020-03-18 21:25   ` Bjorn Helgaas
  0 siblings, 0 replies; 72+ messages in thread
From: Bjorn Helgaas @ 2020-03-18 21:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	linux-pci, Felipe Balbi, Greg Kroah-Hartman, linux-usb,
	Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:03PM +0100, Thomas Gleixner wrote:
> From: Logan Gunthorpe <logang@deltatee.com>
> 
> The call to init_completion() in mrpc_queue_cmd() can theoretically
> race with the call to poll_wait() in switchtec_dev_poll().
> 
>   poll()			write()
>     switchtec_dev_poll()   	  switchtec_dev_write()
>       poll_wait(&s->comp.wait);      mrpc_queue_cmd()
> 			               init_completion(&s->comp)
> 				         init_waitqueue_head(&s->comp.wait)
> 
> To my knowledge, no one has hit this bug.
> 
> Fix this by using reinit_completion() instead of init_completion() in
> mrpc_queue_cmd().
> 
> Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Not because I understand and have reviewed this, but because I trust
you to do the right thing and it belongs with the rest of the series.

> ---
>  drivers/pci/switch/switchtec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
> index a823b4b8ef8a..81dc7ac01381 100644
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
>  	kref_get(&stuser->kref);
>  	stuser->read_len = sizeof(stuser->data);
>  	stuser_set_state(stuser, MRPC_QUEUED);
> -	init_completion(&stuser->comp);
> +	reinit_completion(&stuser->comp);
>  	list_add_tail(&stuser->list, &stdev->mrpc_queue);
>  
>  	mrpc_cmd_submit(stdev);
> -- 
> 2.20.1
> 
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll
  2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
@ 2020-03-18 21:26   ` Bjorn Helgaas
  2020-03-18 22:11   ` Logan Gunthorpe
  1 sibling, 0 replies; 72+ messages in thread
From: Bjorn Helgaas @ 2020-03-18 21:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Kurt Schwemmer, Logan Gunthorpe,
	linux-pci, Felipe Balbi, Greg Kroah-Hartman, linux-usb,
	Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:04PM +0100, Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The poll callback is using the completion wait queue and sticks it into
> poll_wait() to wake up pollers after a command has completed.
> 
> This works to some extent, but cannot provide EPOLLEXCLUSIVE support
> because the waker side uses complete_all() which unconditionally wakes up
> all waiters. complete_all() is required because completions internally use
> exclusive wait and complete() only wakes up one waiter by default.
> 
> This mixes conceptually different mechanisms and relies on internal
> implementation details of completions, which in turn puts constraints on
> changing the internal implementation of completions.
> 
> Replace it with a regular wait queue and store the state in struct
> switchtec_user.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

But please tweak the subject so it matches the other:

  - pci/switchtec: Replace completion wait queue usage for poll
  + PCI/switchtec: Replace completion wait queue usage for poll

> ---
> V2: Reworded changelog.
> ---
>  drivers/pci/switch/switchtec.c |   22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -52,10 +52,11 @@ struct switchtec_user {
>  
>  	enum mrpc_state state;
>  
> -	struct completion comp;
> +	wait_queue_head_t cmd_comp;
>  	struct kref kref;
>  	struct list_head list;
>  
> +	bool cmd_done;
>  	u32 cmd;
>  	u32 status;
>  	u32 return_code;
> @@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
>  	stuser->stdev = stdev;
>  	kref_init(&stuser->kref);
>  	INIT_LIST_HEAD(&stuser->list);
> -	init_completion(&stuser->comp);
> +	init_waitqueue_head(&stuser->cmd_comp);
>  	stuser->event_cnt = atomic_read(&stdev->event_cnt);
>  
>  	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
> @@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
>  	kref_get(&stuser->kref);
>  	stuser->read_len = sizeof(stuser->data);
>  	stuser_set_state(stuser, MRPC_QUEUED);
> -	reinit_completion(&stuser->comp);
> +	stuser->cmd_done = false;
>  	list_add_tail(&stuser->list, &stdev->mrpc_queue);
>  
>  	mrpc_cmd_submit(stdev);
> @@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
>  		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
>  			      stuser->read_len);
>  out:
> -	complete_all(&stuser->comp);
> +	stuser->cmd_done = true;
> +	wake_up_interruptible(&stuser->cmd_comp);
>  	list_del_init(&stuser->list);
>  	stuser_put(stuser);
>  	stdev->mrpc_busy = 0;
> @@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
>  	mutex_unlock(&stdev->mrpc_mutex);
>  
>  	if (filp->f_flags & O_NONBLOCK) {
> -		if (!try_wait_for_completion(&stuser->comp))
> +		if (!stuser->cmd_done)
>  			return -EAGAIN;
>  	} else {
> -		rc = wait_for_completion_interruptible(&stuser->comp);
> +		rc = wait_event_interruptible(stuser->cmd_comp,
> +					      stuser->cmd_done);
>  		if (rc < 0)
>  			return rc;
>  	}
> @@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
>  	struct switchtec_dev *stdev = stuser->stdev;
>  	__poll_t ret = 0;
>  
> -	poll_wait(filp, &stuser->comp.wait, wait);
> +	poll_wait(filp, &stuser->cmd_comp, wait);
>  	poll_wait(filp, &stdev->event_wq, wait);
>  
>  	if (lock_mutex_and_test_alive(stdev))
> @@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
>  
>  	mutex_unlock(&stdev->mrpc_mutex);
>  
> -	if (try_wait_for_completion(&stuser->comp))
> +	if (stuser->cmd_done)
>  		ret |= EPOLLIN | EPOLLRDNORM;
>  
>  	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
> @@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
>  
>  	/* Wake up and kill any users waiting on an MRPC request */
>  	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
> -		complete_all(&stuser->comp);
> +		stuser->cmd_done = true;
> +		wake_up_interruptible(&stuser->cmd_comp);
>  		list_del_init(&stuser->list);
>  		stuser_put(stuser);
>  	}
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll
  2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
  2020-03-18 21:26   ` Bjorn Helgaas
@ 2020-03-18 22:11   ` Logan Gunthorpe
  1 sibling, 0 replies; 72+ messages in thread
From: Logan Gunthorpe @ 2020-03-18 22:11 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Kurt Schwemmer, Bjorn Helgaas,
	linux-pci, Felipe Balbi, Greg Kroah-Hartman, linux-usb,
	Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev



On 2020-03-18 2:43 p.m., Thomas Gleixner wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The poll callback is using the completion wait queue and sticks it into
> poll_wait() to wake up pollers after a command has completed.
> 
> This works to some extent, but cannot provide EPOLLEXCLUSIVE support
> because the waker side uses complete_all() which unconditionally wakes up
> all waiters. complete_all() is required because completions internally use
> exclusive wait and complete() only wakes up one waiter by default.
> 
> This mixes conceptually different mechanisms and relies on internal
> implementation details of completions, which in turn puts constraints on
> changing the internal implementation of completions.
> 
> Replace it with a regular wait queue and store the state in struct
> switchtec_user.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

While I've been against open coding the completion in this driver for a
while, I'm convinced by the EPOLLEXCLUSIVE argument for this change.
I've reviewed and lightly tested the change with hardware:

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Thanks,

Logan

> Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
> V2: Reworded changelog.
> ---
>  drivers/pci/switch/switchtec.c |   22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -52,10 +52,11 @@ struct switchtec_user {
>  
>  	enum mrpc_state state;
>  
> -	struct completion comp;
> +	wait_queue_head_t cmd_comp;
>  	struct kref kref;
>  	struct list_head list;
>  
> +	bool cmd_done;
>  	u32 cmd;
>  	u32 status;
>  	u32 return_code;
> @@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
>  	stuser->stdev = stdev;
>  	kref_init(&stuser->kref);
>  	INIT_LIST_HEAD(&stuser->list);
> -	init_completion(&stuser->comp);
> +	init_waitqueue_head(&stuser->cmd_comp);
>  	stuser->event_cnt = atomic_read(&stdev->event_cnt);
>  
>  	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
> @@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
>  	kref_get(&stuser->kref);
>  	stuser->read_len = sizeof(stuser->data);
>  	stuser_set_state(stuser, MRPC_QUEUED);
> -	reinit_completion(&stuser->comp);
> +	stuser->cmd_done = false;
>  	list_add_tail(&stuser->list, &stdev->mrpc_queue);
>  
>  	mrpc_cmd_submit(stdev);
> @@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
>  		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
>  			      stuser->read_len);
>  out:
> -	complete_all(&stuser->comp);
> +	stuser->cmd_done = true;
> +	wake_up_interruptible(&stuser->cmd_comp);
>  	list_del_init(&stuser->list);
>  	stuser_put(stuser);
>  	stdev->mrpc_busy = 0;
> @@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
>  	mutex_unlock(&stdev->mrpc_mutex);
>  
>  	if (filp->f_flags & O_NONBLOCK) {
> -		if (!try_wait_for_completion(&stuser->comp))
> +		if (!stuser->cmd_done)
>  			return -EAGAIN;
>  	} else {
> -		rc = wait_for_completion_interruptible(&stuser->comp);
> +		rc = wait_event_interruptible(stuser->cmd_comp,
> +					      stuser->cmd_done);
>  		if (rc < 0)
>  			return rc;
>  	}
> @@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
>  	struct switchtec_dev *stdev = stuser->stdev;
>  	__poll_t ret = 0;
>  
> -	poll_wait(filp, &stuser->comp.wait, wait);
> +	poll_wait(filp, &stuser->cmd_comp, wait);
>  	poll_wait(filp, &stdev->event_wq, wait);
>  
>  	if (lock_mutex_and_test_alive(stdev))
> @@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
>  
>  	mutex_unlock(&stdev->mrpc_mutex);
>  
> -	if (try_wait_for_completion(&stuser->comp))
> +	if (stuser->cmd_done)
>  		ret |= EPOLLIN | EPOLLRDNORM;
>  
>  	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
> @@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
>  
>  	/* Wake up and kill any users waiting on an MRPC request */
>  	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
> -		complete_all(&stuser->comp);
> +		stuser->cmd_done = true;
> +		wake_up_interruptible(&stuser->cmd_comp);
>  		list_del_init(&stuser->list);
>  		stuser_put(stuser);
>  	}
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
@ 2020-03-18 22:28   ` Logan Gunthorpe
  2020-03-19  0:33   ` Joel Fernandes
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Logan Gunthorpe @ 2020-03-18 22:28 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, linuxppc-dev



On 2020-03-18 2:43 p.m., Thomas Gleixner wrote:
> There is no semantic or functional change:
> 
>   - completions use the exclusive wait mode which is what swait provides
> 
>   - complete() wakes one exclusive waiter
> 
>   - complete_all() wakes all waiters while holding the lock which protects
>     the wait queue against newly incoming waiters. The conversion to swait
>     preserves this behaviour.
> 
> complete_all() might cause unbounded latencies with a large number of waiters
> being woken at once, but most complete_all() usage sites are either in
> testing or initialization code or have only a really small number of
> concurrent waiters which for now does not cause a latency problem. Keep it
> simple for now.

Seems like it would be worth adding a note about this to the
complete_all() doc string. Otherwise developers are unlikely to find out
about this issue and may not keep it as simple as you'd like.
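
Something along these lines, perhaps (wording is only a sketch; the first
two lines mirror the existing kernel-doc):

    /**
     * complete_all: - signals all threads waiting on this completion
     * @x:  holds the state of this particular completion
     *
     * Note: complete_all() wakes every waiter while holding the wait
     * queue lock, so a large number of concurrent waiters can cause
     * unbounded wakeup latencies. Restrict its use to init, teardown
     * and test code with a known small number of waiters.
     */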

Logan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
@ 2020-03-18 22:31   ` Paul E. McKenney
  2020-03-19 18:02     ` Thomas Gleixner
  2020-03-19  8:51   ` Davidlohr Bueso
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2020-03-18 22:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:10PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The kernel provides a variety of locking primitives. The nesting of these
> lock types and the implications of them on RT enabled kernels is nowhere
> documented.
> 
> Add initial documentation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Mostly native-English-speaker services below, so please feel free to
ignore.  The one place I made a substantive change, I marked it "@@@".
I only did about half of this document, but should this prove useful,
I will do the other half later.

							Thanx, Paul

> ---
> V2: Addressed review comments from Randy
> ---
>  Documentation/locking/index.rst     |    1 
>  Documentation/locking/locktypes.rst |  298 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 299 insertions(+)
>  create mode 100644 Documentation/locking/locktypes.rst
> 
> --- a/Documentation/locking/index.rst
> +++ b/Documentation/locking/index.rst
> @@ -7,6 +7,7 @@ locking
>  .. toctree::
>      :maxdepth: 1
>  
> +    locktypes
>      lockdep-design
>      lockstat
>      locktorture
> --- /dev/null
> +++ b/Documentation/locking/locktypes.rst
> @@ -0,0 +1,298 @@
> +.. _kernel_hacking_locktypes:
> +
> +==========================
> +Lock types and their rules
> +==========================
> +
> +Introduction
> +============
> +
> +The kernel provides a variety of locking primitives which can be divided
> +into two categories:
> +
> + - Sleeping locks
> + - Spinning locks
> +
> +This document describes the lock types at least at the conceptual level and
> +provides rules for nesting of lock types also under the aspect of PREEMPT_RT.

I suggest something like this:

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.

> +
> +Lock categories
> +===============
> +
> +Sleeping locks
> +--------------
> +
> +Sleeping locks can only be acquired in preemptible task context.
> +
> +Some of the implementations allow try_lock() attempts from other contexts,
> +but that has to be really evaluated carefully including the question
> +whether the unlock can be done from that context safely as well.
> +
> +Note that some lock types change their implementation details when
> +debugging is enabled, so this should be really only considered if there is
> +no other option.

How about something like this?

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks
from other contexts unless there is no other option.

> +Sleeping lock types:
> +
> + - mutex
> + - rt_mutex
> + - semaphore
> + - rw_semaphore
> + - ww_mutex
> + - percpu_rw_semaphore
> +
> +On a PREEMPT_RT enabled kernel the following lock types are converted to
> +sleeping locks:

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

> + - spinlock_t
> + - rwlock_t
> +
> +Spinning locks
> +--------------
> +
> + - raw_spinlock_t
> + - bit spinlocks
> +
> +On a non PREEMPT_RT enabled kernel the following lock types are spinning
> +locks as well:

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

> + - spinlock_t
> + - rwlock_t
> +
> +Spinning locks implicitly disable preemption and the lock / unlock functions
> +can have suffixes which apply further protections:
> +
> + ===================  ====================================================
> + _bh()                Disable / enable bottom halves (soft interrupts)
> + _irq()               Disable / enable interrupts
> + _irqsave/restore()   Save and disable / restore interrupt disabled state
> + ===================  ====================================================
> +
> +
> +rtmutex
> +=======
> +
> +RT-mutexes are mutexes with support for priority inheritance (PI).
> +
> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +interrupt disabled sections.
> +
> +On a PREEMPT_RT enabled kernel most of these sections are fully
> +preemptible. This is possible because PREEMPT_RT forces most executions
> +into task context, especially interrupt handlers and soft interrupts, which
> +allows to substitute spinlock_t and rwlock_t with RT-mutex based
> +implementations.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.

> +
> +raw_spinlock_t and spinlock_t
> +=============================
> +
> +raw_spinlock_t
> +--------------
> +
> +raw_spinlock_t is a strict spinning lock implementation regardless of the
> +kernel configuration including PREEMPT_RT enabled kernels.
> +
> +raw_spinlock_t is to be used only in real critical core code, low level
> +interrupt handling and places where protecting (hardware) state is required
> +to be safe against preemption and eventually interrupts.
> +
> +Another reason to use raw_spinlock_t is when the critical section is tiny
> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
> +contended case.

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny and the lock is lightly contended, thus avoiding
RT-mutex overhead.

@@@  I added the point about the lock being lightly contended.

> +spinlock_t
> +----------
> +
> +The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
> +
> +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +and has exactly the same semantics.
> +
> +spinlock_t and PREEMPT_RT
> +-------------------------
> +
> +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Preemption is not disabled
> +
> + - The hard interrupt related suffixes for spin_lock / spin_unlock
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
                                                                  CPU's
> +   interrupt disabled state
> +
> + - The soft interrupt related suffix (_bh()) is still disabling the
> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
> +   kernel, which utilizes the preemption count, this is achieved by a per
> +   CPU bottom half locking mechanism.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.  However, unlike non-PREEMPT_RT kernels (which disable
   preemption to get this effect), PREEMPT_RT kernels use a per-CPU
   per-bottom-half locking mechanism.

> +All other semantics of spinlock_t are preserved:
> +
> + - Migration of tasks which hold a spinlock_t is prevented. On a non
> +   PREEMPT_RT enabled kernel this is implicit due to preemption disable.
> +   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
> +   pointers to per CPU variables stay valid even if the task is preempted.
> +
> + - Task state preservation. The task state is not affected when a lock is
> +   contended and the task has to schedule out and wait for the lock to
> +   become available. The lock wake up restores the task state unless there
> +   was a regular (not lock related) wake up on the task. This ensures that
> +   the task state rules are always correct independent of the kernel
> +   configuration.
> +
> +rwlock_t
> +========
> +
> +rwlock_t is a multiple readers and single writer lock mechanism.
> +
> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
> +lock and the suffix rules of spinlock_t apply accordingly. The
> +implementation is fair and prevents writer starvation.
> +
> +rwlock_t and PREEMPT_RT
> +-----------------------
> +
> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Same changes as for spinlock_t
> +
> + - The implementation is not fair and can cause writer starvation under
> +   certain circumstances. The reason for this is that a writer cannot grant
> +   its priority to multiple readers. Readers which are blocked on a writer
> +   fully support the priority inheritance protocol.
> +
> +
> +PREEMPT_RT caveats
> +==================
> +
> +spinlock_t and rwlock_t
> +-----------------------
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +   local_irq_disable();
> +   spin_lock(&lock);
> +
> +and fully equivalent to::
> +
> +   spin_lock_irq(&lock);
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.
> +
> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
> +
> +PREEMPT_RT also offers a local_lock mechanism to substitute the
> +local_irq_disable/save() constructs in cases where a separation of the
> +interrupt disabling and the locking is really unavoidable. This should be
> +restricted to very rare cases.
> +
> +
> +raw_spinlock_t
> +--------------
> +
> +Locking of a raw_spinlock_t disables preemption and eventually interrupts.
> +Therefore code inside the critical region has to be careful to avoid calls
> +into code which takes a regular spinlock_t or rwlock_t. A prime example is
> +memory allocation.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine code::
> +
> +  raw_spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +On a PREEMPT_RT enabled kernel this breaks because the memory allocator is
> +fully preemptible and therefore does not support allocations from truly
> +atomic contexts.
> +
> +Contrary to that the following code construct is perfectly fine on
> +PREEMPT_RT as spin_lock() does not disable preemption::
> +
> +  spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
> +execution is forced into thread context and the lock substitution
> +ensures preemptibility.
> +
> +
> +bit spinlocks
> +-------------
> +
> +Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
> +substituted by an RT-mutex based implementation for obvious reasons.
> +
> +The semantics of bit spinlocks are preserved on a PREEMPT_RT enabled kernel
> +and the caveats vs. raw_spinlock_t apply.
> +
> +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
> +this requires conditional (#ifdef'ed) code changes at the usage side. The
> +spinlock_t substitution, in contrast, is done transparently by the compiler:
> +the conditionals are restricted to header files and the core implementation
> +of the locking primitives, and the usage sites do not require any changes.
> +
> +
> +Lock type nesting rules
> +=======================
> +
> +The most basic rules are:
> +
> +  - Lock types of the same lock category (sleeping, spinning) can nest
> +    arbitrarily as long as they respect the general lock ordering rules to
> +    prevent deadlocks.
> +
> +  - Sleeping lock types cannot nest inside spinning lock types.
> +
> +  - Spinning lock types can nest inside sleeping lock types.
> +
> +These rules apply in general independent of CONFIG_PREEMPT_RT.
> +
> +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
> +spinning to sleeping, this obviously restricts how they can nest with
> +raw_spinlock_t.
> +
> +This results in the following nest ordering:
> +
> +  1) Sleeping locks
> +  2) spinlock_t and rwlock_t
> +  3) raw_spinlock_t and bit spinlocks
> +
> +Lockdep is aware of these constraints to ensure that they are respected.
> +
> +
> +Owner semantics
> +===============
> +
> +Most lock types in the Linux kernel have strict owner semantics, i.e. the
> +context (task) which acquires a lock has to release it.
> +
> +There are two exceptions:
> +
> +  - semaphores
> +  - rwsems
> +
> +semaphores have no strict owner semantics for historical reasons. They are
> +often used for both serialization and waiting purposes. That's generally
> +discouraged and should be replaced by separate serialization and wait
> +mechanisms.
> +
> +rwsems have grown interfaces which allow non owner release for special
> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> +substitutes all locking primitives except semaphores with RT-mutex based
> +implementations to provide priority inheritance for all lock types except
> +the truly spinning ones. Priority inheritance on ownerless locks is
> +obviously impossible.
> +
> +For now the rwsem non-owner release excludes code which utilizes it from
> +being used on PREEMPT_RT enabled kernels. In same cases this can be
> +mitigated by disabling portions of the code, in other cases the complete
> +functionality has to be disabled until a workable solution has been found.
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
  2020-03-18 22:28   ` Logan Gunthorpe
@ 2020-03-19  0:33   ` Joel Fernandes
  2020-03-19  0:44     ` Thomas Gleixner
  2020-03-19  8:42   ` Greg Kroah-Hartman
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 72+ messages in thread
From: Joel Fernandes @ 2020-03-19  0:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Steven Rostedt, Randy Dunlap, Arnd Bergmann,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, linuxppc-dev

Hi Thomas,

On Wed, Mar 18, 2020 at 09:43:13PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> completion uses a wait_queue_head_t to enqueue waiters.
> 
> wait_queue_head_t contains a spinlock_t to protect the list of waiters
> which excludes it from being used in truly atomic context on a PREEMPT_RT
> enabled kernel.
> 
> The spinlock in the wait queue head cannot be replaced by a raw_spinlock
> because:
> 
>   - wait queues can have custom wakeup callbacks, which acquire other
>     spinlock_t locks and have potentially long execution times

Cool, makes sense.

>   - wake_up() walks an unbounded number of list entries during the wake up
>     and may wake an unbounded number of waiters.

Just to clarify here, wake_up() will really wake up just 1 waiter if all the
waiters on the queue are exclusive, right? So in such a scenario at least, the
"unbounded number of waiters" would not be an issue if everything waiting was
exclusive and a waitqueue with wake_up() was used. Please correct me if I'm
wrong about that though.

So the main reasons to avoid waitqueue in favor of swait (as you mentioned)
would be the sleep-while-atomic issue in truly atomic context on RT, and the
fact that callbacks can take a long time.
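
For reference, the exclusive pattern I have in mind is roughly the below
(minimal sketch; `wq' and the missing wakeup-condition recheck are just
for illustration):

    DECLARE_WAIT_QUEUE_HEAD(wq);
    DECLARE_WAITQUEUE(wait, current);

    /* waiter side: queue in exclusive mode */
    add_wait_queue_exclusive(&wq, &wait);
    set_current_state(TASK_UNINTERRUPTIBLE);
    schedule();
    __set_current_state(TASK_RUNNING);
    remove_wait_queue(&wq, &wait);

    /* waker side: wakes all non-exclusive entries it encounters,
       but stops after waking one exclusive waiter */
    wake_up(&wq);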

> 
> For simplicity and performance reasons complete() should be usable on
> PREEMPT_RT enabled kernels.
> 
> completions do not use custom wakeup callbacks and usually have a single
> waiter, except for a few corner cases.
> 
> Replace the wait queue in the completion with a simple wait queue (swait),
> which uses a raw_spinlock_t for protecting the waiter list and therefore is
> safe to use inside truly atomic regions on PREEMPT_RT.
> 
> There is no semantic or functional change:
> 
>   - completions use the exclusive wait mode which is what swait provides
> 
>   - complete() wakes one exclusive waiter
> 
>   - complete_all() wakes all waiters while holding the lock which protects
>     the wait queue against newly incoming waiters. The conversion to swait
>     preserves this behaviour.
> 
> complete_all() might cause unbounded latencies with a large number of waiters
> being woken at once, but most complete_all() usage sites are either in
> testing or initialization code or have only a really small number of
> concurrent waiters which for now does not cause a latency problem. Keep it
> simple for now.
> 
> The fixup of the warning check in the USB gadget driver is just a
> straightforward conversion of the lockless waiter check from one waitqueue type to
> the other.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> ---
> V2: Split out the orinoco and usb gadget parts and amended change log
> ---
>  drivers/usb/gadget/function/f_fs.c |    2 +-
>  include/linux/completion.h         |    8 ++++----
>  kernel/sched/completion.c          |   36 +++++++++++++++++++-----------------
>  3 files changed, 24 insertions(+), 22 deletions(-)
> 
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
>  		pr_info("%s(): freeing\n", __func__);
>  		ffs_data_clear(ffs);
>  		BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
> -		       waitqueue_active(&ffs->ep0req_completion.wait) ||
> +		       swait_active(&ffs->ep0req_completion.wait) ||
>  		       waitqueue_active(&ffs->wait));
>  		destroy_workqueue(ffs->io_completion_wq);
>  		kfree(ffs->dev_name);
> --- a/include/linux/completion.h
> +++ b/include/linux/completion.h
> @@ -9,7 +9,7 @@
>   * See kernel/sched/completion.c for details.
>   */
>  
> -#include <linux/wait.h>
> +#include <linux/swait.h>
>  
>  /*
>   * struct completion - structure used to maintain state for a "completion"
> @@ -25,7 +25,7 @@
>   */
>  struct completion {
>  	unsigned int done;
> -	wait_queue_head_t wait;
> +	struct swait_queue_head wait;
>  };
>  
>  #define init_completion_map(x, m) __init_completion(x)
> @@ -34,7 +34,7 @@ static inline void complete_acquire(stru
>  static inline void complete_release(struct completion *x) {}
>  
>  #define COMPLETION_INITIALIZER(work) \
> -	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
> +	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
>  
>  #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
>  	(*({ init_completion_map(&(work), &(map)); &(work); }))
> @@ -85,7 +85,7 @@ static inline void complete_release(stru
>  static inline void __init_completion(struct completion *x)
>  {
>  	x->done = 0;
> -	init_waitqueue_head(&x->wait);
> +	init_swait_queue_head(&x->wait);
>  }
>  
>  /**
> --- a/kernel/sched/completion.c
> +++ b/kernel/sched/completion.c
> @@ -29,12 +29,12 @@ void complete(struct completion *x)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&x->wait.lock, flags);
> +	raw_spin_lock_irqsave(&x->wait.lock, flags);
>  
>  	if (x->done != UINT_MAX)
>  		x->done++;
> -	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
> -	spin_unlock_irqrestore(&x->wait.lock, flags);
> +	swake_up_locked(&x->wait);
> +	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
>  }
>  EXPORT_SYMBOL(complete);
>  
> @@ -58,10 +58,12 @@ void complete_all(struct completion *x)
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&x->wait.lock, flags);
> +	WARN_ON(irqs_disabled());
> +
> +	raw_spin_lock_irqsave(&x->wait.lock, flags);
>  	x->done = UINT_MAX;
> -	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
> -	spin_unlock_irqrestore(&x->wait.lock, flags);
> +	swake_up_all_locked(&x->wait);
> +	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
>  }
>  EXPORT_SYMBOL(complete_all);
>  
> @@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
>  		   long (*action)(long), long timeout, int state)
>  {
>  	if (!x->done) {
> -		DECLARE_WAITQUEUE(wait, current);
> +		DECLARE_SWAITQUEUE(wait);
>  
> -		__add_wait_queue_entry_tail_exclusive(&x->wait, &wait);
>  		do {
>  			if (signal_pending_state(state, current)) {
>  				timeout = -ERESTARTSYS;
>  				break;
>  			}
> +			__prepare_to_swait(&x->wait, &wait);
>  			__set_current_state(state);
> -			spin_unlock_irq(&x->wait.lock);
> +			raw_spin_unlock_irq(&x->wait.lock);
>  			timeout = action(timeout);
> -			spin_lock_irq(&x->wait.lock);
> +			raw_spin_lock_irq(&x->wait.lock);
>  		} while (!x->done && timeout);
> -		__remove_wait_queue(&x->wait, &wait);
> +		__finish_swait(&x->wait, &wait);
>  		if (!x->done)
>  			return timeout;
>  	}
> @@ -100,9 +102,9 @@ static inline long __sched
>  
>  	complete_acquire(x);
>  
> -	spin_lock_irq(&x->wait.lock);
> +	raw_spin_lock_irq(&x->wait.lock);
>  	timeout = do_wait_for_common(x, action, timeout, state);
> -	spin_unlock_irq(&x->wait.lock);
> +	raw_spin_unlock_irq(&x->wait.lock);
>  
>  	complete_release(x);
>  
> @@ -291,12 +293,12 @@ bool try_wait_for_completion(struct comp
>  	if (!READ_ONCE(x->done))
>  		return false;
>  
> -	spin_lock_irqsave(&x->wait.lock, flags);
> +	raw_spin_lock_irqsave(&x->wait.lock, flags);
>  	if (!x->done)
>  		ret = false;
>  	else if (x->done != UINT_MAX)
>  		x->done--;
> -	spin_unlock_irqrestore(&x->wait.lock, flags);
> +	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
>  	return ret;
>  }
>  EXPORT_SYMBOL(try_wait_for_completion);
> @@ -322,8 +324,8 @@ bool completion_done(struct completion *
>  	 * otherwise we can end up freeing the completion before complete()
>  	 * is done referencing it.
>  	 */
> -	spin_lock_irqsave(&x->wait.lock, flags);
> -	spin_unlock_irqrestore(&x->wait.lock, flags);
> +	raw_spin_lock_irqsave(&x->wait.lock, flags);
> +	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
>  	return true;
>  }
>  EXPORT_SYMBOL(completion_done);
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-19  0:33   ` Joel Fernandes
@ 2020-03-19  0:44     ` Thomas Gleixner
  0 siblings, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-19  0:44 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Steven Rostedt, Randy Dunlap, Arnd Bergmann,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, linuxppc-dev

Joel,

Joel Fernandes <joel@joelfernandes.org> writes:
> On Wed, Mar 18, 2020 at 09:43:13PM +0100, Thomas Gleixner wrote:
>> The spinlock in the wait queue head cannot be replaced by a raw_spinlock
>> because:
>> 
>>   - wait queues can have custom wakeup callbacks, which acquire other
>>     spinlock_t locks and have potentially long execution times
>
> Cool, makes sense.
>
>>   - wake_up() walks an unbounded number of list entries during the wake up
>>     and may wake an unbounded number of waiters.
>
> Just to clarify here, wake_up() will really wake up just 1 waiter if all the
> waiters on the queue are exclusive, right? So in such a scenario at least, the
> "unbounded number of waiters" would not be an issue if everything waiting was
> exclusive and a waitqueue with wake_up() was used. Please correct me if I'm
> wrong about that though.

Correct.

> So the main reasons to avoid waitqueue in favor of swait (as you mentioned)
> would be the sleep-while-atomic issue in truly atomic context on RT, and the
> fact that callbacks can take a long time.

Yes.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 04/15] orinoco_usb: Use the regular completion interfaces
  2020-03-18 20:43 ` [patch V2 04/15] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
@ 2020-03-19  8:40   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 72+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-19  8:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, linux-usb, Oleg Nesterov,
	Davidlohr Bueso, Michael Ellerman, Arnd Bergmann, linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:06PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The completion usage in this driver is interesting:
> 
>   - it uses a magic complete function which, according to the comment, was
>     implemented by invoking complete() four times in a row because
>     complete_all() was not exported at that time.
> 
>   - it uses an open coded wait/poll which checks completion::done. Only one wait
>     side (device removal) uses the regular wait_for_completion() interface.
> 
> The rationale behind this is to prevent wait_for_completion() from consuming
> completion::done, which would prevent all waiters from being woken. This is not
> necessary with complete_all() as that sets completion::done to UINT_MAX, which
> is left unmodified by the woken waiters.
> 
> Replace the magic complete function with complete_all() and convert the
> open coded wait/poll to regular completion interfaces.
> 
> This changes the wait to exclusive wait mode. But that does not make any
> difference because the wakers use complete_all() which ignores the
> exclusive mode.
> 
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Cc: netdev@vger.kernel.org
> ---
> V2: New patch to avoid conversion to swait functions later.
> ---
>  drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> --- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
> +++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
> @@ -365,17 +365,6 @@ static struct request_context *ezusb_all
>  	return ctx;
>  }
>  
> -
> -/* Hopefully the real complete_all will soon be exported, in the mean
> - * while this should work. */
> -static inline void ezusb_complete_all(struct completion *comp)
> -{
> -	complete(comp);
> -	complete(comp);
> -	complete(comp);
> -	complete(comp);
> -}

That's so funny... :(

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it
  2020-03-18 20:43 ` [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
@ 2020-03-19  8:41   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 72+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-19  8:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Felipe Balbi, linux-usb,
	Logan Gunthorpe, Kurt Schwemmer, Bjorn Helgaas, linux-pci,
	Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:05PM +0100, Thomas Gleixner wrote:
> ep_io() uses a completion on the stack and open codes the waiting with:
> 
>   wait_event_interruptible (done.wait, done.done);
> and
>   wait_event (done.wait, done.done);
> 
> This waits in non-exclusive mode for complete(), but there is no reason to
> do so because the completion can only be waited for by the task itself and
> complete() wakes exactly one exclusive waiter.
> 
> Replace the open coded implementation with the corresponding
> wait_for_completion*() functions.
> 
> No functional change.
> 
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linux-usb@vger.kernel.org
> ---
> V2: New patch to avoid the conversion to swait interfaces later
> ---

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
  2020-03-18 22:28   ` Logan Gunthorpe
  2020-03-19  0:33   ` Joel Fernandes
@ 2020-03-19  8:42   ` Greg Kroah-Hartman
  2020-03-19 17:12   ` Linus Torvalds
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-19  8:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, linuxppc-dev

On Wed, Mar 18, 2020 at 09:43:13PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> completion uses a wait_queue_head_t to enqueue waiters.
> 
> wait_queue_head_t contains a spinlock_t to protect the list of waiters
> which excludes it from being used in truly atomic context on a PREEMPT_RT
> enabled kernel.
> 
> The spinlock in the wait queue head cannot be replaced by a raw_spinlock
> because:
> 
>   - wait queues can have custom wakeup callbacks, which acquire other
>     spinlock_t locks and have potentially long execution times
> 
>   - wake_up() walks an unbounded number of list entries during the wake up
>     and may wake an unbounded number of waiters.
> 
> For simplicity and performance reasons complete() should be usable on
> PREEMPT_RT enabled kernels.
> 
> completions do not use custom wakeup callbacks and usually have a single
> waiter, except for a few corner cases.
> 
> Replace the wait queue in the completion with a simple wait queue (swait),
> which uses a raw_spinlock_t for protecting the waiter list and therefore is
> safe to use inside truly atomic regions on PREEMPT_RT.
> 
> There is no semantic or functional change:
> 
>   - completions use the exclusive wait mode which is what swait provides
> 
>   - complete() wakes one exclusive waiter
> 
>   - complete_all() wakes all waiters while holding the lock which protects
>     the wait queue against newly incoming waiters. The conversion to swait
>     preserves this behaviour.
> 
> complete_all() might cause unbounded latencies with a large number of waiters
> being woken at once, but most complete_all() usage sites are either in
> testing or initialization code or have only a really small number of
> concurrent waiters which for now does not cause a latency problem. Keep it
> simple for now.
> 
> The fixup of the warning check in the USB gadget driver is just a
> straightforward conversion of the lockless waiter check from one waitqueue type to
> the other.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Arnd Bergmann <arnd@arndb.de>
> ---
> V2: Split out the orinoco and usb gadget parts and amended change log
> ---
>  drivers/usb/gadget/function/f_fs.c |    2 +-
>  include/linux/completion.h         |    8 ++++----
>  kernel/sched/completion.c          |   36 +++++++++++++++++++-----------------
>  3 files changed, 24 insertions(+), 22 deletions(-)

For USB portion:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
  2020-03-18 22:31   ` Paul E. McKenney
@ 2020-03-19  8:51   ` Davidlohr Bueso
  2020-03-19 15:04   ` Jonathan Corbet
  2020-03-21 21:21   ` Joel Fernandes
  3 siblings, 0 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-19  8:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Michael Ellerman, Arnd Bergmann, linuxppc-dev

On Wed, 18 Mar 2020, Thomas Gleixner wrote:
>+Owner semantics
>+===============
>+
>+Most lock types in the Linux kernel have strict owner semantics, i.e. the
>+context (task) which acquires a lock has to release it.
>+
>+There are two exceptions:
>+
>+  - semaphores
>+  - rwsems
>+
>+semaphores have no strict owner semantics for historical reasons. They are

I would rephrase this to:

semaphores have no owner semantics for historical reasons, and as such
trylock and release operations can be called from interrupt context. They
are ...
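
E.g. something like this is legal on a semaphore but would be a bug on
any owner-based lock (contrived sketch; do_something() is made up):

    static DEFINE_SEMAPHORE(sem);

    /* in a hardirq handler: trylock is fine, there is no owner */
    if (!down_trylock(&sem))
        do_something();

    /* ... and the release may come from a context which never
       acquired the semaphore */
    up(&sem);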

>+often used for both serialization and waiting purposes. That's generally
>+discouraged and should be replaced by separate serialization and wait
>+mechanisms.
            ^ , such as mutexes or completions.

>+
>+rwsems have grown interfaces which allow non owner release for special
>+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
>+substitutes all locking primitives except semaphores with RT-mutex based
>+implementations to provide priority inheritance for all lock types except
>+the truly spinning ones. Priority inheritance on ownerless locks is
>+obviously impossible.
>+
>+For now the rwsem non-owner release excludes code which utilizes it from
>+being used on PREEMPT_RT enabled kernels. In same cases this can be
>+mitigated by disabling portions of the code, in other cases the complete
>+functionality has to be disabled until a workable solution has been found.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
@ 2020-03-19  9:00   ` Sebastian Andrzej Siewior
  2020-03-19  9:18     ` Peter Zijlstra
  2020-03-19  9:21   ` Davidlohr Bueso
  2020-03-19 10:04   ` Christoph Hellwig
  2 siblings, 1 reply; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-19  9:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Michael Ellerman, Arnd Bergmann, linuxppc-dev, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Oleg Nesterov, Davidlohr Bueso

On 2020-03-18 21:43:09 [+0100], Thomas Gleixner wrote:
> --- a/arch/powerpc/platforms/ps3/device-init.c
> +++ b/arch/powerpc/platforms/ps3/device-init.c
> @@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
>  	unsigned long flags;
>  	int res;
>  
> -	init_completion(&dev->done);
>  	spin_lock_irqsave(&dev->lock, flags);
>  	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
>  					&dev->tag)
>  		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
>  				       &dev->tag);
> +	dev->done = false;
>  	spin_unlock_irqrestore(&dev->lock, flags);
>  	if (res) {
>  		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
> @@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
>  	}
>  	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
>  
> -	res = wait_event_interruptible(dev->done.wait,
> -				       dev->done.done || kthread_should_stop());
> +	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
> +
…

Not sure it matters, but this struct `dev' is allocated on the stack. Should
the interrupt fire *before* rcuwait_wait_event() sets wait.task to NULL,
then it holds a random value on the first invocation of rcuwait_wake_up().
->

diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
index 197347c3c0b24..e87360a0fb40d 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -809,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);


Sebastian

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-19  9:00   ` Sebastian Andrzej Siewior
@ 2020-03-19  9:18     ` Peter Zijlstra
  0 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2020-03-19  9:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Michael Ellerman, Arnd Bergmann, linuxppc-dev, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Oleg Nesterov, Davidlohr Bueso

On Thu, Mar 19, 2020 at 10:00:24AM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-03-18 21:43:09 [+0100], Thomas Gleixner wrote:
> > --- a/arch/powerpc/platforms/ps3/device-init.c
> > +++ b/arch/powerpc/platforms/ps3/device-init.c
> > @@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
> >  	unsigned long flags;
> >  	int res;
> >  
> > -	init_completion(&dev->done);
> >  	spin_lock_irqsave(&dev->lock, flags);
> >  	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
> >  					&dev->tag)
> >  		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
> >  				       &dev->tag);
> > +	dev->done = false;
> >  	spin_unlock_irqrestore(&dev->lock, flags);
> >  	if (res) {
> >  		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
> > @@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
> >  	}
> >  	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
> >  
> > -	res = wait_event_interruptible(dev->done.wait,
> > -				       dev->done.done || kthread_should_stop());
> > +	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
> > +
> …
> 
> Not sure it matters but this struct `dev' is allocated on stack. Should
> the interrupt fire *before* rcuwait_wait_event() set wait.task to NULL
> then it is of random value on the first invocation of rcuwait_wake_up().
> ->
> 
> diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
> index 197347c3c0b24..e87360a0fb40d 100644
> --- a/arch/powerpc/platforms/ps3/device-init.c
> +++ b/arch/powerpc/platforms/ps3/device-init.c
> @@ -809,6 +809,7 @@ static int ps3_probe_thread(void *data)
>  	}
>  
>  	spin_lock_init(&dev.lock);
> +	rcuwait_init(&dev.wait);
>  
>  	res = request_irq(irq, ps3_notification_interrupt, 0,
>  			  "ps3_notification", &dev);
> 

Very good, sorry for that.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
  2020-03-19  9:00   ` Sebastian Andrzej Siewior
@ 2020-03-19  9:21   ` Davidlohr Bueso
  2020-03-19 10:04   ` Christoph Hellwig
  2 siblings, 0 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-19  9:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Michael Ellerman, Arnd Bergmann, linuxppc-dev,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov

On Wed, 18 Mar 2020, Thomas Gleixner wrote:
>AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg; kthreads
>cannot receive signals by default and this one doesn't look different. Use
>TASK_IDLE instead.

Hmm, it seems this needs to be done kernel-wide in general. This kthread abuse of
TASK_INTERRUPTIBLE seems to be a common thing. There are also the users doing
schedule_timeout_interruptible()...
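
At least for the timeout flavor there is a drop-in replacement, e.g.
(sketch):

    /* kthreads often use this only to keep the sleep out of loadavg,
       although they never actually handle signals ... */
    schedule_timeout_interruptible(HZ);

    /* ... the TASK_IDLE based variant stays out of loadavg without
       pretending to be signalable */
    schedule_timeout_idle(HZ);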

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
  2020-03-19  9:00   ` Sebastian Andrzej Siewior
  2020-03-19  9:21   ` Davidlohr Bueso
@ 2020-03-19 10:04   ` Christoph Hellwig
  2020-03-19 10:26     ` Sebastian Andrzej Siewior
  2020-03-21 10:41     ` Thomas Gleixner
  2 siblings, 2 replies; 72+ messages in thread
From: Christoph Hellwig @ 2020-03-19 10:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Randy Dunlap, Peter Zijlstra, linux-pci,
	Sebastian Andrzej Siewior, netdev, Joel Fernandes, Will Deacon,
	Ingo Molnar, Davidlohr Bueso, Paul E . McKenney, Logan Gunthorpe,
	Arnd Bergmann, linuxppc-dev, Steven Rostedt, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, linux-wireless, Oleg Nesterov, Linus Torvalds,
	David S. Miller

On Wed, Mar 18, 2020 at 09:43:09PM +0100, Thomas Gleixner wrote:
> The PS3 notification interrupt and kthread use a hacked up completion to
> communicate. Since we want to change the completion implementation and
> this is abuse anyway, replace it with a simple rcuwait since there is only ever
> the one waiter.
> 
> AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg; kthreads
> cannot receive signals by default and this one doesn't look different. Use
> TASK_IDLE instead.

I think the right fix here is to just convert the thing to a threaded
interrupt handler and kill off the stupid kthread.
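
I.e. roughly something like this (untested sketch; the thread function
name is made up):

    res = request_threaded_irq(irq, NULL, ps3_notification_thread_fn,
                               IRQF_ONESHOT, "ps3_notification", &dev);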

But I wonder how alive the whole PS3 support is to start with..

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-19 10:04   ` Christoph Hellwig
@ 2020-03-19 10:26     ` Sebastian Andrzej Siewior
  2020-03-20  0:01       ` Geoff Levand
  2020-03-20  0:45       ` Michael Ellerman
  2020-03-21 10:41     ` Thomas Gleixner
  1 sibling, 2 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-19 10:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Thomas Gleixner, LKML, Randy Dunlap, Peter Zijlstra, linux-pci,
	netdev, Joel Fernandes, Will Deacon, Ingo Molnar,
	Davidlohr Bueso, Paul E . McKenney, Logan Gunthorpe,
	Arnd Bergmann, linuxppc-dev, Steven Rostedt, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, linux-wireless, Oleg Nesterov, Linus Torvalds,
	David S. Miller

On 2020-03-19 03:04:59 [-0700], Christoph Hellwig wrote:
> But I wonder how alive the whole PS3 support is to start with..

OtherOS can only be used on "old" PS3s which do not have their
firmware upgraded past version 3.21, released April 1, 2010 [0].
It was not possible to install OtherOS on the PS3 Slim, and I don't
remember whether that was a successor or a budget version (but as far
as I remember it had lower power consumption).
*I* remember from back then that a few universities bought quite a few
of them and used them as a computation cluster. However, whatever broke
over the last 10 years is broken.

[0] https://en.wikipedia.org/wiki/OtherOS

Sebastian

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
  2020-03-18 22:31   ` Paul E. McKenney
  2020-03-19  8:51   ` Davidlohr Bueso
@ 2020-03-19 15:04   ` Jonathan Corbet
  2020-03-19 18:04     ` Thomas Gleixner
  2020-03-21 21:21   ` Joel Fernandes
  3 siblings, 1 reply; 72+ messages in thread
From: Jonathan Corbet @ 2020-03-19 15:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, 18 Mar 2020 21:43:10 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The kernel provides a variety of locking primitives. The nesting of these
> lock types and the implications of them on RT enabled kernels is nowhere
> documented.
> 
> Add initial documentation.

...time to add a couple of nits...:)

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Addressed review comments from Randy
> ---
>  Documentation/locking/index.rst     |    1 
>  Documentation/locking/locktypes.rst |  298 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 299 insertions(+)
>  create mode 100644 Documentation/locking/locktypes.rst
> 
> --- a/Documentation/locking/index.rst
> +++ b/Documentation/locking/index.rst
> @@ -7,6 +7,7 @@ locking
>  .. toctree::
>      :maxdepth: 1
>  
> +    locktypes
>      lockdep-design
>      lockstat
>      locktorture
> --- /dev/null
> +++ b/Documentation/locking/locktypes.rst
> @@ -0,0 +1,298 @@
> +.. _kernel_hacking_locktypes:
> +

So ... I vaguely remember that some Thomas guy added a document saying we
should be putting SPDX tags on our files? :)

> +==========================
> +Lock types and their rules
> +==========================

[...]

> +PREEMPT_RT caveats
> +==================
> +
> +spinlock_t and rwlock_t
> +-----------------------
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +   local_irq_disable();
> +   spin_lock(&lock);
> +
> +and fully equivalent to::
> +
> +   spin_lock_irq(&lock);
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.
> +
> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.

We don't need (and shouldn't use) :c:func: anymore; just saying
spin_lock_irq() will cause the Right Things to happen.
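
I.e. in the hunk above the plain form renders and cross-references just
the same:

  -The preferred solution is to use :c:func:`spin_lock_irq()` or
  -:c:func:`spin_lock_irqsave()` and their unlock counterparts.
  +The preferred solution is to use spin_lock_irq() or
  +spin_lock_irqsave() and their unlock counterparts.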

Thanks,
jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
                     ` (2 preceding siblings ...)
  2020-03-19  8:42   ` Greg Kroah-Hartman
@ 2020-03-19 17:12   ` Linus Torvalds
  2020-03-19 23:25   ` Julian Calaby
  2020-03-20  9:01   ` Davidlohr Bueso
  5 siblings, 0 replies; 72+ messages in thread
From: Linus Torvalds @ 2020-03-19 17:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, Netdev, Oleg Nesterov, Davidlohr Bueso,
	Michael Ellerman, linuxppc-dev

On Wed, Mar 18, 2020 at 1:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> There is no semantical or functional change:

Ack, with just the explanation, I'm no longer objecting to this.

Plus you fixed and cleaned up the odd usb gadget code separately
(well, most of it).

              Linus

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 22:31   ` Paul E. McKenney
@ 2020-03-19 18:02     ` Thomas Gleixner
  2020-03-20 16:01       ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-19 18:02 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Wed, Mar 18, 2020 at 09:43:10PM +0100, Thomas Gleixner wrote:
>
> Mostly native-English-speaker services below, so please feel free to
> ignore.  The one place I made a substantive change, I marked it "@@@".
> I only did about half of this document, but should this prove useful,
> I will do the other half later.

Native speaker services are always useful and appreciated.

>> +The kernel provides a variety of locking primitives which can be divided
>> +into two categories:
>> +
>> + - Sleeping locks
>> + - Spinning locks
>> +
>> +This document describes the lock types at least at the conceptual level and
>> +provides rules for nesting of lock types also under the aspect of PREEMPT_RT.
>
> I suggest something like this:
>
> This document conceptually describes these lock types and provides rules
> for their nesting, including the rules for use under PREEMPT_RT.

Way better :)

>> +Sleeping locks can only be acquired in preemptible task context.
>> +
>> +Some of the implementations allow try_lock() attempts from other contexts,
>> +but that has to be really evaluated carefully including the question
>> +whether the unlock can be done from that context safely as well.
>> +
>> +Note that some lock types change their implementation details when
>> +debugging is enabled, so this should be really only considered if there is
>> +no other option.
>
> How about something like this?
>
> Although implementations allow try_lock() from other contexts, it is
> necessary to carefully evaluate the safety of unlock() as well as of
> try_lock().  Furthermore, it is also necessary to evaluate the debugging
> versions of these primitives.  In short, don't acquire sleeping locks
> from other contexts unless there is no other option.

Yup.

>> +Sleeping lock types:
>> +
>> + - mutex
>> + - rt_mutex
>> + - semaphore
>> + - rw_semaphore
>> + - ww_mutex
>> + - percpu_rw_semaphore
>> +
>> +On a PREEMPT_RT enabled kernel the following lock types are converted to
>> +sleeping locks:
>
> On PREEMPT_RT kernels, these lock types are converted to sleeping
> locks:

Ok.

>> + - spinlock_t
>> + - rwlock_t
>> +
>> +Spinning locks
>> +--------------
>> +
>> + - raw_spinlock_t
>> + - bit spinlocks
>> +
>> +On a non PREEMPT_RT enabled kernel the following lock types are spinning
>> +locks as well:
>
> On non-PREEMPT_RT kernels, these lock types are also spinning locks:

Ok.

>> + - spinlock_t
>> + - rwlock_t
>> +
>> +Spinning locks implicitly disable preemption and the lock / unlock functions
>> +can have suffixes which apply further protections:
>> +
>> + ===================  ====================================================
>> + _bh()                Disable / enable bottom halves (soft interrupts)
>> + _irq()               Disable / enable interrupts
>> + _irqsave/restore()   Save and disable / restore interrupt disabled state
>> + ===================  ====================================================
>> +
>> +
>> +rtmutex
>> +=======
>> +
>> +RT-mutexes are mutexes with support for priority inheritance (PI).
>> +
>> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
>> +interrupt disabled sections.
>> +
>> +On a PREEMPT_RT enabled kernel most of these sections are fully
>> +preemptible. This is possible because PREEMPT_RT forces most executions
>> +into task context, especially interrupt handlers and soft interrupts, which
>> +allows to substitute spinlock_t and rwlock_t with RT-mutex based
>> +implementations.
>
> PI clearly cannot preempt preemption-disabled or interrupt-disabled
> regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
> execute most such regions of code in preemptible task context, especially
> interrupt handlers and soft interrupts.  This conversion allows spinlock_t
> and rwlock_t to be implemented via RT-mutexes.

Nice.

>> +
>> +raw_spinlock_t and spinlock_t
>> +=============================
>> +
>> +raw_spinlock_t
>> +--------------
>> +
>> +raw_spinlock_t is a strict spinning lock implementation regardless of the
>> +kernel configuration including PREEMPT_RT enabled kernels.
>> +
>> +raw_spinlock_t is to be used only in real critical core code, low level
>> +interrupt handling and places where protecting (hardware) state is required
>> +to be safe against preemption and eventually interrupts.
>> +
>> +Another reason to use raw_spinlock_t is when the critical section is tiny
>> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
>> +contended case.
>
> raw_spinlock_t is a strict spinning lock implementation in all kernels,
> including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> core code, low level interrupt handling and places where disabling
> preemption or interrupts is required, for example, to safely access
> hardware state.  raw_spinlock_t can sometimes also be used when the
> critical section is tiny and the lock is lightly contended, thus avoiding
> RT-mutex overhead.
>
> @@@  I added the point about the lock being lightly contended.

Hmm, not sure. The point is that if the critical section is small, the
overhead of cross-CPU boosting along with the resulting IPIs is going to
be at least an order of magnitude larger. And on contention this just
pushes the raw_spinlock contention off to the raw_spinlock inside the rt
mutex plus the owning task's pi_lock, which makes things even worse.

>> + - The hard interrupt related suffixes for spin_lock / spin_unlock
>> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
>                                                                   CPU's

Si senor!

>> +   interrupt disabled state
>> +
>> + - The soft interrupt related suffix (_bh()) is still disabling the
>> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
>> +   kernel, which utilizes the preemption count, this is achieved by a per
>> +   CPU bottom half locking mechanism.
>
>  - The soft interrupt related suffix (_bh()) still disables softirq
>    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
>    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
>    per-bottom-half locking mechanism.

it's not per-bottom-half anymore. That turned out to be dangerous due to
dependencies between BH types, e.g. network and timers.

I hope I was able to encourage you to comment on the other half as well :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-19 15:04   ` Jonathan Corbet
@ 2020-03-19 18:04     ` Thomas Gleixner
  0 siblings, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-19 18:04 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

Jonathan Corbet <corbet@lwn.net> writes:
> On Wed, 18 Mar 2020 21:43:10 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> Add initial documentation.
>
> ...time to add a couple of nits...:)

...time

Is that valid RST?

>> +++ b/Documentation/locking/locktypes.rst
>> @@ -0,0 +1,298 @@
>> +.. _kernel_hacking_locktypes:
>> +
>
> So ... I vaguely remember that some Thomas guy added a document saying we
> should be putting SPDX tags on our files? :)

Never met him or heard about that.

>> +
>> +The preferred solution is to use :c:func:`spin_lock_irq()` or
>> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
>
> We don't need (and shouldn't use) :c:func: anymore; just saying
> spin_lock_irq() will cause the Right Things to happen.

Good to know. Will fix.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
                     ` (3 preceding siblings ...)
  2020-03-19 17:12   ` Linus Torvalds
@ 2020-03-19 23:25   ` Julian Calaby
  2020-03-20  6:59     ` Christoph Hellwig
  2020-03-20  9:01   ` Davidlohr Bueso
  5 siblings, 1 reply; 72+ messages in thread
From: Julian Calaby @ 2020-03-19 23:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Oleg Nesterov, Davidlohr Bueso,
	Michael Ellerman, linuxppc-dev

Hi Thomas,

On Thu, Mar 19, 2020 at 7:48 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Thomas Gleixner <tglx@linutronix.de>
>
> completion uses a wait_queue_head_t to enqueue waiters.
>
> wait_queue_head_t contains a spinlock_t to protect the list of waiters
> which excludes it from being used in truly atomic context on a PREEMPT_RT
> enabled kernel.
>
> The spinlock in the wait queue head cannot be replaced by a raw_spinlock
> because:
>
>   - wait queues can have custom wakeup callbacks, which acquire other
>     spinlock_t locks and have potentially long execution times
>
>   - wake_up() walks an unbounded number of list entries during the wake up
>     and may wake an unbounded number of waiters.
>
> For simplicity and performance reasons complete() should be usable on
> PREEMPT_RT enabled kernels.
>
> completions do not use custom wakeup callbacks and are usually single
> waiter, except for a few corner cases.
>
> Replace the wait queue in the completion with a simple wait queue (swait),
> which uses a raw_spinlock_t for protecting the waiter list and therefore is
> safe to use inside truly atomic regions on PREEMPT_RT.
>
> There is no semantical or functional change:
>
>   - completions use the exclusive wait mode which is what swait provides
>
>   - complete() wakes one exclusive waiter
>
>   - complete_all() wakes all waiters while holding the lock which protects
>     the wait queue against newly incoming waiters. The conversion to swait
>     preserves this behaviour.
>
> complete_all() might cause unbound latencies with a large number of waiters
> being woken at once, but most complete_all() usage sites are either in
> testing or initialization code or have only a really small number of
> concurrent waiters which for now does not cause a latency problem. Keep it
> simple for now.
>
> The fixup of the warning check in the USB gadget driver is just a straight
> forward conversion of the lockless waiter check from one waitqueue type to
> the other.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Arnd Bergmann <arnd@arndb.de>
> ---
> V2: Split out the orinoco and usb gadget parts and amended change log
> ---
>  drivers/usb/gadget/function/f_fs.c |    2 +-
>  include/linux/completion.h         |    8 ++++----
>  kernel/sched/completion.c          |   36 +++++++++++++++++++-----------------
>  3 files changed, 24 insertions(+), 22 deletions(-)
>
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
>                 pr_info("%s(): freeing\n", __func__);
>                 ffs_data_clear(ffs);
>                 BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
> -                      waitqueue_active(&ffs->ep0req_completion.wait) ||
> +                      swait_active(&ffs->ep0req_completion.wait) ||

This looks like some code reaching deep into the dirty dark corners
of the completion implementation. Should there be some wrapper around
this to hide that?
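
Something like a (hypothetical) helper in completion.h, say:

	/* Hide which waitqueue flavour struct completion uses internally. */
	static inline bool completion_has_waiters(struct completion *x)
	{
		return swait_active(&x->wait);
	}

so the BUG_ON() above would not need to change again on the next
conversion.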

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-19 10:26     ` Sebastian Andrzej Siewior
@ 2020-03-20  0:01       ` Geoff Levand
  2020-03-20  0:45       ` Michael Ellerman
  1 sibling, 0 replies; 72+ messages in thread
From: Geoff Levand @ 2020-03-20  0:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christoph Hellwig, Peter Zijlstra,
	Thomas Gleixner
  Cc: linux-usb, linux-pci, Oleg Nesterov, Joel Fernandes, Will Deacon,
	Davidlohr Bueso, Paul E . McKenney, Logan Gunthorpe, Ingo Molnar,
	Linus Torvalds, Arnd Bergmann, Steven Rostedt, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Greg Kroah-Hartman,
	Randy Dunlap, linux-wireless, LKML, netdev, linuxppc-dev,
	David S. Miller

Hi,

On 3/19/20 3:26 AM, Sebastian Andrzej Siewior wrote:
> On 2020-03-19 03:04:59 [-0700], Christoph Hellwig wrote:
>> But I wonder how alive the whole PS3 support is to start with..
> 
> OtherOS can only be used on "old" PS3s which do not have their
> firmware upgraded past version 3.21, released April 1, 2010 [0].
> It was not possible to install OtherOS on the PS3 Slim, and I don't
> remember whether that was a successor or a budget version (but as far
> as I remember it had lower power consumption).
> *I* remember from back then that a few universities bought quite a few
> of them and used them as a computation cluster. However, whatever broke
> over the last 10 years is broken.
> 
> [0] https://en.wikipedia.org/wiki/OtherOS

There are still PS3-Linux users out there.  They generally use firmware
and other tools available through the 'hacker' communities that allow
Linux to be run on more than just the 'officially supported' platforms.

Anyway, the change to use rcuwait seems fine if that's needed for the
completion re-work.  I'll try to do some testing with the patch set
next week.

-Geoff

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-19 10:26     ` Sebastian Andrzej Siewior
  2020-03-20  0:01       ` Geoff Levand
@ 2020-03-20  0:45       ` Michael Ellerman
  1 sibling, 0 replies; 72+ messages in thread
From: Michael Ellerman @ 2020-03-20  0:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Christoph Hellwig
  Cc: linux-usb, Peter Zijlstra, linux-pci, Oleg Nesterov,
	Joel Fernandes, Will Deacon, Thomas Gleixner, Davidlohr Bueso,
	Paul E . McKenney, Logan Gunthorpe, Ingo Molnar, Linus Torvalds,
	Arnd Bergmann, Steven Rostedt, Bjorn Helgaas, Kurt Schwemmer,
	Kalle Valo, Felipe Balbi, Greg Kroah-Hartman, Randy Dunlap,
	linux-wireless, LKML, netdev, linuxppc-dev, David S. Miller,
	Geoff Levand

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> On 2020-03-19 03:04:59 [-0700], Christoph Hellwig wrote:
>> But I wonder how alive the whole PS3 support is to start with..
>
> OtherOS can only be used on "old" PS3s which do not have their
> firmware upgraded past version 3.21, released April 1, 2010 [0].
> It was not possible to install OtherOS on the PS3 Slim, and I don't
> remember whether that was a successor or a budget version (but as far
> as I remember it had lower power consumption).
> *I* remember from back then that a few universities bought quite a few
> of them and used them as a computation cluster. However, whatever broke
> over the last 10 years is broken.
>
> [0] https://en.wikipedia.org/wiki/OtherOS

Last time I asked on the list there were still a handful of users.

And I had a patch submitted from a user as recently as last October, so
it still has some life.

cheers

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
@ 2020-03-20  5:36   ` Davidlohr Bueso
  2020-03-20  8:45     ` Sebastian Andrzej Siewior
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
  1 sibling, 1 reply; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  5:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Oleg Nesterov, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Wed, 18 Mar 2020, Thomas Gleixner wrote:

>--- a/include/linux/rcuwait.h
>+++ b/include/linux/rcuwait.h
>@@ -3,6 +3,7 @@
> #define _LINUX_RCUWAIT_H_
>
> #include <linux/rcupdate.h>
>+#include <linux/sched/signal.h>

So this is causing build to fail for me:

  CC      arch/x86/boot/compressed/cmdline.o
arch/x86/boot/compressed/cmdline.c:5:20: error: conflicting types for ‘set_fs’
 static inline void set_fs(unsigned long seg)
                    ^~~~~~
In file included from ./include/linux/uaccess.h:11:0,
                 from ./include/linux/sched/task.h:11,
                 from ./include/linux/sched/signal.h:9,
                 from ./include/linux/rcuwait.h:6,
                 from ./include/linux/percpu-rwsem.h:8,
                 from ./include/linux/fs.h:34,
                 from ./include/linux/proc_fs.h:9,
                 from ./include/acpi/acpi_bus.h:83,
                 from ./include/linux/acpi.h:32,
                 from arch/x86/boot/compressed/misc.h:28,
                 from arch/x86/boot/compressed/cmdline.c:2:
./arch/x86/include/asm/uaccess.h:29:20: note: previous definition of ‘set_fs’ was here
 static inline void set_fs(mm_segment_t fs)
                    ^~~~~~
make[2]: *** [scripts/Makefile.build:268: arch/x86/boot/compressed/cmdline.o] Error 1
make[1]: *** [arch/x86/boot/Makefile:113: arch/x86/boot/compressed/vmlinux] Error 2
make: *** [arch/x86/Makefile:285: bzImage] Error 2

Right now I'm not sure what the proper fix should be.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-19 23:25   ` Julian Calaby
@ 2020-03-20  6:59     ` Christoph Hellwig
  0 siblings, 0 replies; 72+ messages in thread
From: Christoph Hellwig @ 2020-03-20  6:59 UTC (permalink / raw)
  To: Julian Calaby
  Cc: Thomas Gleixner, Randy Dunlap, Peter Zijlstra, linux-pci,
	Sebastian Andrzej Siewior, Oleg Nesterov, Joel Fernandes,
	Will Deacon, Ingo Molnar, Davidlohr Bueso, Paul E . McKenney,
	Logan Gunthorpe, Arnd Bergmann, linuxppc-dev, Steven Rostedt,
	Bjorn Helgaas, Kurt Schwemmer, Kalle Valo, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, linux-wireless, LKML, netdev,
	Linus Torvalds, David S. Miller

On Fri, Mar 20, 2020 at 10:25:41AM +1100, Julian Calaby wrote:
> > +++ b/drivers/usb/gadget/function/f_fs.c
> > @@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
> >                 pr_info("%s(): freeing\n", __func__);
> >                 ffs_data_clear(ffs);
> >                 BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
> > -                      waitqueue_active(&ffs->ep0req_completion.wait) ||
> > +                      swait_active(&ffs->ep0req_completion.wait) ||
> 
> This looks like some code reaching deep into the dirty dark corners
> of the completion implementation. Should there be some wrapper around
> this to hide that?

Or just remove it entirely..

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-20  5:36   ` Davidlohr Bueso
@ 2020-03-20  8:45     ` Sebastian Andrzej Siewior
  2020-03-20  8:58       ` Davidlohr Bueso
  0 siblings, 1 reply; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  8:45 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Will Deacon, Paul E . McKenney, Joel Fernandes,
	Steven Rostedt, Randy Dunlap, Oleg Nesterov, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On 2020-03-19 22:36:57 [-0700], Davidlohr Bueso wrote:
> On Wed, 18 Mar 2020, Thomas Gleixner wrote:
> 
> Right now I'm not sure what the proper fix should be.

I thought that v2 had it fixed with the previous commit (acpi: Remove
header dependency). The kbot just reported that everything is fine.
Let me look…

> Thanks,
> Davidlohr

Sebastian

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 00/15] Lock ordering documentation and annotation for lockdep
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (10 preceding siblings ...)
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
@ 2020-03-20  8:50 ` Davidlohr Bueso
  2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
  12 siblings, 0 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Michael Ellerman, Arnd Bergmann, linuxppc-dev

On Wed, 18 Mar 2020, Thomas Gleixner wrote:
>    The PS3 one got converted by Peter Zijlstra to rcu_wait().

While at it, I think it makes sense to finally convert the kvm vcpu swait
to rcuwait (patch 6/15 starts the necessary api changes). I'm sending
some patches on top of this patchset.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 16/15] rcuwait: Get rid of stale name comment
  2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
                   ` (11 preceding siblings ...)
  2020-03-20  8:50 ` [patch V2 00/15] Lock ordering documentation and annotation for lockdep Davidlohr Bueso
@ 2020-03-20  8:55 ` Davidlohr Bueso
  2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
                     ` (2 more replies)
  12 siblings, 3 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:55 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will,
	Davidlohr Bueso

The 'trywake' name was renamed to simply 'wake'; update the comment
accordingly.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/exit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 0b81b26a872a..6cc6cc485d07 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -243,7 +243,7 @@ void rcuwait_wake_up(struct rcuwait *w)
 	/*
 	 * Order condition vs @task, such that everything prior to the load
 	 * of @task is visible. This is the condition as to why the user called
-	 * rcuwait_trywake() in the first place. Pairs with set_current_state()
+	 * rcuwait_wake() in the first place. Pairs with set_current_state()
 	 * barrier (A) in rcuwait_wait_event().
 	 *
 	 *    WAIT                WAKE
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted
  2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
@ 2020-03-20  8:55   ` Davidlohr Bueso
  2020-03-20  9:13     ` Sebastian Andrzej Siewior
  2020-03-20 10:44     ` Peter Zijlstra
  2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
  2020-03-20  8:55   ` [PATCH 19/15] sched/swait: Reword some of the main description Davidlohr Bueso
  2 siblings, 2 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:55 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will,
	Davidlohr Bueso

Let the caller know whether wake_up_process() was actually called or
not; some users can use this information for ad-hoc purposes. Of
course, returning true does not guarantee that wake_up_process()
actually woke anything up.
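
A caller can then do e.g.:

	if (rcuwait_wake_up(wait))
		++vcpu->stat.halt_wakeup;

which is what the kvm conversion later in this series uses it for.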

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/rcuwait.h |  2 +-
 kernel/exit.c           | 10 ++++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 6e8798458091..3f83b9a12ad3 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -24,7 +24,7 @@ static inline void rcuwait_init(struct rcuwait *w)
 	w->task = NULL;
 }
 
-extern void rcuwait_wake_up(struct rcuwait *w);
+extern bool rcuwait_wake_up(struct rcuwait *w);
 
 /*
  * The caller is responsible for locking around rcuwait_wait_event(),
diff --git a/kernel/exit.c b/kernel/exit.c
index 6cc6cc485d07..b0bb0a8ec4b1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -234,9 +234,10 @@ void release_task(struct task_struct *p)
 		goto repeat;
 }
 
-void rcuwait_wake_up(struct rcuwait *w)
+bool rcuwait_wake_up(struct rcuwait *w)
 {
 	struct task_struct *task;
+	bool ret = false;
 
 	rcu_read_lock();
 
@@ -254,10 +255,15 @@ void rcuwait_wake_up(struct rcuwait *w)
 	smp_mb(); /* (B) */
 
 	task = rcu_dereference(w->task);
-	if (task)
+	if (task) {
 		wake_up_process(task);
+		ret = true;
+	}
 	rcu_read_unlock();
+
+	return ret;
 }
+EXPORT_SYMBOL_GPL(rcuwait_wake_up);
 
 /*
  * Determine if a process group is "orphaned", according to the POSIX
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait
  2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
  2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
@ 2020-03-20  8:55   ` Davidlohr Bueso
  2020-03-20 11:20     ` Paolo Bonzini
  2020-03-20 12:54     ` Peter Zijlstra
  2020-03-20  8:55   ` [PATCH 19/15] sched/swait: Reword some of the main description Davidlohr Bueso
  2 siblings, 2 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:55 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Paolo Bonzini,
	Davidlohr Bueso

The use of any sort of waitqueue (simple or regular) for
waiting/waking vcpus has always been overkill and semantically
wrong. Because this is per-vcpu (which is blocked) there is
only ever a single waiting vcpu, thus no need for any sort of
queue.

As such, make use of the rcuwait primitive, with the following
considerations:

  - rcuwait already provides the proper barriers that serialize
  concurrent waiter and waker.

  - Task wakeup is done in rcu read critical region, with a
  stable task pointer.

  - Because there is no concurrency among waiters, we need
  not worry about rcuwait_wait_event() calls corrupting
  the wait->task. As a consequence, this saves the locking
  done in swait when adding to the queue.

The x86-tscdeadline_latency test mentioned in 8577370fb0cb
("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
latency is reduced by around 15% with this change.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

Only compiled and tested on x86.

 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/kvm/book3s_hv.c        | 10 ++++------
 arch/x86/kvm/lapic.c                |  2 +-
 include/linux/kvm_host.h            | 10 +++++-----
 virt/kvm/arm/arch_timer.c           |  2 +-
 virt/kvm/arm/arm.c                  |  9 +++++----
 virt/kvm/async_pf.c                 |  3 +--
 virt/kvm/kvm_main.c                 | 33 +++++++++++++--------------------
 8 files changed, 31 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 6e8b8ffd06ad..e2b4a1e3fb7d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -752,7 +752,7 @@ struct kvm_vcpu_arch {
 	u8 irq_pending; /* Used by XIVE to signal pending guest irqs */
 	u32 last_inst;
 
-	struct swait_queue_head *wqp;
+	struct rcuwait *waitp;
 	struct kvmppc_vcore *vcore;
 	int ret;
 	int trap;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2cefd071b848..c7cbc4bd06e9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -231,13 +231,11 @@ static bool kvmppc_ipi_thread(int cpu)
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
 	int cpu;
-	struct swait_queue_head *wqp;
+	struct rcuwait *wait;
 
-	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (swq_has_sleeper(wqp)) {
-		swake_up_one(wqp);
+	wait = kvm_arch_vcpu_get_wait(vcpu);
+	if (rcuwait_wake_up(wait))
 		++vcpu->stat.halt_wakeup;
-	}
 
 	cpu = READ_ONCE(vcpu->arch.thread_cpu);
 	if (cpu >= 0 && kvmppc_ipi_thread(cpu))
@@ -4274,7 +4272,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	}
 	user_vrsave = mfspr(SPRN_VRSAVE);
 
-	vcpu->arch.wqp = &vcpu->arch.vcore->wq;
+	vcpu->arch.waitp = &vcpu->arch.vcore->wait;
 	vcpu->arch.pgdir = kvm->mm->pgd;
 	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e3099c642fec..a4420c26dfbc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1815,7 +1815,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 	/* If the preempt notifier has already run, it also called apic_timer_expired */
 	if (!apic->lapic_timer.hv_timer_in_use)
 		goto out;
-	WARN_ON(swait_active(&vcpu->wq));
+	WARN_ON(rcu_dereference(vcpu->wait.task));
 	cancel_hv_timer(apic);
 	apic_timer_expired(apic);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bcb9b2ac0791..b5694429aede 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -23,7 +23,7 @@
 #include <linux/irqflags.h>
 #include <linux/context_tracking.h>
 #include <linux/irqbypass.h>
-#include <linux/swait.h>
+#include <linux/rcuwait.h>
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <asm/signal.h>
@@ -277,7 +277,7 @@ struct kvm_vcpu {
 	struct mutex mutex;
 	struct kvm_run *run;
 
-	struct swait_queue_head wq;
+	struct rcuwait wait;
 	struct pid __rcu *pid;
 	int sigset_active;
 	sigset_t sigset;
@@ -952,12 +952,12 @@ static inline bool kvm_arch_has_assigned_device(struct kvm *kvm)
 }
 #endif
 
-static inline struct swait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
+static inline struct rcuwait *kvm_arch_vcpu_get_wait(struct kvm_vcpu *vcpu)
 {
 #ifdef __KVM_HAVE_ARCH_WQP
-	return vcpu->arch.wqp;
+	return vcpu->arch.waitp;
 #else
-	return &vcpu->wq;
+	return &vcpu->wait;
 #endif
 }
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 0d9438e9de2a..4be71cb58691 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -593,7 +593,7 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
-	if (swait_active(kvm_arch_vcpu_wq(vcpu)))
+	if (rcu_dereference(kvm_arch_vcpu_get_wait(vcpu)->task) != NULL)
 		kvm_timer_blocking(vcpu);
 
 	/*
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index eda7b624eab8..4a704866e9b6 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -579,16 +579,17 @@ void kvm_arm_resume_guest(struct kvm *kvm)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		vcpu->arch.pause = false;
-		swake_up_one(kvm_arch_vcpu_wq(vcpu));
+		rcuwait_wake_up(kvm_arch_vcpu_get_wait(vcpu));
 	}
 }
 
 static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 {
-	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
+	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 
-	swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
-				       (!vcpu->arch.pause)));
+	rcuwait_wait_event(*wait,
+			   (!vcpu->arch.power_off) && (!vcpu->arch.pause),
+			   TASK_INTERRUPTIBLE);
 
 	if (vcpu->arch.power_off || vcpu->arch.pause) {
 		/* Awaken to handle a signal, request we sleep again later. */
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..10b533f641a6 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -80,8 +80,7 @@ static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
 
-	if (swq_has_sleeper(&vcpu->wq))
-		swake_up_one(&vcpu->wq);
+	rcuwait_wake_up(&vcpu->wait);
 
 	mmput(mm);
 	kvm_put_kvm(vcpu->kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70f03ce0e5c1..6b49dcb321e2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -343,7 +343,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
-	init_swait_queue_head(&vcpu->wq);
+	rcuwait_init(&vcpu->wait);
 	kvm_async_pf_vcpu_init(vcpu);
 
 	vcpu->pre_pcpu = -1;
@@ -2465,9 +2465,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	ktime_t start, cur;
-	DECLARE_SWAITQUEUE(wait);
-	bool waited = false;
 	u64 block_ns;
+	int block_check = -EINTR;
 
 	kvm_arch_vcpu_blocking(vcpu);
 
@@ -2487,21 +2486,14 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 					++vcpu->stat.halt_poll_invalid;
 				goto out;
 			}
+
 			cur = ktime_get();
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
-	for (;;) {
-		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
-
-		if (kvm_vcpu_check_block(vcpu) < 0)
-			break;
-
-		waited = true;
-		schedule();
-	}
-
-	finish_swait(&vcpu->wq, &wait);
+	rcuwait_wait_event(&vcpu->wait,
+			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
+			   TASK_INTERRUPTIBLE);
 	cur = ktime_get();
 out:
 	kvm_arch_vcpu_unblocking(vcpu);
@@ -2525,18 +2517,18 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	trace_kvm_vcpu_wakeup(block_ns, waited, vcpu_valid_wakeup(vcpu));
+	trace_kvm_vcpu_wakeup(block_ns, block_check < 0 ? false : true,
+			      vcpu_valid_wakeup(vcpu));
 	kvm_arch_vcpu_block_finish(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
 
 bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 {
-	struct swait_queue_head *wqp;
+	struct rcuwait *wait;
 
-	wqp = kvm_arch_vcpu_wq(vcpu);
-	if (swq_has_sleeper(wqp)) {
-		swake_up_one(wqp);
+	wait = kvm_arch_vcpu_get_wait(vcpu);
+	if (rcuwait_wake_up(wait)) {
 		WRITE_ONCE(vcpu->ready, true);
 		++vcpu->stat.halt_wakeup;
 		return true;
@@ -2678,7 +2670,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (swait_active(&vcpu->wq) && !vcpu_dy_runnable(vcpu))
+			if (rcu_dereference(vcpu->wait.task) &&
+			    !vcpu_dy_runnable(vcpu))
 				continue;
 			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
 				!kvm_arch_vcpu_in_kernel(vcpu))
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 19/15] sched/swait: Reword some of the main description
  2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
  2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
  2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
@ 2020-03-20  8:55   ` Davidlohr Bueso
  2020-03-20  9:19     ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:55 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will,
	Davidlohr Bueso

With the increased use of swait, and kvm no longer using it, we can
reword some of the comments. While removing Linus' valid rant, I've
also taken care to explicitly mention that swait is very different
from regular wait. In addition, the comment now advises against using
swait in favor of the regular flavor.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/swait.h | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 73e06e9986d4..6e5b5d0e64fd 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -9,23 +9,10 @@
 #include <asm/current.h>
 
 /*
- * BROKEN wait-queues.
- *
- * These "simple" wait-queues are broken garbage, and should never be
- * used. The comments below claim that they are "similar" to regular
- * wait-queues, but the semantics are actually completely different, and
- * every single user we have ever had has been buggy (or pointless).
- *
- * A "swake_up_one()" only wakes up _one_ waiter, which is not at all what
- * "wake_up()" does, and has led to problems. In other cases, it has
- * been fine, because there's only ever one waiter (kvm), but in that
- * case gthe whole "simple" wait-queue is just pointless to begin with,
- * since there is no "queue". Use "wake_up_process()" with a direct
- * pointer instead.
- *
- * While these are very similar to regular wait queues (wait.h) the most
- * important difference is that the simple waitqueue allows for deterministic
- * behaviour -- IOW it has strictly bounded IRQ and lock hold times.
+ * Simple waitqueues are semantically very different to regular wait queues
+ * (wait.h). The most important difference is that the simple waitqueue allows
+ * for deterministic behaviour -- IOW it has strictly bounded IRQ and lock hold
+ * times.
  *
  * Mainly, this is accomplished by two things. Firstly not allowing swake_up_all
  * from IRQ disabled, and dropping the lock upon every wakeup, giving a higher
@@ -39,7 +26,7 @@
  *    sleeper state.
  *
  *  - the !exclusive mode; because that leads to O(n) wakeups, everything is
- *    exclusive.
+ *    exclusive. As such swait_wake_up_one will only ever awake _one_ waiter.
  *
  *  - custom wake callback functions; because you cannot give any guarantees
  *    about random code. This also allows swait to be used in RT, such that
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-20  8:45     ` Sebastian Andrzej Siewior
@ 2020-03-20  8:58       ` Davidlohr Bueso
  0 siblings, 0 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  8:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Will Deacon, Paul E . McKenney, Joel Fernandes,
	Steven Rostedt, Randy Dunlap, Oleg Nesterov, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Fri, 20 Mar 2020, Sebastian Andrzej Siewior wrote:

>I thought that v2 had it fixed with the previous commit (acpi: Remove
>header dependency). The kbot just reported that everything is fine.
>Let me look…

Nah my bad, that build did not have the full series applied :)

Sorry for the noise.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 11/15] completion: Use simple wait queues
  2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
                     ` (4 preceding siblings ...)
  2020-03-19 23:25   ` Julian Calaby
@ 2020-03-20  9:01   ` Davidlohr Bueso
  5 siblings, 0 replies; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-20  9:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Arnd Bergmann, Sebastian Andrzej Siewior, Logan Gunthorpe,
	Kurt Schwemmer, Bjorn Helgaas, linux-pci, Felipe Balbi,
	Greg Kroah-Hartman, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Oleg Nesterov, Michael Ellerman,
	linuxppc-dev

On Wed, 18 Mar 2020, Thomas Gleixner wrote:

>From: Thomas Gleixner <tglx@linutronix.de>
>
>completion uses a wait_queue_head_t to enqueue waiters.
>
>wait_queue_head_t contains a spinlock_t to protect the list of waiters
>which excludes it from being used in truly atomic context on a PREEMPT_RT
>enabled kernel.
>
>The spinlock in the wait queue head cannot be replaced by a raw_spinlock
>because:
>
>  - wait queues can have custom wakeup callbacks, which acquire other
>    spinlock_t locks and have potentially long execution times
>
>  - wake_up() walks an unbounded number of list entries during the wake up
>    and may wake an unbounded number of waiters.
>
>For simplicity and performance reasons complete() should be usable on
>PREEMPT_RT enabled kernels.
>
>completions do not use custom wakeup callbacks and are usually single
>waiter, except for a few corner cases.
>
>Replace the wait queue in the completion with a simple wait queue (swait),
>which uses a raw_spinlock_t for protecting the waiter list and therefore is
>safe to use inside truly atomic regions on PREEMPT_RT.
>
>There is no semantical or functional change:
>
>  - completions use the exclusive wait mode which is what swait provides
>
>  - complete() wakes one exclusive waiter
>
>  - complete_all() wakes all waiters while holding the lock which protects
>    the wait queue against newly incoming waiters. The conversion to swait
>    preserves this behaviour.
>
>complete_all() might cause unbound latencies with a large number of waiters
>being woken at once, but most complete_all() usage sites are either in
>testing or initialization code or have only a really small number of
>concurrent waiters which for now does not cause a latency problem. Keep it
>simple for now.
>
>The fixup of the warning check in the USB gadget driver is just a straight
>forward conversion of the lockless waiter check from one waitqueue type to
>the other.
>
>Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>Cc: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted
  2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
@ 2020-03-20  9:13     ` Sebastian Andrzej Siewior
  2020-03-20 10:44     ` Peter Zijlstra
  1 sibling, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, arnd, balbi, bhelgaas, davem, gregkh, joel, kurt.schwemmer,
	kvalo, linux-kernel, linux-pci, linux-usb, linux-wireless,
	linuxppc-dev, logang, mingo, mpe, netdev, oleg, paulmck, peterz,
	rdunlap, rostedt, torvalds, will, Davidlohr Bueso

On 2020-03-20 01:55:25 [-0700], Davidlohr Bueso wrote:
> Let the caller know if wake_up_process() was actually called or not;
> some users can use this information for ad-hoc. Of course returning
> true does not guarantee that wake_up_process() actually woke anything
> up.

Wouldn't it make sense to return wake_up_process()'s return value, to
know whether a change of state occurred or not?
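
I.e. (sketch):

	task = rcu_dereference(w->task);
	if (task)
		/* non-zero only if the task's state was actually changed */
		ret = wake_up_process(task);
	rcu_read_unlock();

	return ret;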

Sebastian

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 19/15] sched/swait: Reword some of the main description
  2020-03-20  8:55   ` [PATCH 19/15] sched/swait: Reword some of the main description Davidlohr Bueso
@ 2020-03-20  9:19     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:19 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, arnd, balbi, bhelgaas, davem, gregkh, joel, kurt.schwemmer,
	kvalo, linux-kernel, linux-pci, linux-usb, linux-wireless,
	linuxppc-dev, logang, mingo, mpe, netdev, oleg, paulmck, peterz,
	rdunlap, rostedt, torvalds, will, Davidlohr Bueso

On 2020-03-20 01:55:27 [-0700], Davidlohr Bueso wrote:
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index 73e06e9986d4..6e5b5d0e64fd 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -39,7 +26,7 @@
>   *    sleeper state.
>   *
>   *  - the !exclusive mode; because that leads to O(n) wakeups, everything is
> - *    exclusive.
> + *    exclusive. As such swait_wake_up_one will only ever awake _one_ waiter.
                            swake_up_one()

>   *  - custom wake callback functions; because you cannot give any guarantees
>   *    about random code. This also allows swait to be used in RT, such that

Sebastian

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h
  2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
  2020-03-20  5:36   ` Davidlohr Bueso
@ 2020-03-20  9:48   ` Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h Sebastian Andrzej Siewior
                       ` (4 more replies)
  1 sibling, 5 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will

The following mini-series removes linux/mm.h from the asm/uaccess.h of
the individual architectures. The series has been compile-tested with
the defconfig, and additionally for ia64 with the "special" allmodconfig
supplied by the bot. The regular allmodconfig for these architectures
does not compile (even without the series).

Sebastian



^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
@ 2020-03-20  9:48     ` Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 2/5] csky: " Sebastian Andrzej Siewior
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Nick Hu,
	Greentime Hu, Vincent Chen, kbuild test robot

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/nds32/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9f9f139..3a9219f53ee0d 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <asm/errno.h>
 #include <asm/memory.h>
 #include <asm/types.h>
-#include <linux/mm.h>
 
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h Sebastian Andrzej Siewior
@ 2020-03-20  9:48     ` Sebastian Andrzej Siewior
  2020-03-21 11:24       ` Guo Ren
  2020-03-20  9:48     ` [PATCH 3/5] hexagon: " Sebastian Andrzej Siewior
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Guo Ren,
	linux-csky, kbuild test robot

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/csky/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a424..abefa125b93cf 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/mm.h>
 #include <linux/string.h>
 #include <linux/version.h>
 #include <asm/segment.h>
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 3/5] hexagon: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 2/5] csky: " Sebastian Andrzej Siewior
@ 2020-03-20  9:48     ` Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 4/5] ia64: " Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 5/5] microblaze: " Sebastian Andrzej Siewior
  4 siblings, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Brian Cain,
	linux-hexagon, kbuild test robot

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/hexagon/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Brian Cain <bcain@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index 00cb38faad0c4..c1019a736ff13 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include <linux/mm.h>
 #include <asm/sections.h>
 
 /*
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 4/5] ia64: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
                       ` (2 preceding siblings ...)
  2020-03-20  9:48     ` [PATCH 3/5] hexagon: " Sebastian Andrzej Siewior
@ 2020-03-20  9:48     ` Sebastian Andrzej Siewior
  2020-03-20  9:48     ` [PATCH 5/5] microblaze: " Sebastian Andrzej Siewior
  4 siblings, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Tony Luck,
	Fenghua Yu, linux-ia64, kbuild test robot

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 5/5] microblaze: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
                       ` (3 preceding siblings ...)
  2020-03-20  9:48     ` [PATCH 4/5] ia64: " Sebastian Andrzej Siewior
@ 2020-03-20  9:48     ` Sebastian Andrzej Siewior
  4 siblings, 0 replies; 72+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-20  9:48 UTC (permalink / raw)
  To: tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, dave, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will, Michal Simek,
	kbuild test robot

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Michal Simek <monstr@monstr.eu>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted
  2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
  2020-03-20  9:13     ` Sebastian Andrzej Siewior
@ 2020-03-20 10:44     ` Peter Zijlstra
  1 sibling, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2020-03-20 10:44 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, arnd, balbi, bhelgaas, bigeasy, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, rdunlap, rostedt, torvalds, will, Davidlohr Bueso

On Fri, Mar 20, 2020 at 01:55:25AM -0700, Davidlohr Bueso wrote:

> diff --git a/kernel/exit.c b/kernel/exit.c
> index 6cc6cc485d07..b0bb0a8ec4b1 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -234,9 +234,10 @@ void release_task(struct task_struct *p)
>  		goto repeat;
>  }
>  
> -void rcuwait_wake_up(struct rcuwait *w)
> +bool rcuwait_wake_up(struct rcuwait *w)
>  {
>  	struct task_struct *task;
> +	bool ret = false;
>  
>  	rcu_read_lock();
>  
> @@ -254,10 +255,15 @@ void rcuwait_wake_up(struct rcuwait *w)
>  	smp_mb(); /* (B) */
>  
>  	task = rcu_dereference(w->task);
> -	if (task)
> +	if (task) {
>  		wake_up_process(task);
> +	        ret = true;

		ret = wake_up_process(task); ?

> +	}
>  	rcu_read_unlock();
> +
> +	return ret;
>  }
> +EXPORT_SYMBOL_GPL(rcuwait_wake_up);
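
With that folded in, the end result would presumably look roughly like
this (a sketch only; the comment block between the barriers is elided):

	bool rcuwait_wake_up(struct rcuwait *w)
	{
		struct task_struct *task;
		bool ret = false;

		rcu_read_lock();
		smp_mb(); /* (B) */

		task = rcu_dereference(w->task);
		if (task)
			/* true only if the task was actually woken */
			ret = wake_up_process(task);
		rcu_read_unlock();

		return ret;
	}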

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait
  2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
@ 2020-03-20 11:20     ` Paolo Bonzini
  2020-03-20 12:54     ` Peter Zijlstra
  1 sibling, 0 replies; 72+ messages in thread
From: Paolo Bonzini @ 2020-03-20 11:20 UTC (permalink / raw)
  To: Davidlohr Bueso, tglx
  Cc: arnd, balbi, bhelgaas, bigeasy, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, peterz, rdunlap, rostedt, torvalds, will,
	Davidlohr Bueso, Paul Mackerras

On 20/03/20 09:55, Davidlohr Bueso wrote:
> Only compiled and tested on x86.

It shows :) as the __KVM_HAVE_ARCH_WQP case is broken.  But no problem, 
Paul and I can pick this up and fix it.

This is missing:

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 506e4df2d730..6e5d85ba588d 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -78,7 +78,7 @@ struct kvmppc_vcore {
 	struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS];
 	struct list_head preempt_list;
 	spinlock_t lock;
-	struct swait_queue_head wq;
+	struct rcuwait wait;
 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
 	u64 stolen_tb;
 	u64 preempt_tb;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1af96fb5dc6f..8c8122c30b89 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -754,7 +754,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	if (err)
 		goto out_vcpu_uninit;
 
-	vcpu->arch.wqp = &vcpu->wq;
+	vcpu->arch.waitp = &vcpu->wait;
 	kvmppc_create_vcpu_debugfs(vcpu, vcpu->vcpu_id);
 	return 0;
 

and...

> -static inline struct swait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
> +static inline struct rcuwait *kvm_arch_vcpu_get_wait(struct kvm_vcpu *vcpu)
>  {
>  #ifdef __KVM_HAVE_ARCH_WQP
> -	return vcpu->arch.wqp;
> +	return vcpu->arch.wait;

... this needs to be vcpu->arch.waitp.  That should be it.
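
Putting the two fixes together, the helper would then read (a sketch of
the intended end result, not necessarily the final committed form):

	static inline struct rcuwait *kvm_arch_vcpu_get_wait(struct kvm_vcpu *vcpu)
	{
	#ifdef __KVM_HAVE_ARCH_WQP
		/* arch code (e.g. powerpc) provides the rcuwait pointer */
		return vcpu->arch.waitp;
	#else
		return &vcpu->wait;
	#endif
	}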

Thanks!

Paolo

>  #else
> -	return &vcpu->wq;
> +	return &vcpu->wait;
>  #endif
>  }
>  
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 0d9438e9de2a..4be71cb58691 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -593,7 +593,7 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> -	if (swait_active(kvm_arch_vcpu_wq(vcpu)))
> +	if (rcu_dereference(kvm_arch_vcpu_get_wait(vcpu)) != NULL)
>  		kvm_timer_blocking(vcpu);
>  
>  	/*
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index eda7b624eab8..4a704866e9b6 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -579,16 +579,17 @@ void kvm_arm_resume_guest(struct kvm *kvm)
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		vcpu->arch.pause = false;
> -		swake_up_one(kvm_arch_vcpu_wq(vcpu));
> +		rcuwait_wake_up(kvm_arch_vcpu_get_wait(vcpu));
>  	}
>  }
>  
>  static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>  {
> -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> +	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>  
> -	swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
> -				       (!vcpu->arch.pause)));
> +	rcuwait_wait_event(*wait,
> +			   (!vcpu->arch.power_off) && (!vcpu->arch.pause),
> +			   TASK_INTERRUPTIBLE);
>  
>  	if (vcpu->arch.power_off || vcpu->arch.pause) {
>  		/* Awaken to handle a signal, request we sleep again later. */
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 15e5b037f92d..10b533f641a6 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -80,8 +80,7 @@ static void async_pf_execute(struct work_struct *work)
>  
>  	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
>  
> -	if (swq_has_sleeper(&vcpu->wq))
> -		swake_up_one(&vcpu->wq);
> +	rcuwait_wake_up(&vcpu->wait);
>  
>  	mmput(mm);
>  	kvm_put_kvm(vcpu->kvm);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce0e5c1..6b49dcb321e2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -343,7 +343,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  	vcpu->kvm = kvm;
>  	vcpu->vcpu_id = id;
>  	vcpu->pid = NULL;
> -	init_swait_queue_head(&vcpu->wq);
> +	rcuwait_init(&vcpu->wait);
>  	kvm_async_pf_vcpu_init(vcpu);
>  
>  	vcpu->pre_pcpu = -1;
> @@ -2465,9 +2465,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  {
>  	ktime_t start, cur;
> -	DECLARE_SWAITQUEUE(wait);
> -	bool waited = false;
>  	u64 block_ns;
> +	int block_check = -EINTR;
>  
>  	kvm_arch_vcpu_blocking(vcpu);
>  
> @@ -2487,21 +2486,14 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  					++vcpu->stat.halt_poll_invalid;
>  				goto out;
>  			}
> +
>  			cur = ktime_get();
>  		} while (single_task_running() && ktime_before(cur, stop));
>  	}
>  
> -	for (;;) {
> -		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> -
> -		if (kvm_vcpu_check_block(vcpu) < 0)
> -			break;
> -
> -		waited = true;
> -		schedule();
> -	}
> -
> -	finish_swait(&vcpu->wq, &wait);
> +	rcuwait_wait_event(&vcpu->wait,
> +			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
> +			   TASK_INTERRUPTIBLE);
>  	cur = ktime_get();
>  out:
>  	kvm_arch_vcpu_unblocking(vcpu);
> @@ -2525,18 +2517,18 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	trace_kvm_vcpu_wakeup(block_ns, waited, vcpu_valid_wakeup(vcpu));
> +	trace_kvm_vcpu_wakeup(block_ns, block_check < 0 ? false : true,
> +			      vcpu_valid_wakeup(vcpu));
>  	kvm_arch_vcpu_block_finish(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_block);
>  
>  bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>  {
> -	struct swait_queue_head *wqp;
> +	struct rcuwait *wait;
>  
> -	wqp = kvm_arch_vcpu_wq(vcpu);
> -	if (swq_has_sleeper(wqp)) {
> -		swake_up_one(wqp);
> +	wait = kvm_arch_vcpu_get_wait(vcpu);
> +	if (rcuwait_wake_up(wait)) {
>  		WRITE_ONCE(vcpu->ready, true);
>  		++vcpu->stat.halt_wakeup;
>  		return true;
> @@ -2678,7 +2670,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>  				continue;
>  			if (vcpu == me)
>  				continue;
> -			if (swait_active(&vcpu->wq) && !vcpu_dy_runnable(vcpu))
> +			if (rcu_dereference(vcpu->wait.task) &&
> +			    !vcpu_dy_runnable(vcpu))
>  				continue;
>  			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
>  				!kvm_arch_vcpu_in_kernel(vcpu))
> 


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait
  2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
  2020-03-20 11:20     ` Paolo Bonzini
@ 2020-03-20 12:54     ` Peter Zijlstra
  2020-03-22 16:33       ` Davidlohr Bueso
  1 sibling, 1 reply; 72+ messages in thread
From: Peter Zijlstra @ 2020-03-20 12:54 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, arnd, balbi, bhelgaas, bigeasy, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, rdunlap, rostedt, torvalds, will, Paolo Bonzini,
	Davidlohr Bueso

On Fri, Mar 20, 2020 at 01:55:26AM -0700, Davidlohr Bueso wrote:
> -	swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
> -				       (!vcpu->arch.pause)));
> +	rcuwait_wait_event(*wait,
> +			   (!vcpu->arch.power_off) && (!vcpu->arch.pause),
> +			   TASK_INTERRUPTIBLE);

> -	for (;;) {
> -		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> -
> -		if (kvm_vcpu_check_block(vcpu) < 0)
> -			break;
> -
> -		waited = true;
> -		schedule();
> -	}
> -
> -	finish_swait(&vcpu->wq, &wait);
> +	rcuwait_wait_event(&vcpu->wait,
> +			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
> +			   TASK_INTERRUPTIBLE);

Are these yet more instances that really want to be TASK_IDLE ?
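
If so, the conversion at these call sites would simply be
s/TASK_INTERRUPTIBLE/TASK_IDLE/, e.g. (sketch):

	rcuwait_wait_event(&vcpu->wait,
			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
			   TASK_IDLE);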


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-19 18:02     ` Thomas Gleixner
@ 2020-03-20 16:01       ` Paul E. McKenney
  2020-03-20 19:51         ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2020-03-20 16:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Thu, Mar 19, 2020 at 07:02:17PM +0100, Thomas Gleixner wrote:
> Paul,
> 
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> 
> > On Wed, Mar 18, 2020 at 09:43:10PM +0100, Thomas Gleixner wrote:
> >
> > Mostly native-English-speaker services below, so please feel free to
> > ignore.  The one place I made a substantive change, I marked it "@@@".
> > I only did about half of this document, but should this prove useful,
> > I will do the other half later.
> 
> Native speaker services are always useful and appreciated.

Glad it is helpful.  ;-)

[ . . . ]

> >> +
> >> +raw_spinlock_t and spinlock_t
> >> +=============================
> >> +
> >> +raw_spinlock_t
> >> +--------------
> >> +
> >> +raw_spinlock_t is a strict spinning lock implementation regardless of the
> >> +kernel configuration including PREEMPT_RT enabled kernels.
> >> +
> >> +raw_spinlock_t is to be used only in real critical core code, low level
> >> +interrupt handling and places where protecting (hardware) state is required
> >> +to be safe against preemption and eventually interrupts.
> >> +
> >> +Another reason to use raw_spinlock_t is when the critical section is tiny
> >> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
> >> +contended case.
> >
> > raw_spinlock_t is a strict spinning lock implementation in all kernels,
> > including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> > core code, low level interrupt handling and places where disabling
> > preemption or interrupts is required, for example, to safely access
> > hardware state.  raw_spinlock_t can sometimes also be used when the
> > critical section is tiny and the lock is lightly contended, thus avoiding
> > RT-mutex overhead.
> >
> > @@@  I added the point about the lock being lightly contended.
> 
> Hmm, not sure. The point is that if the critical section is small the
> overhead of cross CPU boosting along with the resulting IPIs is going to
> be at least an order of magnitude larger. And on contention this is just
> pushing the raw_spinlock contention off to the raw_spinlock in the rt
> mutex plus the owning task's pi_lock, which makes things even worse.

Fair enough.  So, leaving that out:

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  In addition, raw_spinlock_t can sometimes be used when
the critical section is tiny, thus avoiding RT-mutex overhead.
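
For example (a sketch with made-up names, not from the patch):

	static DEFINE_RAW_SPINLOCK(hw_lock);

	unsigned long flags;

	raw_spin_lock_irqsave(&hw_lock, flags);
	/* Preemption and interrupts are disabled here on all kernel
	   configurations, including PREEMPT_RT. */
	writel(val, chip->ctrl_reg);
	raw_spin_unlock_irqrestore(&hw_lock, flags);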

> >> + - The hard interrupt related suffixes for spin_lock / spin_unlock
> >> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> 
> Si senor!

;-)

> >> +   interrupt disabled state
> >> +
> >> + - The soft interrupt related suffix (_bh()) is still disabling the
> >> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
> >> +   kernel, which utilizes the preemption count, this is achieved by a per
> >> +   CPU bottom half locking mechanism.
> >
> >  - The soft interrupt related suffix (_bh()) still disables softirq
> >    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
> >    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
> >    per-bottom-half locking mechanism.
> 
> it's not per-bottom-half anymore. That turned out to be dangerous due to
> dependencies between BH types, e.g. network and timers.

Ah!  OK, how about this?

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.  However, unlike non-PREEMPT_RT kernels (which disable
   preemption to get this effect), PREEMPT_RT kernels use a per-CPU
   lock to exclude softirq handlers.

> I hope I was able to encourage you to comment on the other half as well :)

OK, here goes...

> +All other semantics of spinlock_t are preserved:
> +
> + - Migration of tasks which hold a spinlock_t is prevented. On a non
> +   PREEMPT_RT enabled kernel this is implicit due to preemption disable.
> +   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
> +   pointers to per CPU variables stay valid even if the task is preempted.
> +
> + - Task state preservation. The task state is not affected when a lock is
> +   contended and the task has to schedule out and wait for the lock to
> +   become available. The lock wake up restores the task state unless there
> +   was a regular (not lock related) wake up on the task. This ensures that
> +   the task state rules are always correct independent of the kernel
> +   configuration.

How about this?

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, the
   corresponding lock wakeup restores the task state.  Note that regular
   (not lock related) wakeups do not restore task state.

> +rwlock_t
> +========
> +
> +rwlock_t is a multiple readers and single writer lock mechanism.
> +
> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
> +lock and the suffix rules of spinlock_t apply accordingly. The
> +implementation is fair and prevents writer starvation.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

> +rwlock_t and PREEMPT_RT
> +-----------------------
> +
> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Same changes as for spinlock_t
> +
> + - The implementation is not fair and can cause writer starvation under
> +   certain circumstances. The reason for this is that a writer cannot grant
> +   its priority to multiple readers. Readers which are blocked on a writer
> +   fully support the priority inheritance protocol.

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its
   lock, thus starving even high-priority writers.  In contrast, because
   readers can grant their priority to a writer, a preempted low-priority
   writer will have its priority boosted until it releases the lock,
   thus preventing that writer from starving readers.

> +PREEMPT_RT caveats
> +==================
> +
> +spinlock_t and rwlock_t
> +-----------------------
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::

These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

> +   local_irq_disable();
> +   spin_lock(&lock);
> +
> +and fully equivalent to::

and is fully equivalent to::

> +   spin_lock_irq(&lock);
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.

The same applies to rwlock_t and its _irqsave() variant.

> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
> +
> +PREEMPT_RT also offers a local_lock mechanism to substitute the
> +local_irq_disable/save() constructs in cases where a separation of the
> +interrupt disabling and the locking is really unavoidable. This should be
> +restricted to very rare cases.

On PREEMPT_RT kernel this code sequence breaks because RT-mutex
requires a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers
a local_lock mechanism.  Acquiring the local_lock pins the task to a
CPU, allowing things like per-CPU irq-disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
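
In code, the pattern would be something like this (a sketch with
hypothetical names, using the local_lock API from the RT tree):

	static DEFINE_PER_CPU(local_lock_t, state_lock) = INIT_LOCAL_LOCK(state_lock);
	static DEFINE_PER_CPU(struct foo_state, state);

	/* instead of local_irq_disable() ... local_irq_enable() */
	local_lock_irq(&state_lock);
	/* non-RT: interrupts are disabled; RT: a per-CPU lock is held
	   and the task is pinned to the CPU, but stays preemptible */
	this_cpu_ptr(&state)->count++;
	local_unlock_irq(&state_lock);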


> +raw_spinlock_t
> +--------------
> +
> +Locking of a raw_spinlock_t disables preemption and eventually interrupts.
> +Therefore code inside the critical region has to be careful to avoid calls
> +into code which takes a regular spinlock_t or rwlock_t. A prime example is
> +memory allocation.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine code::

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t; for example, the critical section must avoid
allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

> +  raw_spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +On a PREEMPT_RT enabled kernel this breaks because the memory allocator is
> +fully preemptible and therefore does not support allocations from truly
> +atomic contexts.
> +
> +Contrary to that the following code construct is perfectly fine on
> +PREEMPT_RT as spin_lock() does not disable preemption::

But this code fails on PREEMPT_RT kernels because the memory allocator
is fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption::

> +  spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
> +execution is forced into thread context and the lock substitution is
> +ensuring preemptibility.

Interestingly enough, most uses of GFP_ATOMIC allocations are
actually safe on PREEMPT_RT because the lock substitution ensures
preemptibility.  Only those GFP_ATOMIC allocations that are invoked
while holding a raw spinlock or with preemption otherwise disabled need
adjustment to work correctly on PREEMPT_RT.

[ I am not as confident of the above as I would like to be... ]

And meeting time, will continue later!

							Thanx, Paul

> +bit spinlocks
> +-------------
> +
> +Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
> +substituted by an RT-mutex based implementation for obvious reasons.
> +
> +The semantics of bit spinlocks are preserved on a PREEMPT_RT enabled kernel
> +and the caveats vs. raw_spinlock_t apply.
> +
> +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
> +this requires conditional (#ifdef'ed) code changes at the usage side while
> +the spinlock_t substitution is simply done by the compiler and the
> +conditionals are restricted to header files and core implementation of the
> +locking primitives and the usage sites do not require any changes.
> +
> +
> +Lock type nesting rules
> +=======================
> +
> +The most basic rules are:
> +
> +  - Lock types of the same lock category (sleeping, spinning) can nest
> +    arbitrarily as long as they respect the general lock ordering rules to
> +    prevent deadlocks.
> +
> +  - Sleeping lock types cannot nest inside spinning lock types.
> +
> +  - Spinning lock types can nest inside sleeping lock types.
> +
> +These rules apply in general independent of CONFIG_PREEMPT_RT.
> +
> +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
> +spinning to sleeping this has obviously restrictions how they can nest with
> +raw_spinlock_t.
> +
> +This results in the following nest ordering:
> +
> +  1) Sleeping locks
> +  2) spinlock_t and rwlock_t
> +  3) raw_spinlock_t and bit spinlocks
> +
> +Lockdep is aware of these constraints to ensure that they are respected.
> +
> +
> +Owner semantics
> +===============
> +
> +Most lock types in the Linux kernel have strict owner semantics, i.e. the
> +context (task) which acquires a lock has to release it.
> +
> +There are two exceptions:
> +
> +  - semaphores
> +  - rwsems
> +
> +semaphores have no strict owner semantics for historical reasons. They are
> +often used for both serialization and waiting purposes. That's generally
> +discouraged and should be replaced by separate serialization and wait
> +mechanisms.
> +
> +rwsems have grown interfaces which allow non owner release for special
> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> +substitutes all locking primitives except semaphores with RT-mutex based
> +implementations to provide priority inheritance for all lock types except
> +the truly spinning ones. Priority inheritance on ownerless locks is
> +obviously impossible.
> +
> +For now the rwsem non-owner release excludes code which utilizes it from
> +being used on PREEMPT_RT enabled kernels. In some cases this can be
> +mitigated by disabling portions of the code, in other cases the complete
> +functionality has to be disabled until a workable solution has been found.
> 
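
To make the nest ordering concrete (a sketch with made-up lock names,
not part of the patch):

	mutex_lock(&m);			/* 1) sleeping lock */
	spin_lock(&s);			/* 2) spinlock_t nests inside it */
	raw_spin_lock(&r);		/* 3) raw_spinlock_t nests inside both */
	raw_spin_unlock(&r);
	spin_unlock(&s);
	mutex_unlock(&m);

	raw_spin_lock(&r);
	spin_lock(&s);			/* invalid on PREEMPT_RT: spinlock_t
					   sleeps there, but raw_spin_lock()
					   disabled preemption */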

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-20 16:01       ` Paul E. McKenney
@ 2020-03-20 19:51         ` Thomas Gleixner
  2020-03-20 21:02           ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-20 19:51 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

"Paul E. McKenney" <paulmck@kernel.org> writes:
>
>  - The soft interrupt related suffix (_bh()) still disables softirq
>    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
>    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
>    lock to exclude softirq handlers.

I've made that:

  - The soft interrupt related suffix (_bh()) still disables softirq
    handlers.

    Non-PREEMPT_RT kernels disable preemption to get this effect.

    PREEMPT_RT kernels use a per-CPU lock for serialization. The lock
    disables softirq handlers and prevents reentrancy by a preempting
    task.
    
On non-RT this is implicit through preemption disable, but it's
non-obvious for RT as preemption stays enabled.

> PREEMPT_RT kernels preserve all other spinlock_t semantics:
>
>  - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
>    avoid migration by disabling preemption.  PREEMPT_RT kernels instead
>    disable migration, which ensures that pointers to per-CPU variables
>    remain valid even if the task is preempted.
>
>  - Task state is preserved across spinlock acquisition, ensuring that the
>    task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
>    kernels leave task state untouched.  However, PREEMPT_RT must change
>    task state if the task blocks during acquisition.  Therefore, the
>    corresponding lock wakeup restores the task state.  Note that regular
>    (not lock related) wakeups do not restore task state.

   - Task state is preserved across spinlock acquisition, ensuring that the
     task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
     kernels leave task state untouched.  However, PREEMPT_RT must change
     task state if the task blocks during acquisition.  Therefore, it
     saves the current task state before blocking and the corresponding
     lock wakeup restores it. A regular not lock related wakeup sets the
     task state to RUNNING. If this happens while the task is blocked on
     a spinlock then the saved task state is changed so that correct
     state is restored on lock wakeup.

Hmm?

> But this code fails on PREEMPT_RT kernels because the memory allocator
> is fully preemptible and therefore cannot be invoked from truly atomic
> contexts.  However, it is perfectly fine to invoke the memory allocator
> while holding normal non-raw spinlocks because they do not disable
> preemption::
>
>> +  spin_lock(&lock);
>> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
>> +
>> +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
>> +execution is forced into thread context and the lock substitution is
>> +ensuring preemptibility.
>
> Interestingly enough, most uses of GFP_ATOMIC allocations are
> actually safe on PREEMPT_RT because the lock substitution ensures
> preemptibility.  Only those GFP_ATOMIC allocations that are invoked
> while holding a raw spinlock or with preemption otherwise disabled need
> adjustment to work correctly on PREEMPT_RT.
>
> [ I am not as confident of the above as I would like to be... ]

I'd leave that whole paragraph out. This documents the rules and from
the above code examples it's pretty clear what works and what not :)

> And meeting time, will continue later!

Enjoy!

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-20 19:51         ` Thomas Gleixner
@ 2020-03-20 21:02           ` Paul E. McKenney
  2020-03-20 22:36             ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2020-03-20 21:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Fri, Mar 20, 2020 at 08:51:44PM +0100, Thomas Gleixner wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> >
> >  - The soft interrupt related suffix (_bh()) still disables softirq
> >    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
> >    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
> >    lock to exclude softirq handlers.
> 
> I've made that:
> 
>   - The soft interrupt related suffix (_bh()) still disables softirq
>     handlers.
> 
>     Non-PREEMPT_RT kernels disable preemption to get this effect.
> 
>     PREEMPT_RT kernels use a per-CPU lock for serialization. The lock
>     disables softirq handlers and prevents reentrancy by a preempting
>     task.

That works!  At the end, I would instead say "prevents reentrancy
due to task preemption", but what you have works.

> On non-RT this is implicit through preemption disable, but it's non
> obvious for RT as preemption stays enabled.
> 
> > PREEMPT_RT kernels preserve all other spinlock_t semantics:
> >
> >  - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
> >    avoid migration by disabling preemption.  PREEMPT_RT kernels instead
> >    disable migration, which ensures that pointers to per-CPU variables
> >    remain valid even if the task is preempted.
> >
> >  - Task state is preserved across spinlock acquisition, ensuring that the
> >    task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
> >    kernels leave task state untouched.  However, PREEMPT_RT must change
> >    task state if the task blocks during acquisition.  Therefore, the
> >    corresponding lock wakeup restores the task state.  Note that regular
> >    (not lock related) wakeups do not restore task state.
> 
>    - Task state is preserved across spinlock acquisition, ensuring that the
>      task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
>      kernels leave task state untouched.  However, PREEMPT_RT must change
>      task state if the task blocks during acquisition.  Therefore, it
>      saves the current task state before blocking and the corresponding
>      lock wakeup restores it. A regular not lock related wakeup sets the
>      task state to RUNNING. If this happens while the task is blocked on
>      a spinlock then the saved task state is changed so that correct
>      state is restored on lock wakeup.
> 
> Hmm?

I of course cannot resist editing the last two sentences:

   ... Other types of wakeups unconditionally set task state to RUNNING.
   If this happens while a task is blocked while acquiring a spinlock,
   then the task state is restored to its pre-acquisition value at
   lock-wakeup time.

> > But this code failes on PREEMPT_RT kernels because the memory allocator
> > is fully preemptible and therefore cannot be invoked from truly atomic
> > contexts.  However, it is perfectly fine to invoke the memory allocator
> > while holding a normal non-raw spinlocks because they do not disable
> > preemption::
> >
> >> +  spin_lock(&lock);
> >> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> >> +
> >> +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
> >> +execution is forced into thread context and the lock substitution is
> >> +ensuring preemptibility.
> >
> > Interestingly enough, most uses of GFP_ATOMIC allocations are
> > actually safe on PREEMPT_RT because the lock substitution ensures
> > preemptibility.  Only those GFP_ATOMIC allocations that are invoked
> > while holding a raw spinlock or with preemption otherwise disabled need
> > adjustment to work correctly on PREEMPT_RT.
> >
> > [ I am not as confident of the above as I would like to be... ]
> 
> I'd leave that whole paragraph out. This documents the rules and from
> the above code examples it's pretty clear what works and what not :)

Works for me!  ;-)

> > And meeting time, will continue later!
> 
> Enjoy!

Not bad, actually, as meetings go.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-20 21:02           ` Paul E. McKenney
@ 2020-03-20 22:36             ` Thomas Gleixner
  2020-03-21  2:29               ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-20 22:36 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Fri, Mar 20, 2020 at 08:51:44PM +0100, Thomas Gleixner wrote:
>> "Paul E. McKenney" <paulmck@kernel.org> writes:
>> >
>> >  - The soft interrupt related suffix (_bh()) still disables softirq
>> >    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
>> >    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
>> >    lock to exclude softirq handlers.
>> 
>> I've made that:
>> 
>>   - The soft interrupt related suffix (_bh()) still disables softirq
>>     handlers.
>> 
>>     Non-PREEMPT_RT kernels disable preemption to get this effect.
>> 
>>     PREEMPT_RT kernels use a per-CPU lock for serialization. The lock
>>     disables softirq handlers and prevents reentrancy by a preempting
>>     task.
>
> That works!  At the end, I would instead say "prevents reentrancy
> due to task preemption", but what you have works.

Yours is better.

>>    - Task state is preserved across spinlock acquisition, ensuring that the
>>      task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
>>      kernels leave task state untouched.  However, PREEMPT_RT must change
>>      task state if the task blocks during acquisition.  Therefore, it
>>      saves the current task state before blocking and the corresponding
>>      lock wakeup restores it. A regular not lock related wakeup sets the
>>      task state to RUNNING. If this happens while the task is blocked on
>>      a spinlock then the saved task state is changed so that correct
>>      state is restored on lock wakeup.
>> 
>> Hmm?
>
> I of course cannot resist editing the last two sentences:
>
>    ... Other types of wakeups unconditionally set task state to RUNNING.
>    If this happens while a task is blocked while acquiring a spinlock,
>    then the task state is restored to its pre-acquisition value at
>    lock-wakeup time.

Errm no. That would mean

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                               non lock wakeup
                                 state = RUNNING    <--- FAIL #1

                               lock wakeup
                                 state = real_state <--- FAIL #2

How it works is:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                               non lock wakeup
                                 real_state = RUNNING

                               lock wakeup
                                 state = real_state == RUNNING

If there is no 'non lock wakeup' before the lock wakeup:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                               lock wakeup
                                 state = real_state == UNINTERRUPTIBLE
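
Condensed into pseudo-code, using the names from the diagrams above:

	/* block on the lock */
	real_state = state;
	state = SLEEPONLOCK;

	/* non lock wakeup while blocked on the lock */
	real_state = RUNNING;		/* state stays SLEEPONLOCK */

	/* lock wakeup */
	state = real_state;		/* RUNNING or the original state */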

I agree that what I tried to express is hard to parse, but it's at least
halfway correct :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-20 22:36             ` Thomas Gleixner
@ 2020-03-21  2:29               ` Paul E. McKenney
  2020-03-21 10:26                 ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Paul E. McKenney @ 2020-03-21  2:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> > On Fri, Mar 20, 2020 at 08:51:44PM +0100, Thomas Gleixner wrote:
> >> "Paul E. McKenney" <paulmck@kernel.org> writes:
> >> >
> >> >  - The soft interrupt related suffix (_bh()) still disables softirq
> >> >    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
> >> >    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
> >> >    lock to exclude softirq handlers.
> >> 
> >> I've made that:
> >> 
> >>   - The soft interrupt related suffix (_bh()) still disables softirq
> >>     handlers.
> >> 
> >>     Non-PREEMPT_RT kernels disable preemption to get this effect.
> >> 
> >>     PREEMPT_RT kernels use a per-CPU lock for serialization. The lock
> >>     disables softirq handlers and prevents reentrancy by a preempting
> >>     task.
> >
> > That works!  At the end, I would instead say "prevents reentrancy
> > due to task preemption", but what you have works.
> 
> Yours is better.
> 
> >>    - Task state is preserved across spinlock acquisition, ensuring that the
> >>      task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
> >>      kernels leave task state untouched.  However, PREEMPT_RT must change
> >>      task state if the task blocks during acquisition.  Therefore, it
> >>      saves the current task state before blocking and the corresponding
> >>      lock wakeup restores it. A regular not lock related wakeup sets the
> >>      task state to RUNNING. If this happens while the task is blocked on
> >>      a spinlock then the saved task state is changed so that correct
> >>      state is restored on lock wakeup.
> >> 
> >> Hmm?
> >
> > I of course cannot resist editing the last two sentences:
> >
> >    ... Other types of wakeups unconditionally set task state to RUNNING.
> >    If this happens while a task is blocked while acquiring a spinlock,
> >    then the task state is restored to its pre-acquisition value at
> >    lock-wakeup time.
> 
> Errm no. That would mean
> 
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
> 
>                                non lock wakeup
>                                  state = RUNNING    <--- FAIL #1
> 
>                                lock wakeup
>                                  state = real_state <--- FAIL #2
> 
> How it works is:
> 
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
> 
>                                non lock wakeup
>                                  real_state = RUNNING
> 
>                                lock wakeup
>                                  state = real_state == RUNNING
> 
> If there is no 'non lock wakeup' before the lock wakeup:
> 
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
> 
>                                lock wakeup
>                                  state = real_state == UNINTERRUPTIBLE
> 
> I agree that what I tried to express is hard to parse, but it's at least
> halfway correct :)

Apologies!  That is what I get for not looking it up in the source.  :-/

OK, so I am stupid enough not only to get it wrong, but also to try again:

   ... Other types of wakeups would normally unconditionally set the
   task state to RUNNING, but that does not work here because the task
   must remain blocked until the lock becomes available.  Therefore,
   when a non-lock wakeup attempts to awaken a task blocked waiting
   for a spinlock, it instead sets the saved state to RUNNING.  Then,
   when the lock acquisition completes, the lock wakeup sets the task
   state to the saved state, in this case setting it to RUNNING.

Is that better?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-21  2:29               ` Paul E. McKenney
@ 2020-03-21 10:26                 ` Thomas Gleixner
  2020-03-21 17:23                   ` Paul E. McKenney
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-21 10:26 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
>> I agree that what I tried to express is hard to parse, but it's at least
>> halfway correct :)
>
> Apologies!  That is what I get for not looking it up in the source.  :-/
>
> OK, so I am stupid enough not only to get it wrong, but also to try again:
>
>    ... Other types of wakeups would normally unconditionally set the
>    task state to RUNNING, but that does not work here because the task
>    must remain blocked until the lock becomes available.  Therefore,
>    when a non-lock wakeup attempts to awaken a task blocked waiting
>    for a spinlock, it instead sets the saved state to RUNNING.  Then,
>    when the lock acquisition completes, the lock wakeup sets the task
>    state to the saved state, in this case setting it to RUNNING.
>
> Is that better?

Definitely!

Thanks for all the editorial work!

       tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait
  2020-03-19 10:04   ` Christoph Hellwig
  2020-03-19 10:26     ` Sebastian Andrzej Siewior
@ 2020-03-21 10:41     ` Thomas Gleixner
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-21 10:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: LKML, Randy Dunlap, Peter Zijlstra, linux-pci,
	Sebastian Andrzej Siewior, netdev, Joel Fernandes, Will Deacon,
	Ingo Molnar, Davidlohr Bueso, Paul E . McKenney, Logan Gunthorpe,
	Arnd Bergmann, linuxppc-dev, Steven Rostedt, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, linux-wireless, Oleg Nesterov, Linus Torvalds,
	David S. Miller

Christoph Hellwig <hch@infradead.org> writes:

> On Wed, Mar 18, 2020 at 09:43:09PM +0100, Thomas Gleixner wrote:
>> The PS3 notification interrupt and kthread use a hacked up completion to
>> communicate. Since we're wanting to change the completion implementation and
>> this is abuse anyway, replace it with a simple rcuwait since there is only ever
>> the one waiter.
>> 
>> AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg; kthreads
>> cannot receive signals by default and this one doesn't look different. Use
>> TASK_IDLE instead.
>
> I think the right fix here is to jut convert the thing to a threaded
> interrupt handler and kill off the stupid kthread.

That'd be major surgery.

> But I wonder how alive the whole PS3 support is to start with..

There seem to be a few enthusiasts left who have an Other-OS capable PS3.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
  2020-03-20  9:48     ` [PATCH 2/5] csky: " Sebastian Andrzej Siewior
@ 2020-03-21 11:24       ` Guo Ren
  2020-03-21 12:08         ` Thomas Gleixner
  0 siblings, 1 reply; 72+ messages in thread
From: Guo Ren @ 2020-03-21 11:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Arnd Bergmann, balbi, bhelgaas, dave,
	David Miller, gregkh, joel, kurt.schwemmer, kvalo,
	Linux Kernel Mailing List, linux-pci, linux-usb, linux-wireless,
	linuxppc-dev, logang, mingo, Michael Ellerman, netdev, oleg,
	paulmck, Peter Zijlstra, Randy Dunlap, Steven Rostedt, torvalds,
	Will Deacon, linux-csky, kbuild test robot

Tested and Acked by me.

Queued for next pull request, thx

On Fri, Mar 20, 2020 at 5:49 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The defconfig compiles without linux/mm.h. With mm.h included, the
> include chain leads to:
> |   CC      kernel/locking/percpu-rwsem.o
> | In file included from include/linux/huge_mm.h:8,
> |                  from include/linux/mm.h:567,
> |                  from arch/csky/include/asm/uaccess.h:,
> |                  from include/linux/uaccess.h:11,
> |                  from include/linux/sched/task.h:11,
> |                  from include/linux/sched/signal.h:9,
> |                  from include/linux/rcuwait.h:6,
> |                  from include/linux/percpu-rwsem.h:8,
> |                  from kernel/locking/percpu-rwsem.c:6:
> | include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
> |  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
>
> once rcuwait.h includes linux/sched/signal.h.
>
> Remove the linux/mm.h include.
>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/csky/include/asm/uaccess.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
> index eaa1c3403a424..abefa125b93cf 100644
> --- a/arch/csky/include/asm/uaccess.h
> +++ b/arch/csky/include/asm/uaccess.h
> @@ -11,7 +11,6 @@
>  #include <linux/errno.h>
>  #include <linux/types.h>
>  #include <linux/sched.h>
> -#include <linux/mm.h>
>  #include <linux/string.h>
>  #include <linux/version.h>
>  #include <asm/segment.h>
> --
> 2.26.0.rc2
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
  2020-03-21 11:24       ` Guo Ren
@ 2020-03-21 12:08         ` Thomas Gleixner
  2020-03-21 14:11           ` Guo Ren
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-21 12:08 UTC (permalink / raw)
  To: Guo Ren, Sebastian Andrzej Siewior
  Cc: Arnd Bergmann, balbi, bhelgaas, dave, David Miller, gregkh, joel,
	kurt.schwemmer, kvalo, Linux Kernel Mailing List, linux-pci,
	linux-usb, linux-wireless, linuxppc-dev, logang, mingo,
	Michael Ellerman, netdev, oleg, paulmck, Peter Zijlstra,
	Randy Dunlap, Steven Rostedt, torvalds, Will Deacon, linux-csky,
	kbuild test robot

Guo Ren <guoren@kernel.org> writes:

> Tested and Acked by me.
>
> Queued for next pull request, thx

Can we please route that with the rcuwait changes to avoid breakage
unless you ship it to Linus right away?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/5] csky: Remove mm.h from asm/uaccess.h
  2020-03-21 12:08         ` Thomas Gleixner
@ 2020-03-21 14:11           ` Guo Ren
  0 siblings, 0 replies; 72+ messages in thread
From: Guo Ren @ 2020-03-21 14:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Arnd Bergmann, balbi, Bjorn Helgaas,
	dave, David Miller, gregkh, joel, kurt.schwemmer, kvalo,
	Linux Kernel Mailing List, linux-pci, linux-usb, linux-wireless,
	linuxppc-dev, logang, mingo, Michael Ellerman, netdev, oleg,
	paulmck, Peter Zijlstra, Randy Dunlap, Steven Rostedt, torvalds,
	Will Deacon, linux-csky, kbuild test robot

On Sat, Mar 21, 2020 at 8:08 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Guo Ren <guoren@kernel.org> writes:
>
> > Tested and Acked by me.
> >
> > Queued for next pull request, thx
>
> Can we please route that with the rcuwait changes to avoid breakage
> unless you ship it to Linus right away?

Ok, I won't queue it.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-21 10:26                 ` Thomas Gleixner
@ 2020-03-21 17:23                   ` Paul E. McKenney
  0 siblings, 0 replies; 72+ messages in thread
From: Paul E. McKenney @ 2020-03-21 17:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Joel Fernandes, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Sat, Mar 21, 2020 at 11:26:06AM +0100, Thomas Gleixner wrote:
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> > On Fri, Mar 20, 2020 at 11:36:03PM +0100, Thomas Gleixner wrote:
> >> I agree that what I tried to express is hard to parse, but it's at least
> >> halfway correct :)
> >
> > Apologies!  That is what I get for not looking it up in the source.  :-/
> >
> > OK, so I am stupid enough not only to get it wrong, but also to try again:
> >
> >    ... Other types of wakeups would normally unconditionally set the
> >    task state to RUNNING, but that does not work here because the task
> >    must remain blocked until the lock becomes available.  Therefore,
> >    when a non-lock wakeup attempts to awaken a task blocked waiting
> >    for a spinlock, it instead sets the saved state to RUNNING.  Then,
> >    when the lock acquisition completes, the lock wakeup sets the task
> >    state to the saved state, in this case setting it to RUNNING.
> >
> > Is that better?
> 
> Definitely!
> 
> Thanks for all the editorial work!

NP, and glad you like it!

But I felt even more stupid sometime in the middle of the night.  Why on
earth didn't I work in your nice examples?  :-/

I will pull them in later.  Time to go hike!!!

							Thanx, Paul
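
The saved-state handling described above can be sketched conceptually
(illustrative pseudo-C; field and helper names are not the actual
kernel implementation):

  /* Non-lock wakeup of a task blocked on an RT-substituted spinlock:
   * the task must stay blocked, so only record the wakeup. */
  static void demo_wake_up(struct task_struct *p)
  {
          if (demo_blocked_on_rt_spinlock(p))
                  p->saved_state = TASK_RUNNING;
          else
                  p->state = TASK_RUNNING;
  }

  /* Lock wakeup once the lock has been acquired: restore the saved
   * state, which is RUNNING if a non-lock wakeup happened meanwhile. */
  static void demo_lock_acquired(struct task_struct *p)
  {
          p->state = p->saved_state;
  }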

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
                     ` (2 preceding siblings ...)
  2020-03-19 15:04   ` Jonathan Corbet
@ 2020-03-21 21:21   ` Joel Fernandes
  2020-03-21 21:49     ` Thomas Gleixner
  3 siblings, 1 reply; 72+ messages in thread
From: Joel Fernandes @ 2020-03-21 21:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

Hi Thomas,

Just a few comments:

[...]
> +rtmutex
> +=======
> +
> +RT-mutexes are mutexes with support for priority inheritance (PI).
> +
> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +interrupt disabled sections.
> +
> +On a PREEMPT_RT enabled kernel most of these sections are fully
> +preemptible. This is possible because PREEMPT_RT forces most executions
> +into task context, especially interrupt handlers and soft interrupts, which
> +makes it possible to substitute spinlock_t and rwlock_t with RT-mutex based
> +implementations.
> +
> +
> +raw_spinlock_t and spinlock_t
> +=============================
> +
> +raw_spinlock_t
> +--------------
> +
> +raw_spinlock_t is a strict spinning lock implementation regardless of the
> +kernel configuration including PREEMPT_RT enabled kernels.
> +
> +raw_spinlock_t is to be used only in truly critical core code, low-level
> +interrupt handling and places where (hardware) state must be protected
> +against preemption and possibly interrupts.
> +
> +Another reason to use raw_spinlock_t is when the critical section is tiny,
> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
> +contended case.
> +
> +spinlock_t
> +----------
> +
> +The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
> +
> +On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +and has exactly the same semantics.
> +
> +spinlock_t and PREEMPT_RT
> +-------------------------
> +
> +On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Preemption is not disabled
> +
> + - The hard interrupt related suffixes for spin_lock / spin_unlock
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
> +   interrupt disabled state
> +
> + - The soft interrupt related suffix (_bh()) still disables the
> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
> +   kernel, which utilizes the preemption count, this is achieved by a
> +   per-CPU bottom half locking mechanism.
> +
> +All other semantics of spinlock_t are preserved:
> +
> + - Migration of tasks which hold a spinlock_t is prevented. On a non
> +   PREEMPT_RT enabled kernel this is implicit due to preemption disable.
> +   PREEMPT_RT has a separate mechanism to achieve this. This ensures that
> +   pointers to per CPU variables stay valid even if the task is preempted.
> +
> + - Task state preservation. The task state is not affected when a lock is
> +   contended and the task has to schedule out and wait for the lock to
> +   become available. The lock wake up restores the task state unless there
> +   was a regular (not lock related) wake up on the task. This ensures that
> +   the task state rules are always correct independent of the kernel
> +   configuration.
> +
> +rwlock_t
> +========
> +
> +rwlock_t is a multiple-reader, single-writer lock mechanism.
> +
> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
> +lock and the suffix rules of spinlock_t apply accordingly. The
> +implementation is fair and prevents writer starvation.
>

You mentioned writer starvation, but I think it would be good to also mention
that rwlock_t on a non-PREEMPT_RT kernel also does not have a _reader_
starvation problem, since it uses a queued implementation.  This fact is worth
mentioning here, since further below you explain that an rwlock in PREEMPT_RT
does have a reader starvation problem.

> +rwlock_t and PREEMPT_RT
> +-----------------------
> +
> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
> +implementation based on rt_mutex which changes the semantics:
> +
> + - Same changes as for spinlock_t
> +
> + - The implementation is not fair and can cause writer starvation under
> +   certain circumstances. The reason for this is that a writer cannot grant
> +   its priority to multiple readers. Readers which are blocked on a writer
> +   fully support the priority inheritance protocol.

Is it hard to give priority to multiple readers because the number of readers
to give priority to could be unbounded?

> +
> +
> +PREEMPT_RT caveats
> +==================
> +
> +spinlock_t and rwlock_t
> +-----------------------
> +
> +The substitution of spinlock_t and rwlock_t on PREEMPT_RT enabled kernels
> +with RT-mutex based implementations has a few implications.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +   local_irq_disable();
> +   spin_lock(&lock);
> +
> +and fully equivalent to::
> +
> +   spin_lock_irq(&lock);
> +
> +Same applies to rwlock_t and the _irqsave() suffix variant.
> +
> +On a PREEMPT_RT enabled kernel this breaks because the RT-mutex
> +substitution expects a fully preemptible context.
> +
> +The preferred solution is to use :c:func:`spin_lock_irq()` or
> +:c:func:`spin_lock_irqsave()` and their unlock counterparts.
> +
> +PREEMPT_RT also offers a local_lock mechanism to substitute the
> +local_irq_disable/save() constructs in cases where a separation of the
> +interrupt disabling and the locking is really unavoidable. This should be
> +restricted to very rare cases.

It would also be nice to mention where else local_lock() can be used, such as
protecting per-cpu variables without disabling preemption. Could we add a
section on protecting per-cpu data? (Happy to do that and send a patch if you
prefer).
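
As an illustration of that use case, protecting per-CPU data with the
local_lock API from the (then unmerged) RT patches might look roughly
like this; the header, initializers and exact semantics here are
assumptions based on those patches:

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct demo_pcpu {
          local_lock_t lock;      /* preempt_disable() on !RT, a per-CPU
                                   * spinlock_t on PREEMPT_RT */
          int counter;
  };

  static DEFINE_PER_CPU(struct demo_pcpu, demo_pcpu) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void demo_inc(void)
  {
          local_lock(&demo_pcpu.lock);
          this_cpu_inc(demo_pcpu.counter);
          local_unlock(&demo_pcpu.lock);
  }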

> +raw_spinlock_t
> +--------------
> +
> +Locking of a raw_spinlock_t disables preemption and eventually interrupts.
> +Therefore code inside the critical region has to be careful to avoid calls
> +into code which takes a regular spinlock_t or rwlock_t. A prime example is
> +memory allocation.
> +
> +On a non PREEMPT_RT enabled kernel the following code construct is
> +perfectly fine::
> +
> +  raw_spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +On a PREEMPT_RT enabled kernel this breaks because the memory allocator is
> +fully preemptible and therefore does not support allocations from truly
> +atomic contexts.
> +
> +Contrary to that the following code construct is perfectly fine on
> +PREEMPT_RT as spin_lock() does not disable preemption::
> +
> +  spin_lock(&lock);
> +  p = kmalloc(sizeof(*p), GFP_ATOMIC);
> +
> +Most places which use GFP_ATOMIC allocations are safe on PREEMPT_RT as the
> +execution is forced into thread context and the lock substitution ensures
> +preemptibility.
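
One common way to keep such code working on PREEMPT_RT, assuming the
allocation can be hoisted out of the critical section, is:

  /* Allocate before taking the raw spinlock so the critical section
   * stays truly atomic on PREEMPT_RT as well. */
  p = kmalloc(sizeof(*p), GFP_KERNEL);
  if (!p)
          return -ENOMEM;

  raw_spin_lock(&lock);
  /* ... publish or use p under the lock ... */
  raw_spin_unlock(&lock);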
> +
> +
> +bit spinlocks
> +-------------
> +
> +Bit spinlocks are problematic for PREEMPT_RT as they are only a single bit
> +in a storage word and therefore cannot be substituted by an RT-mutex based
> +implementation.
> +
> +The semantics of bit spinlocks are preserved on a PREEMPT_RT enabled kernel
> +and the same caveats as for raw_spinlock_t apply.
> +
> +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT,
> +but this requires conditional (#ifdef'ed) code changes at the usage site.
> +The spinlock_t substitution, in contrast, is done entirely by the compiler:
> +the conditionals are restricted to header files and the core implementation
> +of the locking primitives, and the usage sites do not require any changes.
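
For reference, a bit spinlock serializes on a single bit of an existing
word; a minimal usage sketch:

  #include <linux/bit_spinlock.h>

  static unsigned long state;

  /* Bit 0 of 'state' acts as the lock. This spins and disables
   * preemption even on PREEMPT_RT, so the raw_spinlock_t caveats
   * above apply to the critical section. */
  bit_spin_lock(0, &state);
  /* ... critical section ... */
  bit_spin_unlock(0, &state);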
> +
> +
> +Lock type nesting rules
> +=======================
> +
> +The most basic rules are:
> +
> +  - Lock types of the same lock category (sleeping, spinning) can nest
> +    arbitrarily as long as they respect the general lock ordering rules to
> +    prevent deadlocks.
> +
> +  - Sleeping lock types cannot nest inside spinning lock types.
> +
> +  - Spinning lock types can nest inside sleeping lock types.
> +
> +These rules apply in general independent of CONFIG_PREEMPT_RT.
> +
> +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
> +spinning to sleeping, this obviously restricts how they can nest with
> +raw_spinlock_t.
> +
> +This results in the following nest ordering:
> +
> +  1) Sleeping locks
> +  2) spinlock_t and rwlock_t
> +  3) raw_spinlock_t and bit spinlocks
> +
> +Lockdep is aware of these constraints to ensure that they are respected.
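
As an illustration of the nest ordering, a sketch with hypothetical
locks:

  static DEFINE_MUTEX(m);
  static DEFINE_SPINLOCK(s);
  static DEFINE_RAW_SPINLOCK(r);

  /* Valid on all configurations: spinning locks nest inside
   * sleeping locks. */
  mutex_lock(&m);          /* sleeping lock */
  spin_lock(&s);           /* spinning on !RT, sleeping on PREEMPT_RT */
  raw_spin_lock(&r);       /* always spinning */
  raw_spin_unlock(&r);
  spin_unlock(&s);
  mutex_unlock(&m);

  /* Invalid on PREEMPT_RT, where spinlock_t is a sleeping lock and
   * must not nest inside raw_spinlock_t; lockdep flags this. */
  raw_spin_lock(&r);
  spin_lock(&s);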
> +
> +
> +Owner semantics
> +===============
> +
> +Most lock types in the Linux kernel have strict owner semantics, i.e. the
> +context (task) which acquires a lock has to release it.
> +
> +There are two exceptions:
> +
> +  - semaphores
> +  - rwsems
> +
> +semaphores have no strict owner semantics for historical reasons. They are
> +often used for both serialization and waiting purposes. That's generally
> +discouraged and should be replaced by separate serialization and wait
> +mechanisms.
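
The suggested replacement could be sketched as (illustrative only):

  /* Discouraged: one semaphore doing double duty as lock and
   * wait-for-event mechanism (no strict owner, so no PI on RT). */
  down(&sem);                     /* waiter */
  up(&sem);                       /* signaler */

  /* Preferred: separate, clearly scoped primitives. */
  mutex_lock(&lock);              /* serialization, strict owner */
  mutex_unlock(&lock);

  wait_for_completion(&done);     /* waiting */
  complete(&done);                /* signalling */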
> +
> +rwsems have grown interfaces which allow non-owner release for special
> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> +substitutes all locking primitives except semaphores with RT-mutex based
> +implementations to provide priority inheritance for all lock types except
> +the truly spinning ones. Priority inheritance on ownerless locks is
> +obviously impossible.
> +
> +For now the rwsem non-owner release excludes code which utilizes it from
> +being used on PREEMPT_RT enabled kernels.

I could not parse the last sentence here, but I think you meant "For now,
PREEMPT_RT enabled kernels disable code that performs a non-owner release of
an rwsem". Correct me if I'm wrong.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> In some cases this can be
> +mitigated by disabling portions of the code, in other cases the complete
> +functionality has to be disabled until a workable solution has been found.
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-21 21:21   ` Joel Fernandes
@ 2020-03-21 21:49     ` Thomas Gleixner
  2020-03-22  1:36       ` Joel Fernandes
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Gleixner @ 2020-03-21 21:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

Joel Fernandes <joel@joelfernandes.org> writes:
>> +rwlock_t
>> +========
>> +
> >> +rwlock_t is a multiple-reader, single-writer lock mechanism.
>> +
>> +On a non PREEMPT_RT enabled kernel rwlock_t is implemented as a spinning
>> +lock and the suffix rules of spinlock_t apply accordingly. The
>> +implementation is fair and prevents writer starvation.
>>
>
> You mentioned writer starvation, but I think it would be good to also mention
> that rwlock_t on a non-PREEMPT_RT kernel also does not have a _reader_
> starvation problem, since it uses a queued implementation.  This fact is worth
> mentioning here, since further below you explain that an rwlock in PREEMPT_RT
> does have a reader starvation problem.

It's worth mentioning. But RT really has only writer starvation, not
reader starvation.

>> +rwlock_t and PREEMPT_RT
>> +-----------------------
>> +
>> +On a PREEMPT_RT enabled kernel rwlock_t is mapped to a separate
>> +implementation based on rt_mutex which changes the semantics:
>> +
>> + - Same changes as for spinlock_t
>> +
>> + - The implementation is not fair and can cause writer starvation under
>> +   certain circumstances. The reason for this is that a writer cannot grant
>> +   its priority to multiple readers. Readers which are blocked on a writer
>> +   fully support the priority inheritance protocol.
>
> Is it hard to give priority to multiple readers because the number of readers
> to give priority to could be unbounded?

Yes, and it's horribly complex and racy. We had an implementation years
ago which taught us not to try it again :)

>> +PREEMPT_RT also offers a local_lock mechanism to substitute the
>> +local_irq_disable/save() constructs in cases where a separation of the
>> +interrupt disabling and the locking is really unavoidable. This should be
>> +restricted to very rare cases.
>
> It would also be nice to mention where else local_lock() can be used, such as
> protecting per-cpu variables without disabling preemption. Could we add a
> section on protecting per-cpu data? (Happy to do that and send a patch if you
> prefer).

The local lock section will come soon when we post the local lock
patches again.

> >> +rwsems have grown interfaces which allow non-owner release for special
>> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
>> +substitutes all locking primitives except semaphores with RT-mutex based
>> +implementations to provide priority inheritance for all lock types except
>> +the truly spinning ones. Priority inheritance on ownerless locks is
>> +obviously impossible.
>> +
>> +For now the rwsem non-owner release excludes code which utilizes it from
>> +being used on PREEMPT_RT enabled kernels.
>
> I could not parse the last sentence here, but I think you meant "For now,
> > PREEMPT_RT enabled kernels disable code that performs a non-owner release of
> an rwsem". Correct me if I'm wrong.

Right, that's what I wanted to say :)

Care to send a delta patch?

Thanks!

        tglx

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
  2020-03-21 21:49     ` Thomas Gleixner
@ 2020-03-22  1:36       ` Joel Fernandes
  0 siblings, 0 replies; 72+ messages in thread
From: Joel Fernandes @ 2020-03-22  1:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Linus Torvalds, Ingo Molnar, Will Deacon,
	Paul E . McKenney, Steven Rostedt, Randy Dunlap,
	Sebastian Andrzej Siewior, Logan Gunthorpe, Kurt Schwemmer,
	Bjorn Helgaas, linux-pci, Felipe Balbi, Greg Kroah-Hartman,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Oleg Nesterov, Davidlohr Bueso, Michael Ellerman, Arnd Bergmann,
	linuxppc-dev

On Sat, Mar 21, 2020 at 10:49:04PM +0100, Thomas Gleixner wrote:
[...] 
> > >> +rwsems have grown interfaces which allow non-owner release for special
> >> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> >> +substitutes all locking primitives except semaphores with RT-mutex based
> >> +implementations to provide priority inheritance for all lock types except
> >> +the truly spinning ones. Priority inheritance on ownerless locks is
> >> +obviously impossible.
> >> +
> >> +For now the rwsem non-owner release excludes code which utilizes it from
> >> +being used on PREEMPT_RT enabled kernels.
> >
> > I could not parse the last sentence here, but I think you meant "For now,
> > > PREEMPT_RT enabled kernels disable code that performs a non-owner release of
> > an rwsem". Correct me if I'm wrong.
> 
> Right, that's what I wanted to say :)
> 
> Care to send a delta patch?

Absolutely, doing that now. :-)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait
  2020-03-20 12:54     ` Peter Zijlstra
@ 2020-03-22 16:33       ` Davidlohr Bueso
  2020-03-22 22:32         ` Peter Zijlstra
  0 siblings, 1 reply; 72+ messages in thread
From: Davidlohr Bueso @ 2020-03-22 16:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, arnd, balbi, bhelgaas, bigeasy, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, rdunlap, rostedt, torvalds, will, Paolo Bonzini,
	Davidlohr Bueso

On Fri, 20 Mar 2020, Peter Zijlstra wrote:

>On Fri, Mar 20, 2020 at 01:55:26AM -0700, Davidlohr Bueso wrote:
>> -	swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
>> -				       (!vcpu->arch.pause)));
>> +	rcuwait_wait_event(*wait,
>> +			   (!vcpu->arch.power_off) && (!vcpu->arch.pause),
>> +			   TASK_INTERRUPTIBLE);
>
>> -	for (;;) {
>> -		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>> -
>> -		if (kvm_vcpu_check_block(vcpu) < 0)
>> -			break;
>> -
>> -		waited = true;
>> -		schedule();
>> -	}
>> -
>> -	finish_swait(&vcpu->wq, &wait);
>> +	rcuwait_wait_event(&vcpu->wait,
>> +			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
>> +			   TASK_INTERRUPTIBLE);
>
>Are these yet more instances that really want to be TASK_IDLE ?

Hmm probably as it makes sense for a blocked vcpu not to be contributing to
the loadavg. So if this is the only reason to use interruptible, then yes we
ought to change it.

However, I'll make this a separate patch, given this (ab)use isn't as obvious
as the PS3 case, which is a kthread and therefore signals are masked.
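
For reference, TASK_IDLE is TASK_UNINTERRUPTIBLE | TASK_NOLOAD, so the
blocked vcpu would neither contribute to loadavg nor be woken by signals
it cannot act on. The change would be a one-liner on top of this patch:

  rcuwait_wait_event(&vcpu->wait,
                     (block_check = kvm_vcpu_check_block(vcpu)) < 0,
                     TASK_IDLE);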

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait
  2020-03-22 16:33       ` Davidlohr Bueso
@ 2020-03-22 22:32         ` Peter Zijlstra
  0 siblings, 0 replies; 72+ messages in thread
From: Peter Zijlstra @ 2020-03-22 22:32 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, arnd, balbi, bhelgaas, bigeasy, davem, gregkh, joel,
	kurt.schwemmer, kvalo, linux-kernel, linux-pci, linux-usb,
	linux-wireless, linuxppc-dev, logang, mingo, mpe, netdev, oleg,
	paulmck, rdunlap, rostedt, torvalds, will, Paolo Bonzini,
	Davidlohr Bueso

On Sun, Mar 22, 2020 at 09:33:17AM -0700, Davidlohr Bueso wrote:
> On Fri, 20 Mar 2020, Peter Zijlstra wrote:
> 
> > On Fri, Mar 20, 2020 at 01:55:26AM -0700, Davidlohr Bueso wrote:
> > > -	swait_event_interruptible_exclusive(*wq, ((!vcpu->arch.power_off) &&
> > > -				       (!vcpu->arch.pause)));
> > > +	rcuwait_wait_event(*wait,
> > > +			   (!vcpu->arch.power_off) && (!vcpu->arch.pause),
> > > +			   TASK_INTERRUPTIBLE);
> > 
> > > -	for (;;) {
> > > -		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> > > -
> > > -		if (kvm_vcpu_check_block(vcpu) < 0)
> > > -			break;
> > > -
> > > -		waited = true;
> > > -		schedule();
> > > -	}
> > > -
> > > -	finish_swait(&vcpu->wq, &wait);
> > > +	rcuwait_wait_event(&vcpu->wait,
> > > +			   (block_check = kvm_vcpu_check_block(vcpu)) < 0,
> > > +			   TASK_INTERRUPTIBLE);
> > 
> > Are these yet more instances that really want to be TASK_IDLE ?
> 
> Hmm probably as it makes sense for a blocked vcpu not to be contributing to
> the loadavg. So if this is the only reason to use interruptible, then yes we
> ought to change it.
> 
> However, I'll make this a separate patch, given this (ab)use isn't as obvious
> as the PS3 case, which is a kthread and therefore signals are masked.

The thing that was a dead give-away was that the return value of the
interruptible wait wasn't used.

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2020-03-22 22:34 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
2020-03-18 21:25   ` Bjorn Helgaas
2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
2020-03-18 21:26   ` Bjorn Helgaas
2020-03-18 22:11   ` Logan Gunthorpe
2020-03-18 20:43 ` [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
2020-03-19  8:41   ` Greg Kroah-Hartman
2020-03-18 20:43 ` [patch V2 04/15] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
2020-03-19  8:40   ` Greg Kroah-Hartman
2020-03-18 20:43 ` [patch V2 05/15] acpi: Remove header dependency Thomas Gleixner
2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
2020-03-20  5:36   ` Davidlohr Bueso
2020-03-20  8:45     ` Sebastian Andrzej Siewior
2020-03-20  8:58       ` Davidlohr Bueso
2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 2/5] csky: " Sebastian Andrzej Siewior
2020-03-21 11:24       ` Guo Ren
2020-03-21 12:08         ` Thomas Gleixner
2020-03-21 14:11           ` Guo Ren
2020-03-20  9:48     ` [PATCH 3/5] hexagon: " Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 4/5] ia64: " Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 5/5] microblaze: " Sebastian Andrzej Siewior
2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
2020-03-19  9:00   ` Sebastian Andrzej Siewior
2020-03-19  9:18     ` Peter Zijlstra
2020-03-19  9:21   ` Davidlohr Bueso
2020-03-19 10:04   ` Christoph Hellwig
2020-03-19 10:26     ` Sebastian Andrzej Siewior
2020-03-20  0:01       ` Geoff Levand
2020-03-20  0:45       ` Michael Ellerman
2020-03-21 10:41     ` Thomas Gleixner
2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
2020-03-18 22:31   ` Paul E. McKenney
2020-03-19 18:02     ` Thomas Gleixner
2020-03-20 16:01       ` Paul E. McKenney
2020-03-20 19:51         ` Thomas Gleixner
2020-03-20 21:02           ` Paul E. McKenney
2020-03-20 22:36             ` Thomas Gleixner
2020-03-21  2:29               ` Paul E. McKenney
2020-03-21 10:26                 ` Thomas Gleixner
2020-03-21 17:23                   ` Paul E. McKenney
2020-03-19  8:51   ` Davidlohr Bueso
2020-03-19 15:04   ` Jonathan Corbet
2020-03-19 18:04     ` Thomas Gleixner
2020-03-21 21:21   ` Joel Fernandes
2020-03-21 21:49     ` Thomas Gleixner
2020-03-22  1:36       ` Joel Fernandes
2020-03-18 20:43 ` [patch V2 09/15] timekeeping: Split jiffies seqlock Thomas Gleixner
2020-03-18 20:43 ` [patch V2 10/15] sched/swait: Prepare usage in completions Thomas Gleixner
2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
2020-03-18 22:28   ` Logan Gunthorpe
2020-03-19  0:33   ` Joel Fernandes
2020-03-19  0:44     ` Thomas Gleixner
2020-03-19  8:42   ` Greg Kroah-Hartman
2020-03-19 17:12   ` Linus Torvalds
2020-03-19 23:25   ` Julian Calaby
2020-03-20  6:59     ` Christoph Hellwig
2020-03-20  9:01   ` Davidlohr Bueso
2020-03-20  8:50 ` [patch V2 00/15] Lock ordering documentation and annotation for lockdep Davidlohr Bueso
2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
2020-03-20  9:13     ` Sebastian Andrzej Siewior
2020-03-20 10:44     ` Peter Zijlstra
2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
2020-03-20 11:20     ` Paolo Bonzini
2020-03-20 12:54     ` Peter Zijlstra
2020-03-22 16:33       ` Davidlohr Bueso
2020-03-22 22:32         ` Peter Zijlstra
2020-03-20  8:55   ` [PATCH 19/15] sched/swait: Reword some of the main description Davidlohr Bueso
2020-03-20  9:19     ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).