* [patch V3 00/20] Lock ordering documentation and annotation for lockdep
@ 2020-03-21 11:25 ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

This is the third and hopefully final version of this work. The second one
can be found here:

   https://lore.kernel.org/r/20200318204302.693307984@linutronix.de

Changes since V2:

  - Included the arch/XXX fixups for the rcuwait changes (Sebastian)

  - Folded the init fix for the PS3 change (Sebastian)

  - Addressed feedback on documentation (Paul, Davidlohr, Jonathan)

  - Picked up acks and Reviewed-by tags

Thanks,

	tglx


* [patch V3 01/20] PCI/switchtec: Fix init_completion race condition with poll_wait()
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Logan Gunthorpe <logang@deltatee.com>

The call to init_completion() in mrpc_queue_cmd() can theoretically
race with the call to poll_wait() in switchtec_dev_poll().

  poll()			write()
    switchtec_dev_poll()   	  switchtec_dev_write()
      poll_wait(&s->comp.wait);      mrpc_queue_cmd()
			               init_completion(&s->comp)
				         init_waitqueue_head(&s->comp.wait)

To my knowledge, no one has hit this bug.

Fix this by using reinit_completion() instead of init_completion() in
mrpc_queue_cmd().
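
This works because reinit_completion() only resets the 'done' counter and
leaves the embedded wait queue head, which poll_wait() may already have
linked into, untouched. A condensed sketch of the two initializers
(modeled on include/linux/completion.h at the time, not the verbatim
source):

  static inline void init_completion(struct completion *x)
  {
  	x->done = 0;
  	/* Re-creates the wait queue head: racy against poll_wait() */
  	init_waitqueue_head(&x->wait);
  }

  static inline void reinit_completion(struct completion *x)
  {
  	/* Only resets the counter: safe when requeueing a command */
  	x->done = 0;
  }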

Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: linux-pci@vger.kernel.org
Link: https://lkml.kernel.org/r/20200313183608.2646-1-logang@deltatee.com

---
 drivers/pci/switch/switchtec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index a823b4b8ef8a..81dc7ac01381 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -175,7 +175,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	init_completion(&stuser->comp);
+	reinit_completion(&stuser->comp);
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
-- 
2.20.1




* [patch V3 02/20] pci/switchtec: Replace completion wait queue usage for poll
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The poll callback uses the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.

This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.
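
For reference, the waker-side distinction, condensed from
kernel/sched/completion.c (a simplified sketch, not the verbatim source):

  void complete(struct completion *x)
  {
  	unsigned long flags;

  	spin_lock_irqsave(&x->wait.lock, flags);
  	if (x->done != UINT_MAX)
  		x->done++;
  	/* nr_exclusive == 1: wakes at most one exclusive waiter */
  	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
  	spin_unlock_irqrestore(&x->wait.lock, flags);
  }

  void complete_all(struct completion *x)
  {
  	unsigned long flags;

  	spin_lock_irqsave(&x->wait.lock, flags);
  	x->done = UINT_MAX;
  	/* nr_exclusive == 0: wakes every waiter */
  	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
  	spin_unlock_irqrestore(&x->wait.lock, flags);
  }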

This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts constraints on
changing the internal implementation of completions.

Replace it with a regular wait queue and store the state in struct
switchtec_user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: linux-pci@vger.kernel.org
---
V2: Reworded changelog.
---
 drivers/pci/switch/switchtec.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {
 
 	enum mrpc_state state;
 
-	struct completion comp;
+	wait_queue_head_t cmd_comp;
 	struct kref kref;
 	struct list_head list;
 
+	bool cmd_done;
 	u32 cmd;
 	u32 status;
 	u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_cre
 	stuser->stdev = stdev;
 	kref_init(&stuser->kref);
 	INIT_LIST_HEAD(&stuser->list);
-	init_completion(&stuser->comp);
+	init_waitqueue_head(&stuser->cmd_comp);
 	stuser->event_cnt = atomic_read(&stdev->event_cnt);
 
 	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switcht
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	reinit_completion(&stuser->comp);
+	stuser->cmd_done = false;
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct swi
 		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
 			      stuser->read_len);
 out:
-	complete_all(&stuser->comp);
+	stuser->cmd_done = true;
+	wake_up_interruptible(&stuser->cmd_comp);
 	list_del_init(&stuser->list);
 	stuser_put(stuser);
 	stdev->mrpc_busy = 0;
@@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct
 	mutex_unlock(&stdev->mrpc_mutex);
 
 	if (filp->f_flags & O_NONBLOCK) {
-		if (!try_wait_for_completion(&stuser->comp))
+		if (!stuser->cmd_done)
 			return -EAGAIN;
 	} else {
-		rc = wait_for_completion_interruptible(&stuser->comp);
+		rc = wait_event_interruptible(stuser->cmd_comp,
+					      stuser->cmd_done);
 		if (rc < 0)
 			return rc;
 	}
@@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struc
 	struct switchtec_dev *stdev = stuser->stdev;
 	__poll_t ret = 0;
 
-	poll_wait(filp, &stuser->comp.wait, wait);
+	poll_wait(filp, &stuser->cmd_comp, wait);
 	poll_wait(filp, &stdev->event_wq, wait);
 
 	if (lock_mutex_and_test_alive(stdev))
@@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struc
 
 	mutex_unlock(&stdev->mrpc_mutex);
 
-	if (try_wait_for_completion(&stuser->comp))
+	if (stuser->cmd_done)
 		ret |= EPOLLIN | EPOLLRDNORM;
 
 	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_
 
 	/* Wake up and kill any users waiting on an MRPC request */
 	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-		complete_all(&stuser->comp);
+		stuser->cmd_done = true;
+		wake_up_interruptible(&stuser->cmd_comp);
 		list_del_init(&stuser->list);
 		stuser_put(stuser);
 	}




* [patch V3 03/20] usb: gadget: Use completion interface instead of open coding it
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

ep_io() uses a completion on stack and open codes the waiting with:

  wait_event_interruptible (done.wait, done.done);
and
  wait_event (done.wait, done.done);

This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exclusive waiter.

Replace the open coded implementation with the corresponding
wait_for_completion*() functions.

No functional change.
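
Condensed, the waiting logic in ep_io() then has the shape below (a sketch
only; the submission path is elided and submit_req() is a stand-in for the
actual usb_ep_queue() call chain):

  DECLARE_COMPLETION_ONSTACK(done);

  value = submit_req(epdata);	/* the request callback does complete(&done) */
  if (likely(value == 0)) {
  	value = wait_for_completion_interruptible(&done);
  	if (value != 0) {
  		/* Interrupted: cancel the request, then wait without
  		 * interruption, as the callback still references the
  		 * on-stack completion. */
  		usb_ep_dequeue(epdata->ep, epdata->req);
  		wait_for_completion(&done);
  	}
  }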

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: linux-usb@vger.kernel.org
---
V2: New patch to avoid the conversion to swait interfaces later
---
 drivers/usb/gadget/legacy/inode.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf
 	spin_unlock_irq (&epdata->dev->lock);
 
 	if (likely (value == 0)) {
-		value = wait_event_interruptible (done.wait, done.done);
+		value = wait_for_completion_interruptible(&done);
 		if (value != 0) {
 			spin_lock_irq (&epdata->dev->lock);
 			if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf
 				usb_ep_dequeue (epdata->ep, epdata->req);
 				spin_unlock_irq (&epdata->dev->lock);
 
-				wait_event (done.wait, done.done);
+				wait_for_completion(&done);
 				if (epdata->status == -ECONNRESET)
 					epdata->status = -EINTR;
 			} else {




* [patch V3 04/20] orinoco_usb: Use the regular completion interfaces
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-usb, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Felipe Balbi, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

The completion usage in this driver is interesting:

  - it uses a magic complete function which according to the comment was
    implemented by invoking complete() four times in a row because
    complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion::done. Only one wait
    side (device removal) uses the regular wait_for_completion() interface.

The rationale behind this is to prevent wait_for_completion() from consuming
completion::done, which would prevent all waiters from being woken. This is
not necessary with complete_all(), as that sets completion::done to UINT_MAX,
which is left unmodified by the woken waiters.
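
The consumer side shows where the UINT_MAX detail matters; condensed from
kernel/sched/completion.c (a simplified sketch, not the verbatim source):

  bool try_wait_for_completion(struct completion *x)
  {
  	unsigned long flags;
  	bool ret = true;

  	if (!READ_ONCE(x->done))
  		return false;

  	spin_lock_irqsave(&x->wait.lock, flags);
  	if (!x->done)
  		ret = false;
  	else if (x->done != UINT_MAX)
  		x->done--;	/* complete_all()'s UINT_MAX is never consumed */
  	spin_unlock_irqrestore(&x->wait.lock, flags);
  	return ret;
  }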

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-usb@vger.kernel.org
---
V2: New patch to avoid conversion to swait functions later.
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_all
 	return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 
 			netif_wake_queue(dev);
 		}
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
 			/* This is normal, as all request contexts get flushed
 			 * when the device is disconnected */
 			err("Called, CTX not terminating, but device gone");
-			ezusb_complete_all(&ctx->done);
+			complete_all(&ctx->done);
 			ezusb_request_context_put(ctx);
 			break;
 		}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 			 * get the chance to run themselves. So we make sure
 			 * that we don't sleep for ever */
 			int msecs = DEF_TIMEOUT * (1000 / HZ);
-			while (!ctx->done.done && msecs--)
+
+			while (!try_wait_for_completion(&ctx->done) && msecs--)
 				udelay(1000);
 		} else {
-			wait_event_interruptible(ctx->done.wait,
-						 ctx->done.done);
+			wait_for_completion(&ctx->done);
 		}
 		break;
 	default:



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 04/20] orinoco_usb: Use the regular completion interfaces
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-usb, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Felipe Balbi, Darren Hart,
	Andy Shevchenko, platform-driver-x86

From: Thomas Gleixner <tglx@linutronix.de>

The completion usage in this driver is interesting:

  - it uses a magic complete function which according to the comment was
    implemented by invoking complete() four times in a row because
    complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion:done. Only one wait
    side (device removal) uses the regular wait_for_completion() interface.

The rationale behind this is to prevent that wait_for_completion() consumes
completion::done which would prevent that all waiters are woken. This is not
necessary with complete_all() as that sets completion::done to UINT_MAX which
is left unmodified by the woken waiters.

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-usb@vger.kernel.org
---
V2: New patch to avoid conversion to swait functions later.
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_all
 	return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 
 			netif_wake_queue(dev);
 		}
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
 			/* This is normal, as all request contexts get flushed
 			 * when the device is disconnected */
 			err("Called, CTX not terminating, but device gone");
-			ezusb_complete_all(&ctx->done);
+			complete_all(&ctx->done);
 			ezusb_request_context_put(ctx);
 			break;
 		}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 			 * get the chance to run themselves. So we make sure
 			 * that we don't sleep for ever */
 			int msecs = DEF_TIMEOUT * (1000 / HZ);
-			while (!ctx->done.done && msecs--)
+
+			while (!try_wait_for_completion(&ctx->done) && msecs--)
 				udelay(1000);
 		} else {
-			wait_event_interruptible(ctx->done.wait,
-						 ctx->done.done);
+			wait_for_completion(&ctx->done);
 		}
 		break;
 	default:

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 04/20] orinoco_usb: Use the regular completion interfaces
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Jonathan Corbet, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-acpi, Paul E . McKenney,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, linuxppc-dev, Greentime Hu, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Michal Simek,
	Tony Luck, Nick Hu, Geoff Levand, Greg Kroah-Hartman, linux-usb,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso, netdev,
	Logan Gunthorpe, David S. Miller, Andy Shevchenko

From: Thomas Gleixner <tglx@linutronix.de>

The completion usage in this driver is interesting:

  - it uses a magic complete function which according to the comment was
    implemented by invoking complete() four times in a row because
    complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion:done. Only one wait
    side (device removal) uses the regular wait_for_completion() interface.

The rationale behind this is to prevent that wait_for_completion() consumes
completion::done which would prevent that all waiters are woken. This is not
necessary with complete_all() as that sets completion::done to UINT_MAX which
is left unmodified by the woken waiters.

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-usb@vger.kernel.org
---
V2: New patch to avoid conversion to swait functions later.
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_all
 	return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct re
 
 			netif_wake_queue(dev);
 		}
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct re
 			/* This is normal, as all request contexts get flushed
 			 * when the device is disconnected */
 			err("Called, CTX not terminating, but device gone");
-			ezusb_complete_all(&ctx->done);
+			complete_all(&ctx->done);
 			ezusb_request_context_put(ctx);
 			break;
 		}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ez
 			 * get the chance to run themselves. So we make sure
 			 * that we don't sleep for ever */
 			int msecs = DEF_TIMEOUT * (1000 / HZ);
-			while (!ctx->done.done && msecs--)
+
+			while (!try_wait_for_completion(&ctx->done) && msecs--)
 				udelay(1000);
 		} else {
-			wait_event_interruptible(ctx->done.wait,
-						 ctx->done.done);
+			wait_for_completion(&ctx->done);
 		}
 		break;
 	default:



^ permalink raw reply	[flat|nested] 195+ messages in thread


* [patch V3 05/20] acpi: Remove header dependency
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Greg Kroah-Hartman,
	Zhang Rui, Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Peter Zijlstra <peterz@infradead.org>

In order to avoid future header hell, remove the inclusion of
proc_fs.h from acpi_bus.h. All it needs is a forward declaration of
struct proc_dir_entry.
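
A minimal sketch of the pattern (illustrative only, not part of the
patch):

  /* A forward declaration suffices while the header only refers to the
   * type through pointers; the size of the struct is never needed. */
  struct proc_dir_entry;

  extern struct proc_dir_entry *acpi_root_dir;

  /* Only code that dereferences the pointer or needs
   * sizeof(struct proc_dir_entry) has to include <linux/proc_fs.h>. */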

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
---
 drivers/platform/x86/dell-smo8800.c                      |    1 +
 drivers/platform/x86/wmi.c                               |    1 +
 drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |    1 +
 include/acpi/acpi_bus.h                                  |    2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/platform/x86/dell-smo8800.c
+++ b/drivers/platform/x86/dell-smo8800.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/miscdevice.h>
 #include <linux/uaccess.h>
+#include <linux/fs.h>
 
 struct smo8800_device {
 	u32 irq;                     /* acpi device irq */
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -29,6 +29,7 @@
 #include <linux/uaccess.h>
 #include <linux/uuid.h>
 #include <linux/wmi.h>
+#include <linux/fs.h>
 #include <uapi/linux/wmi.h>
 
 ACPI_MODULE_NAME("wmi");
--- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
+++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
@@ -19,6 +19,7 @@
 #include <linux/acpi.h>
 #include <linux/uaccess.h>
 #include <linux/miscdevice.h>
+#include <linux/fs.h>
 #include "acpi_thermal_rel.h"
 
 static acpi_handle acpi_thermal_rel_handle;
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
 
 #ifdef CONFIG_ACPI
 
-#include <linux/proc_fs.h>
+struct proc_dir_entry;
 
 #define ACPI_BUS_FILE_ROOT	"acpi"
 extern struct proc_dir_entry *acpi_root_dir;



^ permalink raw reply	[flat|nested] 195+ messages in thread


* [patch V3 06/20] nds32: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/nds32/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.
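
A minimal sketch of the underlying error (illustrative only, not part
of this patch):

  struct percpu_rw_semaphore;     /* forward declaration only */

  struct sb_like {
          /* error: array type has incomplete element type, as in the
           * build log above: an array member requires the complete
           * definition of its element type at the point of use. */
          struct percpu_rw_semaphore rw_sem[3];
  };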

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
---
V3: New patch
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9f9f139..3a9219f53ee0d 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <asm/errno.h>
 #include <asm/memory.h>
 #include <asm/types.h>
-#include <linux/mm.h>
 
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread


* [patch V3 07/20] csky: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Guo Ren, linux-csky, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, Nick Hu, Greentime Hu, Vincent Chen,
	Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu, linux-ia64,
	Michal Simek, Michael Ellerman, Arnd Bergmann, Geoff Levand,
	linuxppc-dev, Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/csky/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
V3: New patch
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a424..abefa125b93cf 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/mm.h>
 #include <linux/string.h>
 #include <linux/version.h>
 #include <asm/segment.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread


* [patch V3 08/20] hexagon: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-hexagon, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, Nick Hu, Greentime Hu, Vincent Chen,
	Guo Ren, linux-csky, Tony Luck, Fenghua Yu, linux-ia64,
	Michal Simek, Michael Ellerman, Arnd Bergmann, Geoff Levand,
	linuxppc-dev, Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/hexagon/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
---
V3: New patch
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index 00cb38faad0c4..c1019a736ff13 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include <linux/mm.h>
 #include <asm/sections.h>
 
 /*
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread


* [patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Tony Luck, Fenghua Yu, linux-ia64,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi, Nick Hu,
	Greentime Hu, Vincent Chen, Guo Ren, linux-csky, Brian Cain,
	linux-hexagon, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Tony Luck, Fenghua Yu, linux-ia64,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireles

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>
-- 
2.26.0.rc2

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Jonathan Corbet, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-acpi, Paul E . McKenney,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, linuxppc-dev, Greentime Hu, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Michal Simek,
	Tony Luck, Nick Hu, Geoff Levand, netdev, linux-usb,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, Logan Gunthorpe, David S. Miller,
	Andy Shevchenko

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leands to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 09/20] ia64: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Tony Luck, Fenghua Yu, linux-ia64,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi, Nick Hu,
	Greentime Hu, Vincent Chen, Guo Ren, linux-csky, Brian Cain,
	linux-hexagon, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
---
V3: New patch
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25 ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Michal Simek, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi, Nick Hu,
	Greentime Hu, Vincent Chen, Guo Ren, linux-csky, Brian Cain,
	linux-hexagon, Tony Luck, Fenghua Yu, linux-ia64,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Michal Simek, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Jonathan Corbet, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-acpi, Paul E . McKenney,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, linuxppc-dev, Greentime Hu, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Michal Simek,
	Tony Luck, Nick Hu, Geoff Levand, Greg Kroah-Hartman, linux-usb,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso, netdev,
	Logan Gunthorpe, David S. Miller, Andy Shevchenko

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Michal Simek, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi, Nick Hu,
	Greentime Hu, Vincent Chen, Guo Ren, linux-csky, Brian Cain,
	linux-hexagon, Tony Luck, Fenghua Yu, linux-ia64,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 10/20] microblaze: Remove mm.h from asm/uaccess.h
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	kbuild test robot, Michal Simek, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The defconfig compiles without linux/mm.h. With mm.h included, the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
V3: New patch
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-21 11:25 ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.
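
A minimal caller sketch of the new @state argument (illustrative only;
the wait object and the done condition are made up, not part of this
patch):

  struct rcuwait w;
  bool done;
  int ret;

  rcuwait_init(&w);
  ...
  /* Interruptible wait: a pending signal aborts the wait with -EINTR */
  ret = rcuwait_wait_event(&w, READ_ONCE(done), TASK_INTERRUPTIBLE);
  if (ret == -EINTR)
          return ret;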

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>

---
 include/linux/rcuwait.h       |   12 ++++++++++--
 kernel/locking/percpu-rwsem.c |    2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 	 */
 
 	/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-drive

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>

---
 include/linux/rcuwait.h       |   12 ++++++++++--
 kernel/locking/percpu-rwsem.c |    2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 	 */
 
 	/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Jonathan Corbet, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-acpi, Paul E . McKenney,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, linuxppc-dev, Greentime Hu, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Michal Simek,
	Tony Luck, Nick Hu, Geoff Levand, Greg Kroah-Hartman, linux-usb,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso, netdev,
	Logan Gunthorpe, David S. Miller, Andy Shevchenko

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>

---
 include/linux/rcuwait.h       |   12 ++++++++++--
 kernel/locking/percpu-rwsem.c |    2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 	 */
 
 	/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event()
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Peter Zijlstra (Intel) <peterz@infradead.org>

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>

---
 include/linux/rcuwait.h       |   12 ++++++++++--
 kernel/locking/percpu-rwsem.c |    2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwa
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -162,7 +162,7 @@ void percpu_down_write(struct percpu_rw_
 	 */
 
 	/* Wait for all now active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
  2020-03-21 11:25 ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, Geoff Levand, linuxppc-dev, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
is abuse anyway, replace it with a simple rcuwait; there is only ever the
one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE only to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look
different. Use TASK_IDLE instead.
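
Distilled, the resulting wait/wake pattern is (a sketch using the patch's
names; dev->done is serialized by dev->lock):

  /* Waiter (kthread context): */
  rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(),
                     TASK_IDLE);

  /* Waker (interrupt handler, dev->lock held): */
  dev->done = true;
  rcuwait_wake_up(&dev->wait);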

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, Geoff Levand, linuxppc-dev, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller

From: Thomas Gleixner <tglx@linutronix.de>

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
is abuse anyway, replace it with a simple rcuwait; there is only ever the
one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE only to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look
different. Use TASK_IDLE instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	linux-hexagon, Vincent Chen, Ingo Molnar, Jonathan Corbet,
	Davidlohr Bueso, kbuild test robot, Brian Cain, linux-acpi,
	Paul E . McKenney, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, Greentime Hu, Bjorn Helgaas, Kurt Schwemmer,
	Kalle Valo, Felipe Balbi, Michal Simek, Tony Luck, Nick Hu,
	Geoff Levand, Greg Kroah-Hartman, linux-usb, linux-wireless,
	Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe, netdev,
	linuxppc-dev, David S. Miller, Andy Shevchenko

From: Thomas Gleixner <tglx@linutronix.de>

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
is abuse anyway, replace it with a simple rcuwait; there is only ever the
one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE only to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look
different. Use TASK_IDLE instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, Geoff Levand, linuxppc-dev, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
is abuse anyway, replace it with a simple rcuwait; there is only ever the
one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE only to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look
different. Use TASK_IDLE instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
V3: Folded the init fix from bigeasy
V2: New patch to avoid the magic completion wait variant
---
 arch/powerpc/platforms/ps3/device-init.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_inte
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
  2020-03-21 11:25 ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

The kernel provides a variety of locking primitives. The nesting of these
lock types and their implications on RT-enabled kernels are nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
V3: Addressed review comments from Paul, Jonathan, Davidlohr
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst     |    1 
 Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+==========================
+Lock types and their rules
+==========================
+
+Introduction
+============
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+===============
+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
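+
+For example, a lock shared with a hard interrupt handler is typically
+acquired with the _irqsave() variant (a minimal sketch; the lock and the
+data it protects are placeholders)::
+
+  unsigned long flags;
+
+  spin_lock_irqsave(&lock, flags);
+  /* manipulate data shared with the interrupt handler */
+  spin_unlock_irqrestore(&lock, flags);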
+
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non-PREEMPT_RT kernels due to preemption and
+interrupt-disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
+   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
+   disable migration, which ensures that pointers to per-CPU variables
+   remain valid even if the task is preempted.
+
+ - Task state is preserved across spinlock acquisition, ensuring that the
+   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
+   kernels leave task state untouched.  However, PREEMPT_RT must change
+   task state if the task blocks during acquisition.  Therefore, it saves
+   the current task state before blocking and the corresponding lock wakeup
+   restores it.
+
+   Other types of wakeups would normally unconditionally set the task state
+   to RUNNING, but that does not work here because the task must remain
+   blocked until the lock becomes available.  Therefore, when a non-lock
+   wakeup attempts to awaken a task blocked waiting for a spinlock, it
+   instead sets the saved state to RUNNING.  Then, when the lock
+   acquisition completes, the lock wakeup sets the task state to the saved
+   state, in this case setting it to RUNNING.
+
+rwlock_t
+========
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
+suffix rules of spinlock_t apply accordingly. The implementation is fair,
+thus preventing writer starvation.
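+
+For example (a minimal sketch; the lock is a placeholder)::
+
+  read_lock(&rwl);	/* multiple readers may hold the lock */
+  /* read-side critical section */
+  read_unlock(&rwl);
+
+  write_lock(&rwl);	/* exclusive; waits for readers to drain */
+  /* write-side critical section */
+  write_unlock(&rwl);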
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
+implementation, thus changing semantics:
+
+ - All the spinlock_t changes also apply to rwlock_t.
+
+ - Because an rwlock_t writer cannot grant its priority to multiple
+   readers, a preempted low-priority reader will continue holding its lock,
+   thus starving even high-priority writers.  In contrast, because readers
+   can grant their priority to a writer, a preempted low-priority writer
+   will have its priority boosted until it releases the lock, thus
+   preventing that writer from starving readers.
+
+
+PREEMPT_RT caveats
+==================
+
+spinlock_t and rwlock_t
+-----------------------
+
+These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+have a few implications.  For example, on a non-PREEMPT_RT kernel the
+following code sequence works as expected::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and is fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+The same applies to rwlock_t and the _irqsave() suffix variants.
+
+On PREEMPT_RT kernels this code sequence breaks because RT-mutex requires a
+fully preemptible context.  Instead, use spin_lock_irq() or
+spin_lock_irqsave() and their unlock counterparts.  In cases where the
+interrupt disabling and locking must remain separate, PREEMPT_RT offers a
+local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
+allowing things like per-CPU irq-disabled locks to be acquired.  However,
+this approach should be used only where absolutely necessary.
+
+
+raw_spinlock_t
+--------------
+
+Acquiring a raw_spinlock_t disables preemption and possibly also
+interrupts, so the critical section must avoid acquiring a regular
+spinlock_t or rwlock_t, for example, the critical section must avoid
+allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
+works perfectly::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+But this code fails on PREEMPT_RT kernels because the memory allocator is
+fully preemptible and therefore cannot be invoked from truly atomic
+contexts.  However, it is perfectly fine to invoke the memory allocator
+while holding normal non-raw spinlocks because they do not disable
+preemption on PREEMPT_RT kernels::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation for obvious reasons.
+
+The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
+caveats vs. raw_spinlock_t apply.
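+
+For example, a bit spinlock serializes on a single bit of an existing
+word (a minimal sketch; the names are placeholders)::
+
+  bit_spin_lock(MY_LOCK_BIT, &obj->flags);
+  /* critical section; behaves like raw_spinlock_t on PREEMPT_RT */
+  bit_spin_unlock(MY_LOCK_BIT, &obj->flags);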
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT,
+but this requires conditional (#ifdef'ed) code changes at the usage site.
+In contrast, the spinlock_t substitution is done entirely by the compiler:
+the conditionals are restricted to header files and to the core
+implementation of the locking primitives, and the usage sites do not
+require any changes.
+
+
+Lock type nesting rules
+=======================
+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+Because PREEMPT_RT changes the lock category of spinlock_t and rwlock_t
+from spinning to sleeping, there are obvious restrictions on how they can
+nest with raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
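+For example, the following nesting is valid in all configurations
+(a sketch; the locks are placeholders)::
+
+  mutex_lock(&m);	/* sleeping lock, outermost */
+  spin_lock(&s);	/* spinlock_t */
+  raw_spin_lock(&r);	/* raw_spinlock_t, innermost */
+  raw_spin_unlock(&r);
+  spin_unlock(&s);
+  mutex_unlock(&m);
+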
+Lockdep is aware of these constraints to ensure that they are respected.
+
+
+Owner semantics
+===============
+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no owner semantics for historical reasons, and as such
+trylock and release operations can be called from any context. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms, such as mutexes and completions.
+
+rwsems have grown interfaces which allow non-owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
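+For example, the rwsem non-owner interfaces allow a cross-context
+release (a minimal sketch; the semaphore is a placeholder)::
+
+  /* context A acquires */
+  down_read_non_owner(&sem);
+
+  /* context B releases later */
+  up_read_non_owner(&sem);
+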
+For now the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In some cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless

From: Thomas Gleixner <tglx@linutronix.de>

The kernel provides a variety of locking primitives. The nesting of these
lock types and their implications on RT-enabled kernels are nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
V3: Addressed review comments from Paul, Jonathan, Davidlohr
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst     |    1 
 Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+==========================
+Lock types and their rules
+==========================
+
+Introduction
+============
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+===============
+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation regardless of the
+kernel configuration including PREEMPT_RT enabled kernels.
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
+   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
+   disable migration, which ensures that pointers to per-CPU variables
+   remain valid even if the task is preempted.
+
+ - Task state is preserved across spinlock acquisition, ensuring that the
+   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
+   kernels leave task state untouched.  However, PREEMPT_RT must change
+   task state if the task blocks during acquisition.  Therefore, it saves
+   the current task state before blocking and the corresponding lock wakeup
+   restores it.
+
+   Other types of wakeups would normally unconditionally set the task state
+   to RUNNING, but that does not work here because the task must remain
+   blocked until the lock becomes available.  Therefore, when a non-lock
+   wakeup attempts to awaken a task blocked waiting for a spinlock, it
+   instead sets the saved state to RUNNING.  Then, when the lock
+   acquisition completes, the lock wakeup sets the task state to the saved
+   state, in this case setting it to RUNNING.
+
+rwlock_t
+========
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
+suffix rules of spinlock_t apply accordingly. The implementation is fair,
+thus preventing writer starvation.
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
+implementation, thus changing semantics:
+
+ - All the spinlock_t changes also apply to rwlock_t.
+
+ - Because an rwlock_t writer cannot grant its priority to multiple
+   readers, a preempted low-priority reader will continue holding its lock,
+   thus starving even high-priority writers.  In contrast, because readers
+   can grant their priority to a writer, a preempted low-priority writer
+   will have its priority boosted until it releases the lock, thus
+   preventing that writer from starving readers.
+
+
+PREEMPT_RT caveats
+==================
+
+spinlock_t and rwlock_t
+-----------------------
+
+These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+have a few implications.  For example, on a non-PREEMPT_RT kernel the
+following code sequence works as expected::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and is fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+Same applies to rwlock_t and the _irqsave() suffix variants.
+
+On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires a
+fully preemptible context.  Instead, use spin_lock_irq() or
+spin_lock_irqsave() and their unlock counterparts.  In cases where the
+interrupt disabling and locking must remain separate, PREEMPT_RT offers a
+local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
+allowing things like per-CPU irq-disabled locks to be acquired.  However,
+this approach should be used only where absolutely necessary.
+
+
+raw_spinlock_t
+--------------
+
+Acquiring a raw_spinlock_t disables preemption and possibly also
+interrupts, so the critical section must avoid acquiring a regular
+spinlock_t or rwlock_t, for example, the critical section must avoid
+allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
+works perfectly::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+But this code fails on PREEMPT_RT kernels because the memory allocator is
+fully preemptible and therefore cannot be invoked from truly atomic
+contexts.  However, it is perfectly fine to invoke the memory allocator
+while holding normal non-raw spinlocks because they do not disable
+preemption on PREEMPT_RT kernels::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation for obvious reasons.
+
+The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
+caveats vs. raw_spinlock_t apply.
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
+this requires conditional (#ifdef'ed) code changes at the usage site while
+the spinlock_t substitution is simply done by the compiler and the
+conditionals are restricted to header files and core implementation of the
+locking primitives and the usage sites do not require any changes.
+
+
+Lock type nesting rules
+=======================
+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
+spinning to sleeping this has obviously restrictions how they can nest with
+raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
+Lockdep is aware of these constraints to ensure that they are respected.
+
+
+Owner semantics
+===============
+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no owner semantics for historical reason, and as such
+trylock and release operations can be called from any context. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms, such as mutexes and completions.
+
+rwsems have grown interfaces which allow non owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
+For now the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In same cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: linux-usb, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Davidlohr Bueso, linux-acpi,
	Brian Cain, Jonathan Corbet, linux-hexagon, Rafael J. Wysocki,
	linux-csky, Linus Torvalds, Darren Hart, Zhang Rui, Len Brown,
	Fenghua Yu, Paul E . McKenney, linux-pm, linuxppc-dev,
	Greentime Hu, Bjorn Helgaas, Kurt Schwemmer, Kalle Valo,
	kbuild test robot, Felipe Balbi, Michal Simek, Tony Luck,
	Nick Hu, Geoff Levand, Greg Kroah-Hartman, Randy Dunlap,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso, Arnd Bergmann,
	netdev, Logan Gunthorpe, David S. Miller, Andy Shevchenko

From: Thomas Gleixner <tglx@linutronix.de>

The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels is nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
V3: Addressed review comments from Paul, Jonathan, Davidlohr
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst     |    1 
 Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+==========================
+Lock types and their rules
+==========================
+
+Introduction
+============
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+===============
+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation regardless of the
+kernel configuration including PREEMPT_RT enabled kernels.
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
+   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
+   disable migration, which ensures that pointers to per-CPU variables
+   remain valid even if the task is preempted.
+
+ - Task state is preserved across spinlock acquisition, ensuring that the
+   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
+   kernels leave task state untouched.  However, PREEMPT_RT must change
+   task state if the task blocks during acquisition.  Therefore, it saves
+   the current task state before blocking and the corresponding lock wakeup
+   restores it.
+
+   Other types of wakeups would normally unconditionally set the task state
+   to RUNNING, but that does not work here because the task must remain
+   blocked until the lock becomes available.  Therefore, when a non-lock
+   wakeup attempts to awaken a task blocked waiting for a spinlock, it
+   instead sets the saved state to RUNNING.  Then, when the lock
+   acquisition completes, the lock wakeup sets the task state to the saved
+   state, in this case setting it to RUNNING.
+
+rwlock_t
+========
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
+suffix rules of spinlock_t apply accordingly. The implementation is fair,
+thus preventing writer starvation.
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
+implementation, thus changing semantics:
+
+ - All the spinlock_t changes also apply to rwlock_t.
+
+ - Because an rwlock_t writer cannot grant its priority to multiple
+   readers, a preempted low-priority reader will continue holding its lock,
+   thus starving even high-priority writers.  In contrast, because readers
+   can grant their priority to a writer, a preempted low-priority writer
+   will have its priority boosted until it releases the lock, thus
+   preventing that writer from starving readers.
+
+
+PREEMPT_RT caveats
+==================
+
+spinlock_t and rwlock_t
+-----------------------
+
+These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+have a few implications.  For example, on a non-PREEMPT_RT kernel the
+following code sequence works as expected::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and is fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+Same applies to rwlock_t and the _irqsave() suffix variants.
+
+On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires a
+fully preemptible context.  Instead, use spin_lock_irq() or
+spin_lock_irqsave() and their unlock counterparts.  In cases where the
+interrupt disabling and locking must remain separate, PREEMPT_RT offers a
+local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
+allowing things like per-CPU irq-disabled locks to be acquired.  However,
+this approach should be used only where absolutely necessary.
+
+
+raw_spinlock_t
+--------------
+
+Acquiring a raw_spinlock_t disables preemption and possibly also
+interrupts, so the critical section must avoid acquiring a regular
+spinlock_t or rwlock_t, for example, the critical section must avoid
+allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
+works perfectly::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+But this code fails on PREEMPT_RT kernels because the memory allocator is
+fully preemptible and therefore cannot be invoked from truly atomic
+contexts.  However, it is perfectly fine to invoke the memory allocator
+while holding normal non-raw spinlocks because they do not disable
+preemption on PREEMPT_RT kernels::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation for obvious reasons.
+
+The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
+caveats vs. raw_spinlock_t apply.
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
+this requires conditional (#ifdef'ed) code changes at the usage site while
+the spinlock_t substitution is simply done by the compiler and the
+conditionals are restricted to header files and core implementation of the
+locking primitives and the usage sites do not require any changes.
+
+
+Lock type nesting rules
+=======================
+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
+spinning to sleeping this has obviously restrictions how they can nest with
+raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
+Lockdep is aware of these constraints to ensure that they are respected.
+
+
+Owner semantics
+===============
+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no owner semantics for historical reason, and as such
+trylock and release operations can be called from any context. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms, such as mutexes and completions.
+
+rwsems have grown interfaces which allow non owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
+For now the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In same cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-21 11:25   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer, linux-pci,
	Greg Kroah-Hartman, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels is nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
V3: Addressed review comments from Paul, Jonathan, Davidlohr
V2: Addressed review comments from Randy
---
 Documentation/locking/index.rst     |    1 
 Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+=============
+Lock types and their rules
+=============
+
+Introduction
+======
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+=======+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ==========  ==========================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ==========  ==========================
+
+
+rtmutex
+===+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+interrupt disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+==============+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation regardless of the
+kernel configuration including PREEMPT_RT enabled kernels.
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
+core code, low level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+and has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
+   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
+   disable migration, which ensures that pointers to per-CPU variables
+   remain valid even if the task is preempted.
+
+ - Task state is preserved across spinlock acquisition, ensuring that the
+   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
+   kernels leave task state untouched.  However, PREEMPT_RT must change
+   task state if the task blocks during acquisition.  Therefore, it saves
+   the current task state before blocking and the corresponding lock wakeup
+   restores it.
+
+   Other types of wakeups would normally unconditionally set the task state
+   to RUNNING, but that does not work here because the task must remain
+   blocked until the lock becomes available.  Therefore, when a non-lock
+   wakeup attempts to awaken a task blocked waiting for a spinlock, it
+   instead sets the saved state to RUNNING.  Then, when the lock
+   acquisition completes, the lock wakeup sets the task state to the saved
+   state, in this case setting it to RUNNING.
+
+rwlock_t
+====
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
+suffix rules of spinlock_t apply accordingly. The implementation is fair,
+thus preventing writer starvation.
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
+implementation, thus changing semantics:
+
+ - All the spinlock_t changes also apply to rwlock_t.
+
+ - Because an rwlock_t writer cannot grant its priority to multiple
+   readers, a preempted low-priority reader will continue holding its lock,
+   thus starving even high-priority writers.  In contrast, because readers
+   can grant their priority to a writer, a preempted low-priority writer
+   will have its priority boosted until it releases the lock, thus
+   preventing that writer from starving readers.
+
+
+PREEMPT_RT caveats
+=========
+
+spinlock_t and rwlock_t
+-----------------------
+
+These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+have a few implications.  For example, on a non-PREEMPT_RT kernel the
+following code sequence works as expected::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and is fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+Same applies to rwlock_t and the _irqsave() suffix variants.
+
+On PREEMPT_RT kernel this code sequence breaks because RT-mutex requires a
+fully preemptible context.  Instead, use spin_lock_irq() or
+spin_lock_irqsave() and their unlock counterparts.  In cases where the
+interrupt disabling and locking must remain separate, PREEMPT_RT offers a
+local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
+allowing things like per-CPU irq-disabled locks to be acquired.  However,
+this approach should be used only where absolutely necessary.
+
+
+raw_spinlock_t
+--------------
+
+Acquiring a raw_spinlock_t disables preemption and possibly also
+interrupts, so the critical section must avoid acquiring a regular
+spinlock_t or rwlock_t, for example, the critical section must avoid
+allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
+works perfectly::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+But this code fails on PREEMPT_RT kernels because the memory allocator is
+fully preemptible and therefore cannot be invoked from truly atomic
+contexts.  However, it is perfectly fine to invoke the memory allocator
+while holding normal non-raw spinlocks because they do not disable
+preemption on PREEMPT_RT kernels::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation for obvious reasons.
+
+The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
+caveats vs. raw_spinlock_t apply.
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
+this requires conditional (#ifdef'ed) code changes at the usage site while
+the spinlock_t substitution is simply done by the compiler and the
+conditionals are restricted to header files and core implementation of the
+locking primitives and the usage sites do not require any changes.
+
+
+Lock type nesting rules
+===========+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
+spinning to sleeping this has obviously restrictions how they can nest with
+raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
+Lockdep is aware of these constraints to ensure that they are respected.
+
+
+Owner semantics
+=======+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no owner semantics for historical reason, and as such
+trylock and release operations can be called from any context. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms, such as mutexes and completions.
+
+rwsems have grown interfaces which allow non owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
+For now the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In same cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 14/20] timekeeping: Split jiffies seqlock
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels. That breaks the usage in the timekeeping
code because the writers execute in hard interrupt context, which is
non-preemptible even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.
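
In general form, the open-coded construct pairs a raw_spinlock_t for
writer serialization with a seqcount_t for the lockless read side. A
minimal sketch of the pattern (variable and function names are
hypothetical, not the actual jiffies code):

  static DEFINE_RAW_SPINLOCK(data_lock);  /* serializes writers */
  static seqcount_t data_seq;             /* read side retry counter */
  static u64 shared_data;

  static void data_write(u64 val)
  {
          raw_spin_lock(&data_lock);
          write_seqcount_begin(&data_seq);
          shared_data = val;
          write_seqcount_end(&data_seq);
          raw_spin_unlock(&data_lock);
  }

  static u64 data_read(void)
  {
          unsigned int seq;
          u64 val;

          do {
                  seq = read_seqcount_begin(&data_seq);
                  val = shared_data;
          } while (read_seqcount_retry(&data_seq, seq));
          return val;
  }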

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/jiffies.c     |    7 ++++---
 kernel/time/tick-common.c |   10 ++++++----
 kernel/time/tick-sched.c  |   19 ++++++++++++-------
 kernel/time/timekeeping.c |    6 ++++--
 kernel/time/timekeeping.h |    3 ++-
 5 files changed, 28 insertions(+), 17 deletions(-)

--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -58,7 +58,8 @@ static struct clocksource clocksource_ji
 	.max_cycles	= 10,
 };
 
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
+__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
+__cacheline_aligned_in_smp seqcount_t jiffies_seq;
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
@@ -67,9 +68,9 @@ u64 get_jiffies_64(void)
 	u64 ret;
 
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		ret = jiffies_64;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	return ret;
 }
 EXPORT_SYMBOL(get_jiffies_64);
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -84,13 +84,15 @@ int tick_is_oneshot_available(void)
 static void tick_periodic(int cpu)
 {
 	if (tick_do_timer_cpu == cpu) {
-		write_seqlock(&jiffies_lock);
+		raw_spin_lock(&jiffies_lock);
+		write_seqcount_begin(&jiffies_seq);
 
 		/* Keep track of the next tick event */
 		tick_next_period = ktime_add(tick_next_period, tick_period);
 
 		do_timer(1);
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		update_wall_time();
 	}
 
@@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_ev
 		ktime_t next;
 
 		do {
-			seq = read_seqbegin(&jiffies_lock);
+			seq = read_seqcount_begin(&jiffies_seq);
 			next = tick_next_period;
-		} while (read_seqretry(&jiffies_lock, seq));
+		} while (read_seqcount_retry(&jiffies_seq, seq));
 
 		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(kti
 		return;
 
 	/* Reevaluate with jiffies_lock held */
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 
 	delta = ktime_sub(now, last_jiffies_update);
 	if (delta >= tick_period) {
@@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(kti
 		/* Keep the tick_next_period variable up to date */
 		tick_next_period = ktime_add(last_jiffies_update, tick_period);
 	} else {
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		return;
 	}
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
 
@@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(vo
 {
 	ktime_t period;
 
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
 	if (last_jiffies_update == 0)
 		last_jiffies_update = tick_next_period;
 	period = last_jiffies_update;
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	return period;
 }
 
@@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(stru
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		basemono = last_jiffies_update;
 		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	ts->last_jiffies = basejiff;
 	ts->timer_expires_base = basemono;
 
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps);
  */
 void xtime_update(unsigned long ticks)
 {
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	do_timer(ticks);
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
--- a/kernel/time/timekeeping.h
+++ b/kernel/time/timekeeping.h
@@ -25,7 +25,8 @@ static inline void sched_clock_resume(vo
 extern void do_timer(unsigned long ticks);
 extern void update_wall_time(void);
 
-extern seqlock_t jiffies_lock;
+extern raw_spinlock_t jiffies_lock;
+extern seqcount_t jiffies_seq;
 
 #define CS_NAME_LEN	32
 



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 15/20] sched/swait: Prepare usage in completions
@ 2020-03-21 11:25   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:25 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Thomas Gleixner <tglx@linutronix.de>

As a preparation to use simple wait queues for completions:

  - Provide swake_up_all_locked() to support complete_all()
  - Make __prepare_to_swait() publicly available

This is done to enable the usage of complete() within truly atomic contexts
on a PREEMPT_RT enabled kernel.
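
A minimal usage sketch (hypothetical caller; the helper is only declared
in kernel/sched/sched.h below, so real users live inside the scheduler):
the queue's raw_spinlock_t is held by the caller, so the helper can be
invoked from hard interrupt context or with interrupts disabled:

  #include <linux/swait.h>

  static void wake_all_waiters(struct swait_queue_head *q)
  {
  	unsigned long flags;

  	/* The raw lock makes this a truly atomic section, also on RT */
  	raw_spin_lock_irqsave(&q->lock, flags);
  	swake_up_all_locked(q);
  	raw_spin_unlock_irqrestore(&q->lock, flags);
  }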

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
V2: Add comment to swake_up_all_locked()
---
 kernel/sched/sched.h |    3 +++
 kernel/sched/swait.c |   15 ++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(st
 	return true;
 }
 #endif
+
+void swake_up_all_locked(struct swait_queue_head *q);
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_
 }
 EXPORT_SYMBOL(swake_up_locked);
 
+/*
+ * Wake up all waiters. This is an interface which is solely exposed for
+ * completions and not for general usage.
+ *
+ * It is intentionally different from swake_up_all() to allow usage from
+ * hard interrupt context and interrupt disabled regions.
+ */
+void swake_up_all_locked(struct swait_queue_head *q)
+{
+	while (!list_empty(&q->task_list))
+		swake_up_locked(q);
+}
+
 void swake_up_one(struct swait_queue_head *q)
 {
 	unsigned long flags;
@@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_hea
 }
 EXPORT_SYMBOL(swake_up_all);
 
-static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
 	wait->task = current;
 	if (list_empty(&wait->task_list))



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 16/20] completion: Use simple wait queues
@ 2020-03-21 11:26   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Davidlohr Bueso,
	Greg Kroah-Hartman, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap

From: Thomas Gleixner <tglx@linutronix.de>

completion uses a wait_queue_head_t to enqueue waiters.

wait_queue_head_t contains a spinlock_t to protect the list of waiters
which excludes it from being used in truly atomic context on a PREEMPT_RT
enabled kernel.

The spinlock in the wait queue head cannot be replaced by a raw_spinlock
because:

  - wait queues can have custom wakeup callbacks, which acquire other
    spinlock_t locks and have potentially long execution times

  - wake_up() walks an unbounded number of list entries during the wake up
    and may wake an unbounded number of waiters.

For simplicity and performance reasons complete() should be usable on
PREEMPT_RT enabled kernels.

completions do not use custom wakeup callbacks and usually have a single
waiter, except for a few corner cases.

Replace the wait queue in the completion with a simple wait queue (swait),
which uses a raw_spinlock_t for protecting the waiter list and therefore is
safe to use inside truly atomic regions on PREEMPT_RT.

There is no semantic or functional change:

  - completions use the exclusive wait mode which is what swait provides

  - complete() wakes one exclusive waiter

  - complete_all() wakes all waiters while holding the lock which protects
    the wait queue against newly incoming waiters. The conversion to swait
    preserves this behaviour.

complete_all() might cause unbounded latencies when a large number of
waiters is woken at once, but most complete_all() usage sites are either in
testing or initialization code or have only a really small number of
concurrent waiters, which for now does not cause a latency problem. Keep it
simple for now.

The fixup of the warning check in the USB gadget driver is just a
straightforward conversion of the lockless waiter check from one waitqueue
type to the other.
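
As an illustration, a constructed driver sketch (not taken from this
patch) of the pattern which the conversion keeps working, now including
hard interrupt context on PREEMPT_RT:

  #include <linux/completion.h>
  #include <linux/interrupt.h>

  static DECLARE_COMPLETION(xfer_done);

  static irqreturn_t xfer_irq_handler(int irq, void *dev_id)
  {
  	/* Raw lock underneath: safe in truly atomic context */
  	complete(&xfer_done);
  	return IRQ_HANDLED;
  }

  static void xfer_wait(void)
  {
  	/* Sleeps in task context until the interrupt fires */
  	wait_for_completion(&xfer_done);
  }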

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
---
V2: Split out the orinoco and usb gadget parts and amended change log
---
 drivers/usb/gadget/function/f_fs.c |    2 +-
 include/linux/completion.h         |    8 ++++----
 kernel/sched/completion.c          |   36 +++++++++++++++++++-----------------
 3 files changed, 24 insertions(+), 22 deletions(-)

--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data
 		pr_info("%s(): freeing\n", __func__);
 		ffs_data_clear(ffs);
 		BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-		       waitqueue_active(&ffs->ep0req_completion.wait) ||
+		       swait_active(&ffs->ep0req_completion.wait) ||
 		       waitqueue_active(&ffs->wait));
 		destroy_workqueue(ffs->io_completion_wq);
 		kfree(ffs->dev_name);
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,7 +9,7 @@
  * See kernel/sched/completion.c for details.
  */
 
-#include <linux/wait.h>
+#include <linux/swait.h>
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,7 +25,7 @@
  */
 struct completion {
 	unsigned int done;
-	wait_queue_head_t wait;
+	struct swait_queue_head wait;
 };
 
 #define init_completion_map(x, m) __init_completion(x)
@@ -34,7 +34,7 @@ static inline void complete_acquire(stru
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -85,7 +85,7 @@ static inline void complete_release(stru
 static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
-	init_waitqueue_head(&x->wait);
+	init_swait_queue_head(&x->wait);
 }
 
 /**
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,12 +29,12 @@ void complete(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
 		x->done++;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete);
 
@@ -58,10 +58,12 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	WARN_ON(irqs_disabled());
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_all_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete_all);
 
@@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
 	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
+		DECLARE_SWAITQUEUE(wait);
 
-		__add_wait_queue_entry_tail_exclusive(&x->wait, &wait);
 		do {
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
 			}
+			__prepare_to_swait(&x->wait, &wait);
 			__set_current_state(state);
-			spin_unlock_irq(&x->wait.lock);
+			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
-			spin_lock_irq(&x->wait.lock);
+			raw_spin_lock_irq(&x->wait.lock);
 		} while (!x->done && timeout);
-		__remove_wait_queue(&x->wait, &wait);
+		__finish_swait(&x->wait, &wait);
 		if (!x->done)
 			return timeout;
 	}
@@ -100,9 +102,9 @@ static inline long __sched
 
 	complete_acquire(x);
 
-	spin_lock_irq(&x->wait.lock);
+	raw_spin_lock_irq(&x->wait.lock);
 	timeout = do_wait_for_common(x, action, timeout, state);
-	spin_unlock_irq(&x->wait.lock);
+	raw_spin_unlock_irq(&x->wait.lock);
 
 	complete_release(x);
 
@@ -291,12 +293,12 @@ bool try_wait_for_completion(struct comp
 	if (!READ_ONCE(x->done))
 		return false;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = false;
 	else if (x->done != UINT_MAX)
 		x->done--;
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(try_wait_for_completion);
@@ -322,8 +324,8 @@ bool completion_done(struct completion *
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
 	 */
-	spin_lock_irqsave(&x->wait.lock, flags);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);



^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 17/20] lockdep: Introduce wait-type checks
@ 2020-03-21 11:26   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Peter Zijlstra <peterz@infradead.org>

Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, it's a fancier might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test, because RCU can be acquired in (pretty much) any context, and
while it presents a wait-context to nested locks, that context is not the
same as the one it was acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it's required to find the minimal .inner of the held
stack to compare against the .outer of the new lock, because while 'normal'
RCU presents a CONFIG type to nested locks, taking it while already
holding a SPIN type obviously doesn't relax the rules.
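
A constructed example of that rule (not part of the patch):

  raw_spin_lock(&foo);	/* held stack: inner == LD_WAIT_SPIN */
  rcu_read_lock();	/* alone it presents LD_WAIT_CONFIG, but the
			   minimal .inner of the held stack stays SPIN */
  spin_lock(&bar);	/* still invalid: needs a CONFIG context */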

Below is an example output generated by the trivial test code:

  raw_spin_lock(&foo);
  spin_lock(&bar);
  spin_unlock(&bar);
  raw_spin_unlock(&foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try to match that up to
the held locks, which in this case is the one: -{2:2}.

This tells us that the lock being acquired requires a more relaxed
environment than the one presented by the held-lock stack.

Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired.
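
For such later conversions, a hedged sketch of the new initializer
(hypothetical user; the signature is the one introduced in this patch):

  static struct lock_class_key my_key;
  static struct lockdep_map my_map;

  static void my_map_init(void)
  {
  	/* inner = SPIN: this map presents a spin context to nested locks */
  	lockdep_init_map_waits(&my_map, "my_map", &my_key, 0,
  			       LD_WAIT_SPIN, LD_WAIT_INV);
  }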

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently being
addressed. The config option makes it possible to identify these problems
and to verify that the solutions found indeed solve them.

The config switch will be removed and the checks will be permanently
enabled once the vast majority of issues have been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix the LOCKDEP=y && LOCK_PROVE=n case
---
 include/linux/irqflags.h        |    8 ++
 include/linux/lockdep.h         |   71 +++++++++++++++++---
 include/linux/mutex.h           |    7 +-
 include/linux/rwlock_types.h    |    6 +
 include/linux/rwsem.h           |    6 +
 include/linux/sched.h           |    1 
 include/linux/spinlock.h        |   35 +++++++---
 include/linux/spinlock_types.h  |   24 +++++-
 kernel/irq/handle.c             |    7 ++
 kernel/locking/lockdep.c        |  138 ++++++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c    |    2 
 kernel/locking/rwsem.c          |    2 
 kernel/locking/spinlock_debug.c |    6 -
 kernel/rcu/update.c             |   24 +++++-
 lib/Kconfig.debug               |   17 ++++
 15 files changed, 307 insertions(+), 47 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -37,7 +37,12 @@
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define trace_hardirq_enter()			\
 do {						\
-	current->hardirq_context++;		\
+	if (!current->hardirq_context++)	\
+		current->hardirq_threaded = 0;	\
+} while (0)
+# define trace_hardirq_threaded()		\
+do {						\
+	current->hardirq_threaded = 1;		\
 } while (0)
 # define trace_hardirq_exit()			\
 do {						\
@@ -59,6 +64,7 @@ do {						\
 # define trace_hardirqs_enabled(p)	0
 # define trace_softirqs_enabled(p)	0
 # define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_threaded()	do { } while (0)
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -21,6 +21,22 @@ extern int lock_stat;
 
 #include <linux/types.h>
 
+enum lockdep_wait_type {
+	LD_WAIT_INV = 0,	/* not checked, catch all */
+
+	LD_WAIT_FREE,		/* wait free, rcu etc.. */
+	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
+#else
+	LD_WAIT_CONFIG = LD_WAIT_SPIN,
+#endif
+	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */
+
+	LD_WAIT_MAX,		/* must be last */
+};
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -111,6 +127,9 @@ struct lock_class {
 	int				name_version;
 	const char			*name;
 
+	short				wait_type_inner;
+	short				wait_type_outer;
+
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
@@ -158,6 +177,8 @@ struct lockdep_map {
 	struct lock_class_key		*key;
 	struct lock_class		*class_cache[NR_LOCKDEP_CACHING_CLASSES];
 	const char			*name;
+	short				wait_type_outer; /* can be taken in this context */
+	short				wait_type_inner; /* presents this context */
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
@@ -299,8 +320,21 @@ extern void lockdep_unregister_key(struc
  * to lockdep:
  */
 
-extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key, int subclass);
+extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+	struct lock_class_key *key, int subclass, short inner, short outer);
+
+static inline void
+lockdep_init_map_wait(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass, short inner)
+{
+	lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV);
+}
+
+static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lock_class_key *key, int subclass)
+{
+	lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV);
+}
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
@@ -308,18 +342,29 @@ extern void lockdep_init_map(struct lock
  * of dependencies wrong: they are either too broad (they need a class-split)
  * or they are too narrow (they suffer from a false class-split):
  */
-#define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
-#define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key, 0)
-#define lockdep_set_class_and_subclass(lock, key, sub) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
-#define lockdep_set_subclass(lock, sub)	\
-		lockdep_init_map(&(lock)->dep_map, #lock, \
-				 (lock)->dep_map.key, sub)
+#define lockdep_set_class(lock, key)				\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_name(lock, key, name)		\
+	lockdep_init_map_waits(&(lock)->dep_map, name, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_subclass(lock, key, sub)		\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_subclass(lock, sub)					\
+	lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+			       (lock)->dep_map.wait_type_inner,		\
+			       (lock)->dep_map.wait_type_outer)
 
 #define lockdep_set_novalidate_class(lock) \
 	lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
+
 /*
  * Compare locking classes
  */
@@ -432,6 +477,10 @@ static inline void lockdep_set_selftest_
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
+# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \
+		do { (void)(name); (void)(key); } while (0)
+# define lockdep_init_map_wait(lock, name, key, sub, inner) \
+		do { (void)(name); (void)(key); } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -109,8 +109,11 @@ do {									\
 } while (0)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
-		, .dep_map = { .name = #lockname }
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
+		, .dep_map = {					\
+			.name = #lockname,			\
+			.wait_type_inner = LD_WAIT_SLEEP,	\
+		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
 #endif
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -22,7 +22,11 @@ typedef struct {
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RW_DEP_MAP_INIT(lockname)					\
+	.dep_map = {							\
+		.name = #lockname,					\
+		.wait_type_inner = LD_WAIT_CONFIG,			\
+	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -71,7 +71,11 @@ static inline int rwsem_is_locked(struct
 /* Common initializer macros and functions */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+# define __RWSEM_DEP_MAP_INIT(lockname)			\
+	, .dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SLEEP,	\
+	}
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -970,6 +970,7 @@ struct task_struct {
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
+	unsigned int			hardirq_threaded;
 	unsigned long			hardirq_enable_ip;
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -93,12 +93,13 @@
 
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
-do {								\
-	static struct lock_class_key __key;			\
-								\
-	__raw_spin_lock_init((lock), #lock, &__key);		\
+				   struct lock_class_key *key, short inner);
+
+# define raw_spin_lock_init(lock)					\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN);	\
 } while (0)
 
 #else
@@ -327,12 +328,26 @@ static __always_inline raw_spinlock_t *s
 	return &lock->rlock;
 }
 
-#define spin_lock_init(_lock)				\
-do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	__raw_spin_lock_init(spinlock_check(lock),		\
+			     #lock, &__key, LD_WAIT_CONFIG);	\
+} while (0)
+
+#else
+
+# define spin_lock_init(_lock)			\
+do {						\
+	spinlock_check(_lock);			\
+	*(_lock) = __SPIN_LOCK_UNLOCKED(_lock);	\
 } while (0)
 
+#endif
+
 static __always_inline void spin_lock(spinlock_t *lock)
 {
 	raw_spin_lock(&lock->rlock);
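
[ Illustration, not part of the patch: the dynamic initializers now diverge:

	raw_spin_lock_init(&r);	/* inner = LD_WAIT_SPIN */
	spin_lock_init(&s);	/* inner = LD_WAIT_CONFIG; equal to SPIN unless
				   PROVE_RAW_LOCK_NESTING selects RT semantics */

  and the !CONFIG_DEBUG_SPINLOCK spin_lock_init() now also reinitializes the
  lock itself via __SPIN_LOCK_UNLOCKED(). ]
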
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -33,8 +33,18 @@ typedef struct raw_spinlock {
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SPIN_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RAW_SPIN_DEP_MAP_INIT(lockname)		\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SPIN,	\
+	}
+# define SPIN_DEP_MAP_INIT(lockname)			\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_CONFIG,	\
+	}
 #else
+# define RAW_SPIN_DEP_MAP_INIT(lockname)
 # define SPIN_DEP_MAP_INIT(lockname)
 #endif
 
@@ -51,7 +61,7 @@ typedef struct raw_spinlock {
 	{					\
 	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 	SPIN_DEBUG_INIT(lockname)		\
-	SPIN_DEP_MAP_INIT(lockname) }
+	RAW_SPIN_DEP_MAP_INIT(lockname) }
 
 #define __RAW_SPIN_LOCK_UNLOCKED(lockname)	\
 	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
@@ -72,11 +82,17 @@ typedef struct spinlock {
 	};
 } spinlock_t;
 
+#define ___SPIN_LOCK_INITIALIZER(lockname)	\
+	{					\
+	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
+	SPIN_DEBUG_INIT(lockname)		\
+	SPIN_DEP_MAP_INIT(lockname) }
+
 #define __SPIN_LOCK_INITIALIZER(lockname) \
-	{ { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+	{ { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
 
 #define __SPIN_LOCK_UNLOCKED(lockname) \
-	(spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+	(spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
 
 #define DEFINE_SPINLOCK(x)	spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
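
[ Illustration, not part of the patch: the static initializers diverge the
  same way under CONFIG_DEBUG_LOCK_ALLOC, e.g.

	static DEFINE_RAW_SPINLOCK(core_lock);	/* RAW_SPIN_DEP_MAP_INIT: LD_WAIT_SPIN */
	static DEFINE_SPINLOCK(data_lock);	/* SPIN_DEP_MAP_INIT: LD_WAIT_CONFIG */
]
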
 
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -145,6 +145,13 @@ irqreturn_t __handle_irq_event_percpu(st
 	for_each_action_of_desc(desc, action) {
 		irqreturn_t res;
 
+		/*
+		 * If this IRQ would be threaded under force_irqthreads, mark it so.
+		 */
+		if (irq_settings_can_thread(desc) &&
+		    !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)))
+			trace_hardirq_threaded();
+
 		trace_irq_handler_entry(irq, action);
 		res = action->handler(irq, action->dev_id);
 		trace_irq_handler_exit(irq, action, res);
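
[ Illustration, not part of the patch, with hypothetical handler names: only
  the second registration below can be force-threaded, so only it demotes the
  hardirq context to LD_WAIT_CONFIG for the wait-type check:

	request_irq(irq, fast_handler, IRQF_NO_THREAD, "fast", dev);	/* stays hardirq */
	request_irq(irq, slow_handler, 0, "slow", dev);			/* threadable   */
]
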
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -653,7 +653,9 @@ static void print_lock_name(struct lock_
 
 	printk(KERN_CONT " (");
 	__print_lock_name(class);
-	printk(KERN_CONT "){%s}", usage);
+	printk(KERN_CONT "){%s}-{%hd:%hd}", usage,
+			class->wait_type_outer ?: class->wait_type_inner,
+			class->wait_type_inner);
 }
 
 static void print_lockdep_cache(struct lockdep_map *lock)
@@ -1230,6 +1232,8 @@ register_lock_class(struct lockdep_map *
 	WARN_ON_ONCE(!list_empty(&class->locks_before));
 	WARN_ON_ONCE(!list_empty(&class->locks_after));
 	class->name_version = count_matching_names(class);
+	class->wait_type_inner = lock->wait_type_inner;
+	class->wait_type_outer = lock->wait_type_outer;
 	/*
 	 * We use RCU's safe list-add method to make
 	 * parallel walking of the hash-list safe:
@@ -3682,6 +3686,113 @@ static int mark_lock(struct task_struct
 	return ret;
 }
 
+static int
+print_lock_invalid_wait_context(struct task_struct *curr,
+				struct held_lock *hlock)
+{
+	if (!debug_locks_off())
+		return 0;
+	if (debug_locks_silent)
+		return 0;
+
+	pr_warn("\n");
+	pr_warn("=============================\n");
+	pr_warn("[ BUG: Invalid wait context ]\n");
+	print_kernel_ident();
+	pr_warn("-----------------------------\n");
+
+	pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
+	print_lock(hlock);
+
+	pr_warn("other info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	pr_warn("stack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Verify the wait_type context.
+ *
+ * This check validates that we take locks in the right wait-type order; that
+ * is, it ensures that we do not take mutexes inside spinlocks and do not
+ * attempt to acquire spinlocks inside raw_spinlocks and the like.
+ *
+ * The entire thing is slightly more complex because of RCU: RCU is a lock
+ * that can be taken from (pretty much) any context but also has constraints.
+ * However, when taken in a stricter environment, the RCU lock does not
+ * loosen the constraints.
+ *
+ * Therefore we must look for the strictest environment in the lock stack and
+ * compare that to the lock we're trying to acquire.
+ */
+static int check_wait_context(struct task_struct *curr, struct held_lock *next)
+{
+	short next_inner = hlock_class(next)->wait_type_inner;
+	short next_outer = hlock_class(next)->wait_type_outer;
+	short curr_inner;
+	int depth;
+
+	if (!curr->lockdep_depth || !next_inner || next->trylock)
+		return 0;
+
+	if (!next_outer)
+		next_outer = next_inner;
+
+	/*
+	 * Find start of current irq_context..
+	 */
+	for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
+		struct held_lock *prev = curr->held_locks + depth;
+		if (prev->irq_context != next->irq_context)
+			break;
+	}
+	depth++;
+
+	/*
+	 * Set appropriate wait type for the context; for IRQs we have to take
+	 * into account force_irqthread as that is implied by PREEMPT_RT.
+	 */
+	if (curr->hardirq_context) {
+		/*
+		 * Check if force_irqthreads will run us threaded.
+		 */
+		if (curr->hardirq_threaded)
+			curr_inner = LD_WAIT_CONFIG;
+		else
+			curr_inner = LD_WAIT_SPIN;
+	} else if (curr->softirq_context) {
+		/*
+		 * Softirqs are always threaded.
+		 */
+		curr_inner = LD_WAIT_CONFIG;
+	} else {
+		curr_inner = LD_WAIT_MAX;
+	}
+
+	for (; depth < curr->lockdep_depth; depth++) {
+		struct held_lock *prev = curr->held_locks + depth;
+		short prev_inner = hlock_class(prev)->wait_type_inner;
+
+		if (prev_inner) {
+			/*
+			 * We can have a bigger inner than a previous one
+			 * when outer is smaller than inner, as with RCU.
+			 *
+			 * Also due to trylocks.
+			 */
+			curr_inner = min(curr_inner, prev_inner);
+		}
+	}
+
+	if (next_outer > curr_inner)
+		return print_lock_invalid_wait_context(curr, next);
+
+	return 0;
+}
+
 #else /* CONFIG_PROVE_LOCKING */
 
 static inline int
@@ -3701,13 +3812,20 @@ static inline int separate_irq_context(s
 	return 0;
 }
 
+static inline int check_wait_context(struct task_struct *curr,
+				     struct held_lock *next)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PROVE_LOCKING */
 
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
-		      struct lock_class_key *key, int subclass)
+void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+			    struct lock_class_key *key, int subclass,
+			    short inner, short outer)
 {
 	int i;
 
@@ -3728,6 +3846,9 @@ void lockdep_init_map(struct lockdep_map
 
 	lock->name = name;
 
+	lock->wait_type_outer = outer;
+	lock->wait_type_inner = inner;
+
 	/*
 	 * No key, no joy, we need to hash something.
 	 */
@@ -3761,7 +3882,7 @@ void lockdep_init_map(struct lockdep_map
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map);
+EXPORT_SYMBOL_GPL(lockdep_init_map_waits);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -3862,7 +3983,7 @@ static int __lock_acquire(struct lockdep
 
 	class_idx = class - lock_classes;
 
-	if (depth) {
+	if (depth) { /* we're holding locks */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (!references)
@@ -3904,6 +4025,9 @@ static int __lock_acquire(struct lockdep
 #endif
 	hlock->pin_count = pin_count;
 
+	if (check_wait_context(curr, hlock))
+		return 0;
+
 	/* Initialize the lock usage bit */
 	if (!mark_usage(curr, hlock, check))
 		return 0;
@@ -4139,7 +4263,9 @@ static int
 		return 0;
 	}
 
-	lockdep_init_map(lock, name, key, 0);
+	lockdep_init_map_waits(lock, name, key, 0,
+			       lock->wait_type_inner,
+			       lock->wait_type_outer);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes;
 
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -85,7 +85,7 @@ void debug_mutex_init(struct mutex *lock
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 	lock->magic = lock;
 }
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -329,7 +329,7 @@ void __init_rwsem(struct rw_semaphore *s
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key, 0);
+	lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 #ifdef CONFIG_DEBUG_RWSEMS
 	sem->magic = sem;
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -14,14 +14,14 @@
 #include <linux/export.h>
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-			  struct lock_class_key *key)
+			  struct lock_class_key *key, short inner)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/*
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner);
 #endif
 	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
@@ -39,7 +39,7 @@ void __rwlock_init(rwlock_t *lock, const
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG);
 #endif
 	lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED;
 	lock->magic = RWLOCK_MAGIC;
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -227,18 +227,30 @@ core_initcall(rcu_set_runtime_mode);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
-struct lockdep_map rcu_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+struct lockdep_map rcu_lock_map = {
+	.name = "rcu_read_lock",
+	.key = &rcu_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? */
+};
 EXPORT_SYMBOL_GPL(rcu_lock_map);
 
 static struct lock_class_key rcu_bh_lock_key;
-struct lockdep_map rcu_bh_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key);
+struct lockdep_map rcu_bh_lock_map = {
+	.name = "rcu_read_lock_bh",
+	.key = &rcu_bh_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */
+};
 EXPORT_SYMBOL_GPL(rcu_bh_lock_map);
 
 static struct lock_class_key rcu_sched_lock_key;
-struct lockdep_map rcu_sched_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
+struct lockdep_map rcu_sched_lock_map = {
+	.name = "rcu_read_lock_sched",
+	.key = &rcu_sched_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_SPIN,
+};
 EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 
 static struct lock_class_key rcu_callback_key;
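
[ Illustration, not part of the patch: the FREE/CONFIG split lets
  rcu_read_lock() nest under anything without relaxing the context:

	rcu_read_lock();	/* outer LD_WAIT_FREE: legal under any lock */
	spin_lock(&s);		/* presented inner LD_WAIT_CONFIG: still OK */

	raw_spin_lock(&r);	/* context is now LD_WAIT_SPIN		    */
	rcu_read_lock();	/* outer FREE: fine, but its inner CONFIG   */
				/* does not lift the SPIN context	    */
	spin_lock(&s);		/* outer CONFIG > min inner SPIN -> invalid */
]
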
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1086,6 +1086,23 @@ config PROVE_LOCKING
 
 	 For more details, see Documentation/locking/lockdep-design.rst.
 
+config PROVE_RAW_LOCK_NESTING
+	bool "Enable raw_spinlock - spinlock nesting checks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 Enable the raw_spinlock vs. spinlock nesting checks which ensure
+	 that the lock nesting rules for PREEMPT_RT enabled kernels are
+	 not violated.
+
+	 NOTE: There are known nesting problems, so if you enable this
+	 option expect lockdep splats until these problems have been fully
+	 addressed, which is work in progress. This config switch allows one
+	 to identify and analyze these problems. It will be removed and the
+	 check permanently enabled once the main issues have been fixed.
+
+	 If unsure, select N.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
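
[ Illustration, not part of the patch: to exercise the new check, a .config
  would select both options used here:

	CONFIG_PROVE_LOCKING=y
	CONFIG_PROVE_RAW_LOCK_NESTING=y
]
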


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 17/20] lockdep: Introduce wait-type checks
@ 2020-03-21 11:26   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, platform-driver-x86, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Jonathan Corbet, Davidlohr Bueso,
	kbuild test robot, Brian Cain, linux-acpi, Paul E . McKenney,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, linuxppc-dev, Greentime Hu, Bjorn Helgaas,
	Kurt Schwemmer, Kalle Valo, Felipe Balbi, Michal Simek,
	Tony Luck, Nick Hu, Geoff Levand, Greg Kroah-Hartman, linux-usb,
	linux-wireless, Oleg Nesterov, Davidlohr Bueso, netdev,
	Logan Gunthorpe, David S. Miller, Andy Shevchenko

From: Peter Zijlstra <peterz@infradead.org>

Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks, and so on. In
other words, it's a fancier might_sleep().
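
For instance (an illustration, not from the patch's test code), the same
check also catches the classic sleeping-lock-under-spinlock pattern:

  spin_lock(&s);
  mutex_lock(&m);	/* LD_WAIT_SLEEP inside LD_WAIT_CONFIG -> splat */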

Obviously RCU made the entire ordeal more complex than a simple single-value
test, because RCU can be acquired in (pretty much) any context, and while it
presents a context to nested locks, that context is not the same as the one
it was acquired in.

Therefore it is necessary to split the wait_type into two values: one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it is required to find the minimal .inner of the held
stack to compare against the outer of the new lock, because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.
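
Spelled out (an illustration, not part of the patch), with
PROVE_RAW_LOCK_NESTING=y the wait types order as

  LD_WAIT_INV    = 0	/* unchecked */
  LD_WAIT_FREE   = 1
  LD_WAIT_SPIN   = 2
  LD_WAIT_CONFIG = 3
  LD_WAIT_SLEEP  = 4

and an acquisition is valid iff next.outer <= min(.inner) over the held
locks of the current irq context.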

Below is an example output generated by the trivial test code:

  raw_spin_lock(&foo);
  spin_lock(&bar);
  spin_unlock(&bar);
  raw_spin_unlock(&foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try to match that up to
the held locks, which in this case is the one: -{2:2}.

This tells us that the lock being acquired requires a more relaxed
environment than the one presented by the lock stack.

Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired.

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now, as there are known problems which are currently being
addressed. The config option makes it possible to identify these problems and
to verify that the solutions found are indeed solving them.

The config switch will be removed and the checks will be permanently enabled
once the vast majority of issues have been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix the LOCKDEP=y && LOCK_PROVE=n case
---
 include/linux/irqflags.h        |    8 ++
 include/linux/lockdep.h         |   71 +++++++++++++++++---
 include/linux/mutex.h           |    7 +-
 include/linux/rwlock_types.h    |    6 +
 include/linux/rwsem.h           |    6 +
 include/linux/sched.h           |    1 
 include/linux/spinlock.h        |   35 +++++++---
 include/linux/spinlock_types.h  |   24 +++++-
 kernel/irq/handle.c             |    7 ++
 kernel/locking/lockdep.c        |  138 ++++++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c    |    2 
 kernel/locking/rwsem.c          |    2 
 kernel/locking/spinlock_debug.c |    6 -
 kernel/rcu/update.c             |   24 +++++-
 lib/Kconfig.debug               |   17 ++++
 15 files changed, 307 insertions(+), 47 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -37,7 +37,12 @@
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define trace_hardirq_enter()			\
 do {						\
-	current->hardirq_context++;		\
+	if (!current->hardirq_context++)	\
+		current->hardirq_threaded = 0;	\
+} while (0)
+# define trace_hardirq_threaded()		\
+do {						\
+	current->hardirq_threaded = 1;		\
 } while (0)
 # define trace_hardirq_exit()			\
 do {						\
@@ -59,6 +64,7 @@ do {						\
 # define trace_hardirqs_enabled(p)	0
 # define trace_softirqs_enabled(p)	0
 # define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_threaded()	do { } while (0)
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
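
[ Illustration, not part of the patch: the flag follows the outermost
  hardirq entry only, e.g.

	trace_hardirq_enter();		/* context 0 -> 1: hardirq_threaded reset */
	trace_hardirq_threaded();	/* set by __handle_irq_event_percpu()	  */
	trace_hardirq_exit();
]
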
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -21,6 +21,22 @@ extern int lock_stat;
 
 #include <linux/types.h>
 
+enum lockdep_wait_type {
+	LD_WAIT_INV = 0,	/* not checked, catch all */
+
+	LD_WAIT_FREE,		/* wait free, rcu etc.. */
+	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
+#else
+	LD_WAIT_CONFIG = LD_WAIT_SPIN,
+#endif
+	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */
+
+	LD_WAIT_MAX,		/* must be last */
+};
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -111,6 +127,9 @@ struct lock_class {
 	int				name_version;
 	const char			*name;
 
+	short				wait_type_inner;
+	short				wait_type_outer;
+
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
@@ -158,6 +177,8 @@ struct lockdep_map {
 	struct lock_class_key		*key;
 	struct lock_class		*class_cache[NR_LOCKDEP_CACHING_CLASSES];
 	const char			*name;
+	short				wait_type_outer; /* can be taken in this context */
+	short				wait_type_inner; /* presents this context */
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
@@ -299,8 +320,21 @@ extern void lockdep_unregister_key(struc
  * to lockdep:
  */
 
-extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key, int subclass);
+extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+	struct lock_class_key *key, int subclass, short inner, short outer);
+
+static inline void
+lockdep_init_map_wait(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass, short inner)
+{
+	lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV);
+}
+
+static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lock_class_key *key, int subclass)
+{
+	lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV);
+}
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
@@ -308,18 +342,29 @@ extern void lockdep_init_map(struct lock
  * of dependencies wrong: they are either too broad (they need a class-split)
  * or they are too narrow (they suffer from a false class-split):
  */
-#define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
-#define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key, 0)
-#define lockdep_set_class_and_subclass(lock, key, sub) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
-#define lockdep_set_subclass(lock, sub)	\
-		lockdep_init_map(&(lock)->dep_map, #lock, \
-				 (lock)->dep_map.key, sub)
+#define lockdep_set_class(lock, key)				\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_name(lock, key, name)		\
+	lockdep_init_map_waits(&(lock)->dep_map, name, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_subclass(lock, key, sub)		\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_subclass(lock, sub)					\
+	lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+			       (lock)->dep_map.wait_type_inner,		\
+			       (lock)->dep_map.wait_type_outer)
 
 #define lockdep_set_novalidate_class(lock) \
 	lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
+
 /*
  * Compare locking classes
  */
@@ -432,6 +477,10 @@ static inline void lockdep_set_selftest_
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
+# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \
+		do { (void)(name); (void)(key); } while (0)
+# define lockdep_init_map_wait(lock, name, key, sub, inner) \
+		do { (void)(name); (void)(key); } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -109,8 +109,11 @@ do {									\
 } while (0)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
-		, .dep_map = { .name = #lockname }
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
+		, .dep_map = {					\
+			.name = #lockname,			\
+			.wait_type_inner = LD_WAIT_SLEEP,	\
+		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
 #endif
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -22,7 +22,11 @@ typedef struct {
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RW_DEP_MAP_INIT(lockname)					\
+	.dep_map = {							\
+		.name = #lockname,					\
+		.wait_type_inner = LD_WAIT_CONFIG,			\
+	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -71,7 +71,11 @@ static inline int rwsem_is_locked(struct
 /* Common initializer macros and functions */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+# define __RWSEM_DEP_MAP_INIT(lockname)			\
+	, .dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SLEEP,	\
+	}
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -970,6 +970,7 @@ struct task_struct {
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
+	unsigned int			hardirq_threaded;
 	unsigned long			hardirq_enable_ip;
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -93,12 +93,13 @@
 
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
-do {								\
-	static struct lock_class_key __key;			\
-								\
-	__raw_spin_lock_init((lock), #lock, &__key);		\
+				   struct lock_class_key *key, short inner);
+
+# define raw_spin_lock_init(lock)					\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN);	\
 } while (0)
 
 #else
@@ -327,12 +328,26 @@ static __always_inline raw_spinlock_t *s
 	return &lock->rlock;
 }
 
-#define spin_lock_init(_lock)				\
-do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	__raw_spin_lock_init(spinlock_check(lock),		\
+			     #lock, &__key, LD_WAIT_CONFIG);	\
+} while (0)
+
+#else
+
+# define spin_lock_init(_lock)			\
+do {						\
+	spinlock_check(_lock);			\
+	*(_lock) = __SPIN_LOCK_UNLOCKED(_lock);	\
 } while (0)
 
+#endif
+
 static __always_inline void spin_lock(spinlock_t *lock)
 {
 	raw_spin_lock(&lock->rlock);
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -33,8 +33,18 @@ typedef struct raw_spinlock {
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SPIN_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RAW_SPIN_DEP_MAP_INIT(lockname)		\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SPIN,	\
+	}
+# define SPIN_DEP_MAP_INIT(lockname)			\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_CONFIG,	\
+	}
 #else
+# define RAW_SPIN_DEP_MAP_INIT(lockname)
 # define SPIN_DEP_MAP_INIT(lockname)
 #endif
 
@@ -51,7 +61,7 @@ typedef struct raw_spinlock {
 	{					\
 	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 	SPIN_DEBUG_INIT(lockname)		\
-	SPIN_DEP_MAP_INIT(lockname) }
+	RAW_SPIN_DEP_MAP_INIT(lockname) }
 
 #define __RAW_SPIN_LOCK_UNLOCKED(lockname)	\
 	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
@@ -72,11 +82,17 @@ typedef struct spinlock {
 	};
 } spinlock_t;
 
+#define ___SPIN_LOCK_INITIALIZER(lockname)	\
+	{					\
+	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
+	SPIN_DEBUG_INIT(lockname)		\
+	SPIN_DEP_MAP_INIT(lockname) }
+
 #define __SPIN_LOCK_INITIALIZER(lockname) \
-	{ { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+	{ { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
 
 #define __SPIN_LOCK_UNLOCKED(lockname) \
-	(spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+	(spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
 
 #define DEFINE_SPINLOCK(x)	spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
 
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -145,6 +145,13 @@ irqreturn_t __handle_irq_event_percpu(st
 	for_each_action_of_desc(desc, action) {
 		irqreturn_t res;
 
+		/*
+		 * If this IRQ would be threaded under force_irqthreads, mark it so.
+		 */
+		if (irq_settings_can_thread(desc) &&
+		    !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)))
+			trace_hardirq_threaded();
+
 		trace_irq_handler_entry(irq, action);
 		res = action->handler(irq, action->dev_id);
 		trace_irq_handler_exit(irq, action, res);
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -653,7 +653,9 @@ static void print_lock_name(struct lock_
 
 	printk(KERN_CONT " (");
 	__print_lock_name(class);
-	printk(KERN_CONT "){%s}", usage);
+	printk(KERN_CONT "){%s}-{%hd:%hd}", usage,
+			class->wait_type_outer ?: class->wait_type_inner,
+			class->wait_type_inner);
 }
 
 static void print_lockdep_cache(struct lockdep_map *lock)
@@ -1230,6 +1232,8 @@ register_lock_class(struct lockdep_map *
 	WARN_ON_ONCE(!list_empty(&class->locks_before));
 	WARN_ON_ONCE(!list_empty(&class->locks_after));
 	class->name_version = count_matching_names(class);
+	class->wait_type_inner = lock->wait_type_inner;
+	class->wait_type_outer = lock->wait_type_outer;
 	/*
 	 * We use RCU's safe list-add method to make
 	 * parallel walking of the hash-list safe:
@@ -3682,6 +3686,113 @@ static int mark_lock(struct task_struct
 	return ret;
 }
 
+static int
+print_lock_invalid_wait_context(struct task_struct *curr,
+				struct held_lock *hlock)
+{
+	if (!debug_locks_off())
+		return 0;
+	if (debug_locks_silent)
+		return 0;
+
+	pr_warn("\n");
+	pr_warn("=============================\n");
+	pr_warn("[ BUG: Invalid wait context ]\n");
+	print_kernel_ident();
+	pr_warn("-----------------------------\n");
+
+	pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
+	print_lock(hlock);
+
+	pr_warn("other info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	pr_warn("stack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Verify the wait_type context.
+ *
+ * This check validates that we take locks in the right wait-type order; that
+ * is, it ensures that we do not take mutexes inside spinlocks and do not
+ * attempt to acquire spinlocks inside raw_spinlocks and the like.
+ *
+ * The entire thing is slightly more complex because of RCU: RCU is a lock
+ * that can be taken from (pretty much) any context but also has constraints.
+ * However, when taken in a stricter environment, the RCU lock does not
+ * loosen the constraints.
+ *
+ * Therefore we must look for the strictest environment in the lock stack and
+ * compare that to the lock we're trying to acquire.
+ */
+static int check_wait_context(struct task_struct *curr, struct held_lock *next)
+{
+	short next_inner = hlock_class(next)->wait_type_inner;
+	short next_outer = hlock_class(next)->wait_type_outer;
+	short curr_inner;
+	int depth;
+
+	if (!curr->lockdep_depth || !next_inner || next->trylock)
+		return 0;
+
+	if (!next_outer)
+		next_outer = next_inner;
+
+	/*
+	 * Find start of current irq_context..
+	 */
+	for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
+		struct held_lock *prev = curr->held_locks + depth;
+		if (prev->irq_context != next->irq_context)
+			break;
+	}
+	depth++;
+
+	/*
+	 * Set appropriate wait type for the context; for IRQs we have to take
+	 * into account force_irqthread as that is implied by PREEMPT_RT.
+	 */
+	if (curr->hardirq_context) {
+		/*
+		 * Check if force_irqthreads will run us threaded.
+		 */
+		if (curr->hardirq_threaded)
+			curr_inner = LD_WAIT_CONFIG;
+		else
+			curr_inner = LD_WAIT_SPIN;
+	} else if (curr->softirq_context) {
+		/*
+		 * Softirqs are always threaded.
+		 */
+		curr_inner = LD_WAIT_CONFIG;
+	} else {
+		curr_inner = LD_WAIT_MAX;
+	}
+
+	for (; depth < curr->lockdep_depth; depth++) {
+		struct held_lock *prev = curr->held_locks + depth;
+		short prev_inner = hlock_class(prev)->wait_type_inner;
+
+		if (prev_inner) {
+			/*
+			 * We can have a bigger inner than a previous one
+			 * when outer is smaller than inner, as with RCU.
+			 *
+			 * Also due to trylocks.
+			 */
+			curr_inner = min(curr_inner, prev_inner);
+		}
+	}
+
+	if (next_outer > curr_inner)
+		return print_lock_invalid_wait_context(curr, next);
+
+	return 0;
+}
+
 #else /* CONFIG_PROVE_LOCKING */
 
 static inline int
@@ -3701,13 +3812,20 @@ static inline int separate_irq_context(s
 	return 0;
 }
 
+static inline int check_wait_context(struct task_struct *curr,
+				     struct held_lock *next)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PROVE_LOCKING */
 
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
-		      struct lock_class_key *key, int subclass)
+void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+			    struct lock_class_key *key, int subclass,
+			    short inner, short outer)
 {
 	int i;
 
@@ -3728,6 +3846,9 @@ void lockdep_init_map(struct lockdep_map
 
 	lock->name = name;
 
+	lock->wait_type_outer = outer;
+	lock->wait_type_inner = inner;
+
 	/*
 	 * No key, no joy, we need to hash something.
 	 */
@@ -3761,7 +3882,7 @@ void lockdep_init_map(struct lockdep_map
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map);
+EXPORT_SYMBOL_GPL(lockdep_init_map_waits);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -3862,7 +3983,7 @@ static int __lock_acquire(struct lockdep
 
 	class_idx = class - lock_classes;
 
-	if (depth) {
+	if (depth) { /* we're holding locks */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (!references)
@@ -3904,6 +4025,9 @@ static int __lock_acquire(struct lockdep
 #endif
 	hlock->pin_count = pin_count;
 
+	if (check_wait_context(curr, hlock))
+		return 0;
+
 	/* Initialize the lock usage bit */
 	if (!mark_usage(curr, hlock, check))
 		return 0;
@@ -4139,7 +4263,9 @@ static int
 		return 0;
 	}
 
-	lockdep_init_map(lock, name, key, 0);
+	lockdep_init_map_waits(lock, name, key, 0,
+			       lock->wait_type_inner,
+			       lock->wait_type_outer);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes;
 
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -85,7 +85,7 @@ void debug_mutex_init(struct mutex *lock
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 	lock->magic = lock;
 }
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -329,7 +329,7 @@ void __init_rwsem(struct rw_semaphore *s
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key, 0);
+	lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 #ifdef CONFIG_DEBUG_RWSEMS
 	sem->magic = sem;
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -14,14 +14,14 @@
 #include <linux/export.h>
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-			  struct lock_class_key *key)
+			  struct lock_class_key *key, short inner)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/*
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner);
 #endif
 	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
@@ -39,7 +39,7 @@ void __rwlock_init(rwlock_t *lock, const
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG);
 #endif
 	lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED;
 	lock->magic = RWLOCK_MAGIC;
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -227,18 +227,30 @@ core_initcall(rcu_set_runtime_mode);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
-struct lockdep_map rcu_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+struct lockdep_map rcu_lock_map = {
+	.name = "rcu_read_lock",
+	.key = &rcu_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? */
+};
 EXPORT_SYMBOL_GPL(rcu_lock_map);
 
 static struct lock_class_key rcu_bh_lock_key;
-struct lockdep_map rcu_bh_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key);
+struct lockdep_map rcu_bh_lock_map = {
+	.name = "rcu_read_lock_bh",
+	.key = &rcu_bh_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */
+};
 EXPORT_SYMBOL_GPL(rcu_bh_lock_map);
 
 static struct lock_class_key rcu_sched_lock_key;
-struct lockdep_map rcu_sched_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
+struct lockdep_map rcu_sched_lock_map = {
+	.name = "rcu_read_lock_sched",
+	.key = &rcu_sched_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_SPIN,
+};
 EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 
 static struct lock_class_key rcu_callback_key;
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1086,6 +1086,23 @@ config PROVE_LOCKING
 
 	 For more details, see Documentation/locking/lockdep-design.rst.
 
+config PROVE_RAW_LOCK_NESTING
+	bool "Enable raw_spinlock - spinlock nesting checks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 Enable the raw_spinlock vs. spinlock nesting checks which ensure
+	 that the lock nesting rules for PREEMPT_RT enabled kernels are
+	 not violated.
+
+	 NOTE: There are known nesting problems. So if you enable this
+	 option expect lockdep splats until these problems have been fully
+	 addressed, which is work in progress. This config switch makes it
+	 possible to identify and analyze these problems. It will be removed
+	 and the check permanently enabled once the main issues are fixed.
+
+	 If unsure, select N.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [patch V3 17/20] lockdep: Introduce wait-type checks
@ 2020-03-21 11:26   ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-drive

From: Peter Zijlstra <peterz@infradead.org>

Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks, and so on. In
other words, it's a fancier might_sleep().
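
For contrast, a minimal sketch of nesting in the stricter direction, which
stays valid (&foo/&bar are placeholder locks):

  spin_lock(&bar);	/* spinlock_t provides a CONFIG context */
  raw_spin_lock(&foo);	/* raw_spinlock_t, SPIN fits inside CONFIG */
  raw_spin_unlock(&foo);
  spin_unlock(&bar);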

Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context, and
while it presents a context to nested locks, that context is not the same
as the one it was acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner, because INV == 0. ]
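
RCU below is the case where the two values differ; taken from the
rcu_lock_map update in this patch:

  .wait_type_outer = LD_WAIT_FREE,	/* can be acquired anywhere */
  .wait_type_inner = LD_WAIT_CONFIG,	/* presents a CONFIG context */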

It further means that it's required to find the minimal .inner of the held
stack to compare against the outer of the new lock, because while 'normal'
RCU presents a CONFIG type to nested locks, if it is taken while already
holding a SPIN type it obviously doesn't relax the rules.
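
A worked sketch of that rule (wait-type values assume
CONFIG_PROVE_RAW_LOCK_NESTING=y; &foo/&bar are placeholders):

  raw_spin_lock(&foo);	/* held stack .inner: SPIN */
  rcu_read_lock();	/* adds CONFIG, the minimal .inner stays SPIN */
  spin_lock(&bar);	/* .outer == CONFIG > SPIN -> invalid wait context */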

Below is an example output generated by the trivial test code:

  raw_spin_lock(&foo);
  spin_lock(&bar);
  spin_unlock(&bar);
  raw_spin_unlock(&foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try to match that up to
the held locks, which in this case is the one: -{2:2}.

This tells us that the acquiring lock requires a more relaxed environment
than the one presented by the lock stack.
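
For reference, with CONFIG_PROVE_RAW_LOCK_NESTING=y the enum added below
assigns:

  LD_WAIT_INV = 0, LD_WAIT_FREE = 1, LD_WAIT_SPIN = 2,
  LD_WAIT_CONFIG = 3, LD_WAIT_SLEEP = 4

so -{2:2} above is the raw_spinlock_t and -{3:3} the spinlock_t.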

Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired, along the lines of the sketch below.
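
A hypothetical conversion sketch using the new initializer (the lock, its
name, key and chosen wait type are made up for illustration):

  static struct lock_class_key foo_key;

  /* presents/requires a CONFIG context, like spinlock_t */
  lockdep_init_map_wait(&foo->dep_map, "foo_lock", &foo_key, 0,
  			LD_WAIT_CONFIG);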

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently being
addressed. The config option makes it possible to identify these problems
and to verify that the solutions found are indeed solving them.

The config switch will be removed and the checks will be permanently enabled
once the vast majority of issues have been addressed.
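
Until then the checks are opt-in; a .config sketch to try them:

  CONFIG_PROVE_LOCKING=y
  CONFIG_PROVE_RAW_LOCK_NESTING=y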

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix the LOCKDEP=y && LOCK_PROVE=n case
---
 include/linux/irqflags.h        |    8 ++
 include/linux/lockdep.h         |   71 +++++++++++++++++---
 include/linux/mutex.h           |    7 +-
 include/linux/rwlock_types.h    |    6 +
 include/linux/rwsem.h           |    6 +
 include/linux/sched.h           |    1 
 include/linux/spinlock.h        |   35 +++++++---
 include/linux/spinlock_types.h  |   24 +++++-
 kernel/irq/handle.c             |    7 ++
 kernel/locking/lockdep.c        |  138 ++++++++++++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c    |    2 
 kernel/locking/rwsem.c          |    2 
 kernel/locking/spinlock_debug.c |    6 -
 kernel/rcu/update.c             |   24 +++++-
 lib/Kconfig.debug               |   17 ++++
 15 files changed, 307 insertions(+), 47 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -37,7 +37,12 @@
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define trace_hardirq_enter()			\
 do {						\
-	current->hardirq_context++;		\
+	if (!current->hardirq_context++)	\
+		current->hardirq_threaded = 0;	\
+} while (0)
+# define trace_hardirq_threaded()		\
+do {						\
+	current->hardirq_threaded = 1;		\
 } while (0)
 # define trace_hardirq_exit()			\
 do {						\
@@ -59,6 +64,7 @@ do {						\
 # define trace_hardirqs_enabled(p)	0
 # define trace_softirqs_enabled(p)	0
 # define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_threaded()	do { } while (0)
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -21,6 +21,22 @@ extern int lock_stat;
 
 #include <linux/types.h>
 
+enum lockdep_wait_type {
+	LD_WAIT_INV = 0,	/* not checked, catch all */
+
+	LD_WAIT_FREE,		/* wait free, rcu etc.. */
+	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
+#else
+	LD_WAIT_CONFIG = LD_WAIT_SPIN,
+#endif
+	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */
+
+	LD_WAIT_MAX,		/* must be last */
+};
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -111,6 +127,9 @@ struct lock_class {
 	int				name_version;
 	const char			*name;
 
+	short				wait_type_inner;
+	short				wait_type_outer;
+
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
@@ -158,6 +177,8 @@ struct lockdep_map {
 	struct lock_class_key		*key;
 	struct lock_class		*class_cache[NR_LOCKDEP_CACHING_CLASSES];
 	const char			*name;
+	short				wait_type_outer; /* can be taken in this context */
+	short				wait_type_inner; /* presents this context */
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
@@ -299,8 +320,21 @@ extern void lockdep_unregister_key(struc
  * to lockdep:
  */
 
-extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key, int subclass);
+extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+	struct lock_class_key *key, int subclass, short inner, short outer);
+
+static inline void
+lockdep_init_map_wait(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass, short inner)
+{
+	lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV);
+}
+
+static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lock_class_key *key, int subclass)
+{
+	lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV);
+}
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
@@ -308,18 +342,29 @@ extern void lockdep_init_map(struct lock
  * of dependencies wrong: they are either too broad (they need a class-split)
  * or they are too narrow (they suffer from a false class-split):
  */
-#define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
-#define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key, 0)
-#define lockdep_set_class_and_subclass(lock, key, sub) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
-#define lockdep_set_subclass(lock, sub)	\
-		lockdep_init_map(&(lock)->dep_map, #lock, \
-				 (lock)->dep_map.key, sub)
+#define lockdep_set_class(lock, key)				\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_name(lock, key, name)		\
+	lockdep_init_map_waits(&(lock)->dep_map, name, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_subclass(lock, key, sub)		\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_subclass(lock, sub)					\
+	lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+			       (lock)->dep_map.wait_type_inner,		\
+			       (lock)->dep_map.wait_type_outer)
 
 #define lockdep_set_novalidate_class(lock) \
 	lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
+
 /*
  * Compare locking classes
  */
@@ -432,6 +477,10 @@ static inline void lockdep_set_selftest_
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
+# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \
+		do { (void)(name); (void)(key); } while (0)
+# define lockdep_init_map_wait(lock, name, key, sub, inner) \
+		do { (void)(name); (void)(key); } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -109,8 +109,11 @@ do {									\
 } while (0)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
-		, .dep_map = { .name = #lockname }
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
+		, .dep_map = {					\
+			.name = #lockname,			\
+			.wait_type_inner = LD_WAIT_SLEEP,	\
+		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
 #endif
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -22,7 +22,11 @@ typedef struct {
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RW_DEP_MAP_INIT(lockname)					\
+	.dep_map = {							\
+		.name = #lockname,					\
+		.wait_type_inner = LD_WAIT_CONFIG,			\
+	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -71,7 +71,11 @@ static inline int rwsem_is_locked(struct
 /* Common initializer macros and functions */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+# define __RWSEM_DEP_MAP_INIT(lockname)			\
+	, .dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SLEEP,	\
+	}
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -970,6 +970,7 @@ struct task_struct {
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
+	unsigned int			hardirq_threaded;
 	unsigned long			hardirq_enable_ip;
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -93,12 +93,13 @@
 
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
-do {								\
-	static struct lock_class_key __key;			\
-								\
-	__raw_spin_lock_init((lock), #lock, &__key);		\
+				   struct lock_class_key *key, short inner);
+
+# define raw_spin_lock_init(lock)					\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN);	\
 } while (0)
 
 #else
@@ -327,12 +328,26 @@ static __always_inline raw_spinlock_t *s
 	return &lock->rlock;
 }
 
-#define spin_lock_init(_lock)				\
-do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	__raw_spin_lock_init(spinlock_check(lock),		\
+			     #lock, &__key, LD_WAIT_CONFIG);	\
+} while (0)
+
+#else
+
+# define spin_lock_init(_lock)			\
+do {						\
+	spinlock_check(_lock);			\
+	*(_lock) = __SPIN_LOCK_UNLOCKED(_lock);	\
 } while (0)
 
+#endif
+
 static __always_inline void spin_lock(spinlock_t *lock)
 {
 	raw_spin_lock(&lock->rlock);
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -33,8 +33,18 @@ typedef struct raw_spinlock {
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SPIN_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RAW_SPIN_DEP_MAP_INIT(lockname)		\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SPIN,	\
+	}
+# define SPIN_DEP_MAP_INIT(lockname)			\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_CONFIG,	\
+	}
 #else
+# define RAW_SPIN_DEP_MAP_INIT(lockname)
 # define SPIN_DEP_MAP_INIT(lockname)
 #endif
 
@@ -51,7 +61,7 @@ typedef struct raw_spinlock {
 	{					\
 	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 	SPIN_DEBUG_INIT(lockname)		\
-	SPIN_DEP_MAP_INIT(lockname) }
+	RAW_SPIN_DEP_MAP_INIT(lockname) }
 
 #define __RAW_SPIN_LOCK_UNLOCKED(lockname)	\
 	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
@@ -72,11 +82,17 @@ typedef struct spinlock {
 	};
 } spinlock_t;
 
+#define ___SPIN_LOCK_INITIALIZER(lockname)	\
+	{					\
+	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
+	SPIN_DEBUG_INIT(lockname)		\
+	SPIN_DEP_MAP_INIT(lockname) }
+
 #define __SPIN_LOCK_INITIALIZER(lockname) \
-	{ { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+	{ { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
 
 #define __SPIN_LOCK_UNLOCKED(lockname) \
-	(spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+	(spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
 
 #define DEFINE_SPINLOCK(x)	spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
 
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -145,6 +145,13 @@ irqreturn_t __handle_irq_event_percpu(st
 	for_each_action_of_desc(desc, action) {
 		irqreturn_t res;
 
+		/*
+		 * If this IRQ would be threaded under force_irqthreads, mark it so.
+		 */
+		if (irq_settings_can_thread(desc) &&
+		    !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)))
+			trace_hardirq_threaded();
+
 		trace_irq_handler_entry(irq, action);
 		res = action->handler(irq, action->dev_id);
 		trace_irq_handler_exit(irq, action, res);
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -653,7 +653,9 @@ static void print_lock_name(struct lock_
 
 	printk(KERN_CONT " (");
 	__print_lock_name(class);
-	printk(KERN_CONT "){%s}", usage);
+	printk(KERN_CONT "){%s}-{%hd:%hd}", usage,
+			class->wait_type_outer ?: class->wait_type_inner,
+			class->wait_type_inner);
 }
 
 static void print_lockdep_cache(struct lockdep_map *lock)
@@ -1230,6 +1232,8 @@ register_lock_class(struct lockdep_map *
 	WARN_ON_ONCE(!list_empty(&class->locks_before));
 	WARN_ON_ONCE(!list_empty(&class->locks_after));
 	class->name_version = count_matching_names(class);
+	class->wait_type_inner = lock->wait_type_inner;
+	class->wait_type_outer = lock->wait_type_outer;
 	/*
 	 * We use RCU's safe list-add method to make
 	 * parallel walking of the hash-list safe:
@@ -3682,6 +3686,113 @@ static int mark_lock(struct task_struct
 	return ret;
 }
 
+static int
+print_lock_invalid_wait_context(struct task_struct *curr,
+				struct held_lock *hlock)
+{
+	if (!debug_locks_off())
+		return 0;
+	if (debug_locks_silent)
+		return 0;
+
+	pr_warn("\n");
+	pr_warn("=============================\n");
+	pr_warn("[ BUG: Invalid wait context ]\n");
+	print_kernel_ident();
+	pr_warn("-----------------------------\n");
+
+	pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
+	print_lock(hlock);
+
+	pr_warn("other info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	pr_warn("stack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Verify the wait_type context.
+ *
+ * This check validates that we take locks in the right wait-type order;
+ * that is, it ensures that we do not take mutexes inside spinlocks and do
+ * not attempt to acquire spinlocks inside raw_spinlocks and the like.
+ *
+ * The entire thing is slightly more complex because of RCU: RCU is a lock
+ * that can be taken from (pretty much) any context but also has constraints.
+ * However, when taken in a stricter environment, the RCU lock does not
+ * loosen the constraints.
+ *
+ * Therefore we must look for the strictest environment in the lock stack and
+ * compare that to the lock we're trying to acquire.
+ */
+static int check_wait_context(struct task_struct *curr, struct held_lock *next)
+{
+	short next_inner = hlock_class(next)->wait_type_inner;
+	short next_outer = hlock_class(next)->wait_type_outer;
+	short curr_inner;
+	int depth;
+
+	if (!curr->lockdep_depth || !next_inner || next->trylock)
+		return 0;
+
+	if (!next_outer)
+		next_outer = next_inner;
+
+	/*
+	 * Find start of current irq_context..
+	 */
+	for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
+		struct held_lock *prev = curr->held_locks + depth;
+		if (prev->irq_context != next->irq_context)
+			break;
+	}
+	depth++;
+
+	/*
+	 * Set the appropriate wait type for the context; for IRQs we have to
+	 * take force_irqthreads into account, as that is implied by PREEMPT_RT.
+	 */
+	if (curr->hardirq_context) {
+		/*
+		 * Check if force_irqthreads will run us threaded.
+		 */
+		if (curr->hardirq_threaded)
+			curr_inner = LD_WAIT_CONFIG;
+		else
+			curr_inner = LD_WAIT_SPIN;
+	} else if (curr->softirq_context) {
+		/*
+		 * Softirqs are always threaded.
+		 */
+		curr_inner = LD_WAIT_CONFIG;
+	} else {
+		curr_inner = LD_WAIT_MAX;
+	}
+
+	for (; depth < curr->lockdep_depth; depth++) {
+		struct held_lock *prev = curr->held_locks + depth;
+		short prev_inner = hlock_class(prev)->wait_type_inner;
+
+		if (prev_inner) {
+			/*
+			 * We can have a bigger inner than a previous one
+			 * when outer is smaller than inner, as with RCU.
+			 *
+			 * Also due to trylocks.
+			 */
+			curr_inner = min(curr_inner, prev_inner);
+		}
+	}
+
+	if (next_outer > curr_inner)
+		return print_lock_invalid_wait_context(curr, next);
+
+	return 0;
+}
+
 #else /* CONFIG_PROVE_LOCKING */
 
 static inline int
@@ -3701,13 +3812,20 @@ static inline int separate_irq_context(s
 	return 0;
 }
 
+static inline int check_wait_context(struct task_struct *curr,
+				     struct held_lock *next)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PROVE_LOCKING */
 
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
-		      struct lock_class_key *key, int subclass)
+void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+			    struct lock_class_key *key, int subclass,
+			    short inner, short outer)
 {
 	int i;
 
@@ -3728,6 +3846,9 @@ void lockdep_init_map(struct lockdep_map
 
 	lock->name = name;
 
+	lock->wait_type_outer = outer;
+	lock->wait_type_inner = inner;
+
 	/*
 	 * No key, no joy, we need to hash something.
 	 */
@@ -3761,7 +3882,7 @@ void lockdep_init_map(struct lockdep_map
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map);
+EXPORT_SYMBOL_GPL(lockdep_init_map_waits);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -3862,7 +3983,7 @@ static int __lock_acquire(struct lockdep
 
 	class_idx = class - lock_classes;
 
-	if (depth) {
+	if (depth) { /* we're holding locks */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (!references)
@@ -3904,6 +4025,9 @@ static int __lock_acquire(struct lockdep
 #endif
 	hlock->pin_count = pin_count;
 
+	if (check_wait_context(curr, hlock))
+		return 0;
+
 	/* Initialize the lock usage bit */
 	if (!mark_usage(curr, hlock, check))
 		return 0;
@@ -4139,7 +4263,9 @@ static int
 		return 0;
 	}
 
-	lockdep_init_map(lock, name, key, 0);
+	lockdep_init_map_waits(lock, name, key, 0,
+			       lock->wait_type_inner,
+			       lock->wait_type_outer);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes;
 
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -85,7 +85,7 @@ void debug_mutex_init(struct mutex *lock
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 	lock->magic = lock;
 }
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -329,7 +329,7 @@ void __init_rwsem(struct rw_semaphore *s
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key, 0);
+	lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 #ifdef CONFIG_DEBUG_RWSEMS
 	sem->magic = sem;
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -14,14 +14,14 @@
 #include <linux/export.h>
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-			  struct lock_class_key *key)
+			  struct lock_class_key *key, short inner)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/*
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner);
 #endif
 	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
@@ -39,7 +39,7 @@ void __rwlock_init(rwlock_t *lock, const
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG);
 #endif
 	lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED;
 	lock->magic = RWLOCK_MAGIC;
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -227,18 +227,30 @@ core_initcall(rcu_set_runtime_mode);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
-struct lockdep_map rcu_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+struct lockdep_map rcu_lock_map = {
+	.name = "rcu_read_lock",
+	.key = &rcu_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? */
+};
 EXPORT_SYMBOL_GPL(rcu_lock_map);
 
 static struct lock_class_key rcu_bh_lock_key;
-struct lockdep_map rcu_bh_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key);
+struct lockdep_map rcu_bh_lock_map = {
+	.name = "rcu_read_lock_bh",
+	.key = &rcu_bh_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */
+};
 EXPORT_SYMBOL_GPL(rcu_bh_lock_map);
 
 static struct lock_class_key rcu_sched_lock_key;
-struct lockdep_map rcu_sched_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
+struct lockdep_map rcu_sched_lock_map = {
+	.name = "rcu_read_lock_sched",
+	.key = &rcu_sched_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_SPIN,
+};
 EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 
 static struct lock_class_key rcu_callback_key;
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1086,6 +1086,23 @@ config PROVE_LOCKING
 
 	 For more details, see Documentation/locking/lockdep-design.rst.
 
+config PROVE_RAW_LOCK_NESTING
+	bool "Enable raw_spinlock - spinlock nesting checks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 Enable the raw_spinlock vs. spinlock nesting checks which ensure
+	 that the lock nesting rules for PREEMPT_RT enabled kernels are
+	 not violated.
+
+	 NOTE: There are known nesting problems. So if you enable this
+	 option, expect lockdep splats until these problems have been fully
+	 addressed, which is work in progress. This config switch allows you
+	 to identify and analyze these problems. It will be removed and the
+	 check permanently enabled once the main issues have been fixed.
+
+	 If unsure, select N.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT


^ permalink raw reply	[flat|nested] 195+ messages in thread
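
To make the wait-type ordering concrete, here is a minimal sketch of the
nesting the new check rejects (lock and function names are made up for
illustration; this assumes the LD_WAIT_* enum introduced earlier in the
series, where LD_WAIT_SPIN is smaller than LD_WAIT_CONFIG). On a non-RT
kernel both locks simply spin, but on PREEMPT_RT the spinlock_t becomes a
sleeping lock, so lockdep can now flag the problem on !RT builds as well:

static DEFINE_RAW_SPINLOCK(demo_raw_lock);	/* hypothetical */
static DEFINE_SPINLOCK(demo_lock);		/* hypothetical */

static void demo_bad_nesting(void)
{
	raw_spin_lock(&demo_raw_lock);
	/*
	 * next_outer (LD_WAIT_CONFIG) > curr_inner (LD_WAIT_SPIN):
	 * check_wait_context() emits "[ BUG: Invalid wait context ]".
	 */
	spin_lock(&demo_lock);
	spin_unlock(&demo_lock);
	raw_spin_unlock(&demo_raw_lock);
}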

* [patch V3 18/20] lockdep: Add hrtimer context tracing bits
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:26   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Set current->irq_config = 1 for hrtimers which are not marked during
hrtimer_init() to expire in hard interrupt context. These timers will
expire in softirq context on PREEMPT_RT.

Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context, then its expiry callback must
not acquire a regular spinlock; it has to use a raw_spinlock instead.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqflags.h |   15 +++++++++++++++
 include/linux/sched.h    |    1 +
 kernel/locking/lockdep.c |    2 +-
 kernel/time/hrtimer.c    |    6 +++++-
 4 files changed, 22 insertions(+), 2 deletions(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -56,6 +56,19 @@ do {						\
 do {						\
 	current->softirq_context--;		\
 } while (0)
+
+# define lockdep_hrtimer_enter(__hrtimer)		\
+	  do {						\
+		  if (!__hrtimer->is_hard)		\
+			current->irq_config = 1;	\
+	  } while (0)
+
+# define lockdep_hrtimer_exit(__hrtimer)		\
+	  do {						\
+		  if (!__hrtimer->is_hard)		\
+			current->irq_config = 0;	\
+	  } while (0)
+
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
@@ -68,6 +81,8 @@ do {						\
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
+# define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
+# define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -983,6 +983,7 @@ struct task_struct {
 	unsigned int			softirq_enable_event;
 	int				softirqs_enabled;
 	int				softirq_context;
+	int				irq_config;
 #endif
 
 #ifdef CONFIG_LOCKDEP
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3759,7 +3759,7 @@ static int check_wait_context(struct tas
 		/*
 		 * Check if force_irqthreads will run us threaded.
 		 */
-		if (curr->hardirq_threaded)
+		if (curr->hardirq_threaded || curr->irq_config)
 			curr_inner = LD_WAIT_CONFIG;
 		else
 			curr_inner = LD_WAIT_SPIN;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1404,7 +1404,7 @@ static void __hrtimer_init(struct hrtime
 	base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0;
 	base += hrtimer_clockid_to_base(clock_id);
 	timer->is_soft = softtimer;
-	timer->is_hard = !softtimer;
+	timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
 	timer->base = &cpu_base->clock_base[base];
 	timerqueue_init(&timer->node);
 }
@@ -1514,7 +1514,11 @@ static void __run_hrtimer(struct hrtimer
 	 */
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	trace_hrtimer_expire_entry(timer, now);
+	lockdep_hrtimer_enter(timer);
+
 	restart = fn(timer);
+
+	lockdep_hrtimer_exit(timer);
 	trace_hrtimer_expire_exit(timer);
 	raw_spin_lock_irq(&cpu_base->lock);
 



^ permalink raw reply	[flat|nested] 195+ messages in thread
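
As a usage sketch (illustrative names, not part of this series): a timer
whose callback must keep running in hard interrupt context on PREEMPT_RT
is initialized with one of the HRTIMER_MODE_*_HARD modes and may then only
take raw spinlocks; any other hrtimer is expired from softirq context on
RT, which lockdep_hrtimer_enter() above communicates to lockdep via
current->irq_config:

static enum hrtimer_restart demo_timer_fn(struct hrtimer *t)
{
	/* Hard irq context on all kernels: raw_spinlock_t only. */
	return HRTIMER_NORESTART;
}

static void demo_timer_setup(struct hrtimer *t)
{
	hrtimer_init(t, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
	t->function = demo_timer_fn;
	hrtimer_start(t, ms_to_ktime(10), HRTIMER_MODE_REL_HARD);
}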

* [patch V3 19/20] lockdep: Annotate irq_work
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:26   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Mark irq_work items which must be invoked in hardirq context, even on
PREEMPT_RT, with IRQ_WORK_HARD_IRQ. irq_work items without this flag will
be invoked in softirq context on PREEMPT_RT.

Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irq_work.h |    2 ++
 include/linux/irqflags.h |   13 +++++++++++++
 kernel/irq_work.c        |    2 ++
 kernel/rcu/tree.c        |    1 +
 kernel/time/tick-sched.c |    1 +
 5 files changed, 19 insertions(+)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -18,6 +18,8 @@
 
 /* Doesn't want IPI, wait for tick: */
 #define IRQ_WORK_LAZY		BIT(2)
+/* Run in hard IRQ context, even on RT */
+#define IRQ_WORK_HARD_IRQ	BIT(3)
 
 #define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)
 
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,17 @@ do {						\
 			current->irq_config = 0;	\
 	  } while (0)
 
+# define lockdep_irq_work_enter(__work)					\
+	  do {								\
+		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+			current->irq_config = 1;			\
+	  } while (0)
+# define lockdep_irq_work_exit(__work)					\
+	  do {								\
+		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+			current->irq_config = 0;			\
+	  } while (0)
+
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
@@ -83,6 +94,8 @@ do {						\
 # define lockdep_softirq_exit()		do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
+# define lockdep_irq_work_enter(__work)		do { } while (0)
+# define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -153,7 +153,9 @@ static void irq_work_run_list(struct lli
 		 */
 		flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
 
+		lockdep_irq_work_enter(work);
 		work->func(work);
+		lockdep_irq_work_exit(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
 		 * no-one else claimed it meanwhile.
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1113,6 +1113,7 @@ static int rcu_implicit_dynticks_qs(stru
 		    !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
 		    (rnp->ffmask & rdp->grpmask)) {
 			init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
+			atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
 			rdp->rcu_iw_pending = true;
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct i
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 	.func = nohz_full_kick_func,
+	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
 };
 
 /*



^ permalink raw reply	[flat|nested] 195+ messages in thread
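
Seen from the consumer side, the annotation is a one-liner (illustrative
names; this mirrors the nohz_full_kick_work and rcu_iw hunks above): an
irq_work item that must run from the hardirq path even on PREEMPT_RT
carries IRQ_WORK_HARD_IRQ in its flags, while items without the flag are
deferred to softirq on RT and may therefore take a spinlock_t:

static void demo_irq_work_fn(struct irq_work *work)
{
	/* Hard irq context even on RT: raw_spinlock_t only. */
}

static struct irq_work demo_irq_work = {
	.func  = demo_irq_work_fn,
	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
};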

* [patch V3 20/20] lockdep: Add posixtimer context tracing bits
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 11:26   ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 11:26 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Splitting run_posix_cpu_timers() into two parts is work in progress, which
is stuck on other entry-code related problems. The heavy lifting, which
involves taking sighand lock, will be moved into task context so that the
necessary execution time is accounted to the task and not to interrupt
context.

Until this work completes, lockdep with the spinlock nesting rules enabled
would emit warnings for this known context.

Prevent that by setting "->irq_config = 1" for the invocation of
run_posix_cpu_timers() so lockdep does not complain when sighand lock is
acquired. This will be removed once the split is completed.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqflags.h       |   12 ++++++++++++
 kernel/time/posix-cpu-timers.c |    6 +++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,16 @@ do {						\
 			current->irq_config = 0;	\
 	  } while (0)
 
+# define lockdep_posixtimer_enter()				\
+	  do {							\
+		  current->irq_config = 1;			\
+	  } while (0)
+
+# define lockdep_posixtimer_exit()				\
+	  do {							\
+		  current->irq_config = 0;			\
+	  } while (0)
+
 # define lockdep_irq_work_enter(__work)					\
 	  do {								\
 		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
@@ -94,6 +104,8 @@ do {						\
 # define lockdep_softirq_exit()		do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
+# define lockdep_posixtimer_enter()		do { } while (0)
+# define lockdep_posixtimer_exit()		do { } while (0)
 # define lockdep_irq_work_enter(__work)		do { } while (0)
 # define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1126,8 +1126,11 @@ void run_posix_cpu_timers(void)
 	if (!fastpath_timer_check(tsk))
 		return;
 
-	if (!lock_task_sighand(tsk, &flags))
+	lockdep_posixtimer_enter();
+	if (!lock_task_sighand(tsk, &flags)) {
+		lockdep_posixtimer_exit();
 		return;
+	}
 	/*
 	 * Here we take off tsk->signal->cpu_timers[N] and
 	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1169,6 +1172,7 @@ void run_posix_cpu_timers(void)
 			cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+	lockdep_posixtimer_exit();
 }
 
 /*


^ permalink raw reply	[flat|nested] 195+ messages in thread
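
The subtle part of the hunk above is that the enter/exit annotations must
pair on every path, including the early return when lock_task_sighand()
fails. Reduced to its shape (hypothetical function, same structure as the
hunk):

static void demo_timer_expiry(struct task_struct *tsk)
{
	unsigned long flags;

	lockdep_posixtimer_enter();
	if (!lock_task_sighand(tsk, &flags)) {
		lockdep_posixtimer_exit();	/* pair on the early exit */
		return;
	}
	/* ... fire expired timers under sighand->siglock ... */
	unlock_task_sighand(tsk, &flags);
	lockdep_posixtimer_exit();
}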

* Re: [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
  2020-03-21 11:25   ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-21 13:22     ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 13:22 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, Geoff Levand, linuxppc-dev, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

Thomas Gleixner <tglx@linutronix.de> writes:

> From: Thomas Gleixner <tglx@linutronix.de>

That's obviously bogus and wants to be:

From: Peter Zijlstra (Intel) <peterz@infradead.org>


^ permalink raw reply	[flat|nested] 195+ messages in thread

* [tip: locking/core] lockdep: Add posixtimer context tracing bits
  2020-03-21 11:26   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     d53f2b62fcb63f6547c10d8c62bca19e957b0eef
Gitweb:        https://git.kernel.org/tip/d53f2b62fcb63f6547c10d8c62bca19e957b0eef
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:26:04 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:25 +01:00

lockdep: Add posixtimer context tracing bits

Splitting run_posix_cpu_timers() into two parts is work in progress which
is stuck on other entry-code related problems. The heavy lifting, which
involves taking the sighand lock, will be moved into task context so the
necessary execution time is charged to the task and not to interrupt
context.

Until this work completes, lockdep with the spinlock nesting rules enabled
would emit warnings for this known context.

Prevent it by setting "->irq_config = 1" around the invocation of
run_posix_cpu_timers() so lockdep does not complain when the sighand lock
is acquired. This will be removed once the split is completed.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.751182723@linutronix.de
---
 include/linux/irqflags.h       | 12 ++++++++++++
 kernel/time/posix-cpu-timers.c |  6 +++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index f23f540..a16adbb 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,16 @@ do {						\
 			current->irq_config = 0;	\
 	  } while (0)
 
+# define lockdep_posixtimer_enter()				\
+	  do {							\
+		  current->irq_config = 1;			\
+	  } while (0)
+
+# define lockdep_posixtimer_exit()				\
+	  do {							\
+		  current->irq_config = 0;			\
+	  } while (0)
+
 # define lockdep_irq_work_enter(__work)					\
 	  do {								\
 		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
@@ -94,6 +104,8 @@ do {						\
 # define lockdep_softirq_exit()		do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
+# define lockdep_posixtimer_enter()		do { } while (0)
+# define lockdep_posixtimer_exit()		do { } while (0)
 # define lockdep_irq_work_enter(__work)		do { } while (0)
 # define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 8ff6da7..2c48a72 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1126,8 +1126,11 @@ void run_posix_cpu_timers(void)
 	if (!fastpath_timer_check(tsk))
 		return;
 
-	if (!lock_task_sighand(tsk, &flags))
+	lockdep_posixtimer_enter();
+	if (!lock_task_sighand(tsk, &flags)) {
+		lockdep_posixtimer_exit();
 		return;
+	}
 	/*
 	 * Here we take off tsk->signal->cpu_timers[N] and
 	 * tsk->cpu_timers[N] all the timers that are firing, and
@@ -1169,6 +1172,7 @@ void run_posix_cpu_timers(void)
 			cpu_timer_fire(timer);
 		spin_unlock(&timer->it_lock);
 	}
+	lockdep_posixtimer_exit();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] lockdep: Annotate irq_work
  2020-03-21 11:26   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  2020-03-21 16:40     ` Frederic Weisbecker
  -1 siblings, 1 reply; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     49915ac35ca7b07c54295a72d905be5064afb89e
Gitweb:        https://git.kernel.org/tip/49915ac35ca7b07c54295a72d905be5064afb89e
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:26:03 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00

lockdep: Annotate irq_work

Mark irq_work items which must be invoked in hardirq context even on
PREEMPT_RT with IRQ_WORK_HARD_IRQ. irq_work items without this flag will
be invoked in softirq context on PREEMPT_RT.

Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.
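
For illustration, a minimal sketch (not part of this patch; the names are
made up) mirroring the tick-sched change below:

  static void my_hard_work_func(struct irq_work *work)
  {
          /* runs in hard interrupt context even on RT; raw_spinlock_t only */
  }

  static struct irq_work my_hard_work = {
          .func  = my_hard_work_func,
          .flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
  };

Items without the flag need no change; on RT they are simply deferred to
softirq context.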

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
---
 include/linux/irq_work.h |  2 ++
 include/linux/irqflags.h | 13 +++++++++++++
 kernel/irq_work.c        |  2 ++
 kernel/rcu/tree.c        |  1 +
 kernel/time/tick-sched.c |  1 +
 5 files changed, 19 insertions(+)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 02da997..3b752e8 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -18,6 +18,8 @@
 
 /* Doesn't want IPI, wait for tick: */
 #define IRQ_WORK_LAZY		BIT(2)
+/* Run hard IRQ context, even on RT */
+#define IRQ_WORK_HARD_IRQ	BIT(3)
 
 #define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)
 
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 9c17f9c..f23f540 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -69,6 +69,17 @@ do {						\
 			current->irq_config = 0;	\
 	  } while (0)
 
+# define lockdep_irq_work_enter(__work)					\
+	  do {								\
+		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+			current->irq_config = 1;			\
+	  } while (0)
+# define lockdep_irq_work_exit(__work)					\
+	  do {								\
+		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
+			current->irq_config = 0;			\
+	  } while (0)
+
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
@@ -83,6 +94,8 @@ do {						\
 # define lockdep_softirq_exit()		do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
 # define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
+# define lockdep_irq_work_enter(__work)		do { } while (0)
+# define lockdep_irq_work_exit(__work)		do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 828cc30..48b5d1b 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -153,7 +153,9 @@ static void irq_work_run_list(struct llist_head *list)
 		 */
 		flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
 
+		lockdep_irq_work_enter(work);
 		work->func(work);
+		lockdep_irq_work_exit(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
 		 * no-one else claimed it meanwhile.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d91c915..5066d1d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1113,6 +1113,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
 		    !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
 		    (rnp->ffmask & rdp->grpmask)) {
 			init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
+			atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
 			rdp->rcu_iw_pending = true;
 			rdp->rcu_iw_gp_seq = rnp->gp_seq;
 			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4be756b..3e2dc9b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct irq_work *work)
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
 	.func = nohz_full_kick_func,
+	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
 };
 
 /*

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] lockdep: Add hrtimer context tracing bits
  2020-03-21 11:26   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  2020-03-21 16:46     ` Frederic Weisbecker
  -1 siblings, 1 reply; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     40db173965c05a1d803451240ed41707d5bd978d
Gitweb:        https://git.kernel.org/tip/40db173965c05a1d803451240ed41707d5bd978d
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:26:02 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00

lockdep: Add hrtimer context tracing bits

Set current->irq_config = 1 around the expiry of hrtimers which were not
marked during hrtimer_init() to expire in hard interrupt context. These
timers will expire in softirq context on PREEMPT_RT.

Setting this allows lockdep to differentiate these timers. If a timer is
marked to expire in hard interrupt context then its expiry callback must
not acquire a regular spinlock; only a raw_spinlock may be taken there.
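
For illustration, a minimal sketch (not part of this patch; the names are
made up) of a timer which keeps expiring in hard interrupt context on RT:

  hrtimer_init(&mydev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
  mydev->timer.function = mydev_expiry;  /* raw_spinlock_t only in here */

Timers initialized without HRTIMER_MODE_HARD get is_hard == 0, so lockdep
treats their expiry as LD_WAIT_CONFIG context via irq_config.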

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.534508206@linutronix.de
---
 include/linux/irqflags.h | 15 +++++++++++++++
 include/linux/sched.h    |  1 +
 kernel/locking/lockdep.c |  2 +-
 kernel/time/hrtimer.c    |  6 +++++-
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index fdaf286..9c17f9c 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -56,6 +56,19 @@ do {						\
 do {						\
 	current->softirq_context--;		\
 } while (0)
+
+# define lockdep_hrtimer_enter(__hrtimer)		\
+	  do {						\
+		  if (!__hrtimer->is_hard)		\
+			current->irq_config = 1;	\
+	  } while (0)
+
+# define lockdep_hrtimer_exit(__hrtimer)		\
+	  do {						\
+		  if (!__hrtimer->is_hard)		\
+			current->irq_config = 0;	\
+	  } while (0)
+
 #else
 # define trace_hardirqs_on()		do { } while (0)
 # define trace_hardirqs_off()		do { } while (0)
@@ -68,6 +81,8 @@ do {						\
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
+# define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
+# define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
 #endif
 
 #if defined(CONFIG_IRQSOFF_TRACER) || \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d3b9ec..933914c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -983,6 +983,7 @@ struct task_struct {
 	unsigned int			softirq_enable_event;
 	int				softirqs_enabled;
 	int				softirq_context;
+	int				irq_config;
 #endif
 
 #ifdef CONFIG_LOCKDEP
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 6b9f9f3..0ebf980 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4025,7 +4025,7 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
 		/*
 		 * Check if force_irqthreads will run us threaded.
 		 */
-		if (curr->hardirq_threaded)
+		if (curr->hardirq_threaded || curr->irq_config)
 			curr_inner = LD_WAIT_CONFIG;
 		else
 			curr_inner = LD_WAIT_SPIN;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3a609e7..8cce725 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1404,7 +1404,7 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 	base = softtimer ? HRTIMER_MAX_CLOCK_BASES / 2 : 0;
 	base += hrtimer_clockid_to_base(clock_id);
 	timer->is_soft = softtimer;
-	timer->is_hard = !softtimer;
+	timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
 	timer->base = &cpu_base->clock_base[base];
 	timerqueue_init(&timer->node);
 }
@@ -1514,7 +1514,11 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
 	 */
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	trace_hrtimer_expire_entry(timer, now);
+	lockdep_hrtimer_enter(timer);
+
 	restart = fn(timer);
+
+	lockdep_hrtimer_exit(timer);
 	trace_hrtimer_expire_exit(timer);
 	raw_spin_lock_irq(&cpu_base->lock);
 

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] sched/swait: Prepare usage in completions
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     b3212fe2bc06fa1014b3063b85b2bac4332a1c28
Gitweb:        https://git.kernel.org/tip/b3212fe2bc06fa1014b3063b85b2bac4332a1c28
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:59 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:23 +01:00

sched/swait: Prepare usage in completions

As a preparation to use simple wait queues for completions:

  - Provide swake_up_all_locked() to support complete_all()
  - Make __prepare_to_swait() publicly available

This is done to enable the usage of complete() within truly atomic contexts
on a PREEMPT_RT enabled kernel.
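
For illustration, a sketch (not part of this patch) of the caller pattern
which the completion conversion later in this series uses:

  raw_spin_lock_irqsave(&x->wait.lock, flags);
  x->done = UINT_MAX;
  swake_up_all_locked(&x->wait);  /* wake all waiters under the lock */
  raw_spin_unlock_irqrestore(&x->wait.lock, flags);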

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.228481202@linutronix.de
---
 kernel/sched/sched.h |  3 +++
 kernel/sched/swait.c | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9ea6478..fdc77e7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2492,3 +2492,6 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 	return true;
 }
 #endif
+
+void swake_up_all_locked(struct swait_queue_head *q);
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index e83a3f8..e1c655f 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -32,6 +32,19 @@ void swake_up_locked(struct swait_queue_head *q)
 }
 EXPORT_SYMBOL(swake_up_locked);
 
+/*
+ * Wake up all waiters. This is an interface which is solely exposed for
+ * completions and not for general usage.
+ *
+ * It is intentionally different from swake_up_all() to allow usage from
+ * hard interrupt context and interrupt disabled regions.
+ */
+void swake_up_all_locked(struct swait_queue_head *q)
+{
+	while (!list_empty(&q->task_list))
+		swake_up_locked(q);
+}
+
 void swake_up_one(struct swait_queue_head *q)
 {
 	unsigned long flags;
@@ -69,7 +82,7 @@ void swake_up_all(struct swait_queue_head *q)
 }
 EXPORT_SYMBOL(swake_up_all);
 
-static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
+void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait)
 {
 	wait->task = current;
 	if (list_empty(&wait->task_list))

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] lockdep: Introduce wait-type checks
  2020-03-21 11:26   ` Thomas Gleixner
                     ` (3 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Peter Zijlstra
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Sebastian Andrzej Siewior, Thomas Gleixner, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     de8f5e4f2dc1f032b46afda0a78cab5456974f89
Gitweb:        https://git.kernel.org/tip/de8f5e4f2dc1f032b46afda0a78cab5456974f89
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Sat, 21 Mar 2020 12:26:01 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00

lockdep: Introduce wait-type checks

Extend lockdep to validate lock wait-type context.

The current wait-types are:

	LD_WAIT_FREE,		/* wait free, rcu etc.. */
	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired)
fits in the current wait-context (as generated by the held stack).

This ensures that there is no attempt to acquire mutexes while holding
spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In
other words, it's a fancier might_sleep().

Obviously RCU made the entire ordeal more complex than a simple single
value test because RCU can be acquired in (pretty much) any context, and
while it presents a context to nested locks, that context is not the same
as the one it was acquired in.

Therefore it's necessary to split the wait_type into two values, one
representing the acquire (outer) and one representing the nested context
(inner). For most 'normal' locks these two are the same.

[ To make static initialization easier we have the rule that:
  .outer == INV means .outer == .inner; because INV == 0. ]

It further means that it's required to find the minimal .inner of the held
stack to compare against the .outer of the new lock, because while 'normal'
RCU presents a CONFIG type to nested locks, taking it while already
holding a SPIN type obviously doesn't relax the rules.
Below is an example output generated by the trivial test code:

  raw_spin_lock(&foo);
  spin_lock(&bar);
  spin_unlock(&bar);
  raw_spin_unlock(&foo);

 [ BUG: Invalid wait context ]
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
 other info that might help us debug this:
 1 lock held by swapper/0/1:
  #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n:m} part in the lock
description; -{3:3} for the attempted lock, and try and match that up to
the held locks, which in this case is the one: -{2:2}.

This tells that the acquiring lock requires a more relaxed environment than
presented by the lock stack.

Currently only the normal locks and RCU are converted; the rest of the
lockdep users default to .inner = INV, which is ignored. More conversions
can be done when desired.

The check for spinlock_t nesting is not enabled by default. It's a separate
config option for now as there are known problems which are currently being
addressed. The config option allows identifying these problems and
verifying that the solutions found are indeed solving them.

The config switch will be removed and the checks will be permanently
enabled once the vast majority of issues has been addressed.

[ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile
	   failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP]
[ tglx: Add the config option ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
---
 include/linux/irqflags.h        |   8 +-
 include/linux/lockdep.h         |  71 +++++++++++++---
 include/linux/mutex.h           |   7 +-
 include/linux/rwlock_types.h    |   6 +-
 include/linux/rwsem.h           |   6 +-
 include/linux/sched.h           |   1 +-
 include/linux/spinlock.h        |  35 +++++---
 include/linux/spinlock_types.h  |  24 ++++-
 kernel/irq/handle.c             |   7 ++-
 kernel/locking/lockdep.c        | 138 +++++++++++++++++++++++++++++--
 kernel/locking/mutex-debug.c    |   2 +-
 kernel/locking/rwsem.c          |   2 +-
 kernel/locking/spinlock_debug.c |   6 +-
 kernel/rcu/update.c             |  24 +++--
 lib/Kconfig.debug               |  17 ++++-
 15 files changed, 307 insertions(+), 47 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c9..fdaf286 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -37,7 +37,12 @@
 # define trace_softirqs_enabled(p)	((p)->softirqs_enabled)
 # define trace_hardirq_enter()			\
 do {						\
-	current->hardirq_context++;		\
+	if (!current->hardirq_context++)	\
+		current->hardirq_threaded = 0;	\
+} while (0)
+# define trace_hardirq_threaded()		\
+do {						\
+	current->hardirq_threaded = 1;		\
 } while (0)
 # define trace_hardirq_exit()			\
 do {						\
@@ -59,6 +64,7 @@ do {						\
 # define trace_hardirqs_enabled(p)	0
 # define trace_softirqs_enabled(p)	0
 # define trace_hardirq_enter()		do { } while (0)
+# define trace_hardirq_threaded()	do { } while (0)
 # define trace_hardirq_exit()		do { } while (0)
 # define lockdep_softirq_enter()	do { } while (0)
 # define lockdep_softirq_exit()		do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 664f52c..425b4ce 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -21,6 +21,22 @@ extern int lock_stat;
 
 #include <linux/types.h>
 
+enum lockdep_wait_type {
+	LD_WAIT_INV = 0,	/* not checked, catch all */
+
+	LD_WAIT_FREE,		/* wait free, rcu etc.. */
+	LD_WAIT_SPIN,		/* spin loops, raw_spinlock_t etc.. */
+
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+	LD_WAIT_CONFIG,		/* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
+#else
+	LD_WAIT_CONFIG = LD_WAIT_SPIN,
+#endif
+	LD_WAIT_SLEEP,		/* sleeping locks, mutex_t etc.. */
+
+	LD_WAIT_MAX,		/* must be last */
+};
+
 #ifdef CONFIG_LOCKDEP
 
 #include <linux/linkage.h>
@@ -111,6 +127,9 @@ struct lock_class {
 	int				name_version;
 	const char			*name;
 
+	short				wait_type_inner;
+	short				wait_type_outer;
+
 #ifdef CONFIG_LOCK_STAT
 	unsigned long			contention_point[LOCKSTAT_POINTS];
 	unsigned long			contending_point[LOCKSTAT_POINTS];
@@ -158,6 +177,8 @@ struct lockdep_map {
 	struct lock_class_key		*key;
 	struct lock_class		*class_cache[NR_LOCKDEP_CACHING_CLASSES];
 	const char			*name;
+	short				wait_type_outer; /* can be taken in this context */
+	short				wait_type_inner; /* presents this context */
 #ifdef CONFIG_LOCK_STAT
 	int				cpu;
 	unsigned long			ip;
@@ -299,8 +320,21 @@ extern void lockdep_unregister_key(struct lock_class_key *key);
  * to lockdep:
  */
 
-extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
-			     struct lock_class_key *key, int subclass);
+extern void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+	struct lock_class_key *key, int subclass, short inner, short outer);
+
+static inline void
+lockdep_init_map_wait(struct lockdep_map *lock, const char *name,
+		      struct lock_class_key *key, int subclass, short inner)
+{
+	lockdep_init_map_waits(lock, name, key, subclass, inner, LD_WAIT_INV);
+}
+
+static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
+			     struct lock_class_key *key, int subclass)
+{
+	lockdep_init_map_wait(lock, name, key, subclass, LD_WAIT_INV);
+}
 
 /*
  * Reinitialize a lock key - for cases where there is special locking or
@@ -308,18 +342,29 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
  * of dependencies wrong: they are either too broad (they need a class-split)
  * or they are too narrow (they suffer from a false class-split):
  */
-#define lockdep_set_class(lock, key) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, 0)
-#define lockdep_set_class_and_name(lock, key, name) \
-		lockdep_init_map(&(lock)->dep_map, name, key, 0)
-#define lockdep_set_class_and_subclass(lock, key, sub) \
-		lockdep_init_map(&(lock)->dep_map, #key, key, sub)
-#define lockdep_set_subclass(lock, sub)	\
-		lockdep_init_map(&(lock)->dep_map, #lock, \
-				 (lock)->dep_map.key, sub)
+#define lockdep_set_class(lock, key)				\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_name(lock, key, name)		\
+	lockdep_init_map_waits(&(lock)->dep_map, name, key, 0,	\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_class_and_subclass(lock, key, sub)		\
+	lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\
+			       (lock)->dep_map.wait_type_inner,	\
+			       (lock)->dep_map.wait_type_outer)
+
+#define lockdep_set_subclass(lock, sub)					\
+	lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+			       (lock)->dep_map.wait_type_inner,		\
+			       (lock)->dep_map.wait_type_outer)
 
 #define lockdep_set_novalidate_class(lock) \
 	lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock)
+
 /*
  * Compare locking classes
  */
@@ -432,6 +477,10 @@ static inline void lockdep_set_selftest_task(struct task_struct *task)
 # define lock_set_class(l, n, k, s, i)		do { } while (0)
 # define lock_set_subclass(l, s, i)		do { } while (0)
 # define lockdep_init()				do { } while (0)
+# define lockdep_init_map_waits(lock, name, key, sub, inner, outer) \
+		do { (void)(name); (void)(key); } while (0)
+# define lockdep_init_map_wait(lock, name, key, sub, inner) \
+		do { (void)(name); (void)(key); } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
 # define lockdep_set_class(lock, key)		do { (void)(key); } while (0)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index aca8f36..ae197cc 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -109,8 +109,11 @@ do {									\
 } while (0)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
-		, .dep_map = { .name = #lockname }
+# define __DEP_MAP_MUTEX_INITIALIZER(lockname)			\
+		, .dep_map = {					\
+			.name = #lockname,			\
+			.wait_type_inner = LD_WAIT_SLEEP,	\
+		}
 #else
 # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
 #endif
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 857a72c..3bd03e1 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -22,7 +22,11 @@ typedef struct {
 #define RWLOCK_MAGIC		0xdeaf1eed
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define RW_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RW_DEP_MAP_INIT(lockname)					\
+	.dep_map = {							\
+		.name = #lockname,					\
+		.wait_type_inner = LD_WAIT_CONFIG,			\
+	}
 #else
 # define RW_DEP_MAP_INIT(lockname)
 #endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8a418d9..7e5b2a4 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -65,7 +65,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 /* Common initializer macros and functions */
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
+# define __RWSEM_DEP_MAP_INIT(lockname)			\
+	, .dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SLEEP,	\
+	}
 #else
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0427849..4d3b9ec 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -970,6 +970,7 @@ struct task_struct {
 
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
+	unsigned int			hardirq_threaded;
 	unsigned long			hardirq_enable_ip;
 	unsigned long			hardirq_disable_ip;
 	unsigned int			hardirq_enable_event;
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 031ce86..d3770b3 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -93,12 +93,13 @@
 
 #ifdef CONFIG_DEBUG_SPINLOCK
   extern void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-				   struct lock_class_key *key);
-# define raw_spin_lock_init(lock)				\
-do {								\
-	static struct lock_class_key __key;			\
-								\
-	__raw_spin_lock_init((lock), #lock, &__key);		\
+				   struct lock_class_key *key, short inner);
+
+# define raw_spin_lock_init(lock)					\
+do {									\
+	static struct lock_class_key __key;				\
+									\
+	__raw_spin_lock_init((lock), #lock, &__key, LD_WAIT_SPIN);	\
 } while (0)
 
 #else
@@ -327,12 +328,26 @@ static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
 	return &lock->rlock;
 }
 
-#define spin_lock_init(_lock)				\
-do {							\
-	spinlock_check(_lock);				\
-	raw_spin_lock_init(&(_lock)->rlock);		\
+#ifdef CONFIG_DEBUG_SPINLOCK
+
+# define spin_lock_init(lock)					\
+do {								\
+	static struct lock_class_key __key;			\
+								\
+	__raw_spin_lock_init(spinlock_check(lock),		\
+			     #lock, &__key, LD_WAIT_CONFIG);	\
+} while (0)
+
+#else
+
+# define spin_lock_init(_lock)			\
+do {						\
+	spinlock_check(_lock);			\
+	*(_lock) = __SPIN_LOCK_UNLOCKED(_lock);	\
 } while (0)
 
+#endif
+
 static __always_inline void spin_lock(spinlock_t *lock)
 {
 	raw_spin_lock(&lock->rlock);
diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index 24b4e6f..6102e6b 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -33,8 +33,18 @@ typedef struct raw_spinlock {
 #define SPINLOCK_OWNER_INIT	((void *)-1L)
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define SPIN_DEP_MAP_INIT(lockname)	.dep_map = { .name = #lockname }
+# define RAW_SPIN_DEP_MAP_INIT(lockname)		\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_SPIN,	\
+	}
+# define SPIN_DEP_MAP_INIT(lockname)			\
+	.dep_map = {					\
+		.name = #lockname,			\
+		.wait_type_inner = LD_WAIT_CONFIG,	\
+	}
 #else
+# define RAW_SPIN_DEP_MAP_INIT(lockname)
 # define SPIN_DEP_MAP_INIT(lockname)
 #endif
 
@@ -51,7 +61,7 @@ typedef struct raw_spinlock {
 	{					\
 	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 	SPIN_DEBUG_INIT(lockname)		\
-	SPIN_DEP_MAP_INIT(lockname) }
+	RAW_SPIN_DEP_MAP_INIT(lockname) }
 
 #define __RAW_SPIN_LOCK_UNLOCKED(lockname)	\
 	(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
@@ -72,11 +82,17 @@ typedef struct spinlock {
 	};
 } spinlock_t;
 
+#define ___SPIN_LOCK_INITIALIZER(lockname)	\
+	{					\
+	.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
+	SPIN_DEBUG_INIT(lockname)		\
+	SPIN_DEP_MAP_INIT(lockname) }
+
 #define __SPIN_LOCK_INITIALIZER(lockname) \
-	{ { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } }
+	{ { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
 
 #define __SPIN_LOCK_UNLOCKED(lockname) \
-	(spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname)
+	(spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
 
 #define DEFINE_SPINLOCK(x)	spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
 
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a4ace61..16ee716 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -145,6 +145,13 @@ irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags
 	for_each_action_of_desc(desc, action) {
 		irqreturn_t res;
 
+		/*
+		 * If this IRQ would be threaded under force_irqthreads, mark it so.
+		 */
+		if (irq_settings_can_thread(desc) &&
+		    !(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)))
+			trace_hardirq_threaded();
+
 		trace_irq_handler_entry(irq, action);
 		res = action->handler(irq, action->dev_id);
 		trace_irq_handler_exit(irq, action, res);
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4c3b1cc..6b9f9f3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -683,7 +683,9 @@ static void print_lock_name(struct lock_class *class)
 
 	printk(KERN_CONT " (");
 	__print_lock_name(class);
-	printk(KERN_CONT "){%s}", usage);
+	printk(KERN_CONT "){%s}-{%hd:%hd}", usage,
+			class->wait_type_outer ?: class->wait_type_inner,
+			class->wait_type_inner);
 }
 
 static void print_lockdep_cache(struct lockdep_map *lock)
@@ -1264,6 +1266,8 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
 	WARN_ON_ONCE(!list_empty(&class->locks_before));
 	WARN_ON_ONCE(!list_empty(&class->locks_after));
 	class->name_version = count_matching_names(class);
+	class->wait_type_inner = lock->wait_type_inner;
+	class->wait_type_outer = lock->wait_type_outer;
 	/*
 	 * We use RCU's safe list-add method to make
 	 * parallel walking of the hash-list safe:
@@ -3948,6 +3952,113 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 	return ret;
 }
 
+static int
+print_lock_invalid_wait_context(struct task_struct *curr,
+				struct held_lock *hlock)
+{
+	if (!debug_locks_off())
+		return 0;
+	if (debug_locks_silent)
+		return 0;
+
+	pr_warn("\n");
+	pr_warn("=============================\n");
+	pr_warn("[ BUG: Invalid wait context ]\n");
+	print_kernel_ident();
+	pr_warn("-----------------------------\n");
+
+	pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
+	print_lock(hlock);
+
+	pr_warn("other info that might help us debug this:\n");
+	lockdep_print_held_locks(curr);
+
+	pr_warn("stack backtrace:\n");
+	dump_stack();
+
+	return 0;
+}
+
+/*
+ * Verify the wait_type context.
+ *
+ * This check validates that we take locks in the right wait-type order; that
+ * is, it ensures that we do not take mutexes inside spinlocks and do not
+ * attempt to acquire spinlocks inside raw_spinlocks and the sort.
+ *
+ * The entire thing is slightly more complex because of RCU: RCU is a lock that
+ * can be taken from (pretty much) any context but also has constraints.
+ * However, when taken in a stricter environment the RCU lock does not loosen
+ * the constraints.
+ *
+ * Therefore we must look for the strictest environment in the lock stack and
+ * compare that to the lock we're trying to acquire.
+ */
+static int check_wait_context(struct task_struct *curr, struct held_lock *next)
+{
+	short next_inner = hlock_class(next)->wait_type_inner;
+	short next_outer = hlock_class(next)->wait_type_outer;
+	short curr_inner;
+	int depth;
+
+	if (!curr->lockdep_depth || !next_inner || next->trylock)
+		return 0;
+
+	if (!next_outer)
+		next_outer = next_inner;
+
+	/*
+	 * Find start of current irq_context..
+	 */
+	for (depth = curr->lockdep_depth - 1; depth >= 0; depth--) {
+		struct held_lock *prev = curr->held_locks + depth;
+		if (prev->irq_context != next->irq_context)
+			break;
+	}
+	depth++;
+
+	/*
+	 * Set appropriate wait type for the context; for IRQs we have to take
+	 * into account force_irqthread as that is implied by PREEMPT_RT.
+	 */
+	if (curr->hardirq_context) {
+		/*
+		 * Check if force_irqthreads will run us threaded.
+		 */
+		if (curr->hardirq_threaded)
+			curr_inner = LD_WAIT_CONFIG;
+		else
+			curr_inner = LD_WAIT_SPIN;
+	} else if (curr->softirq_context) {
+		/*
+		 * Softirqs are always threaded.
+		 */
+		curr_inner = LD_WAIT_CONFIG;
+	} else {
+		curr_inner = LD_WAIT_MAX;
+	}
+
+	for (; depth < curr->lockdep_depth; depth++) {
+		struct held_lock *prev = curr->held_locks + depth;
+		short prev_inner = hlock_class(prev)->wait_type_inner;
+
+		if (prev_inner) {
+			/*
+			 * We can have a bigger inner than a previous one
+			 * when outer is smaller than inner, as with RCU.
+			 *
+			 * Also due to trylocks.
+			 */
+			curr_inner = min(curr_inner, prev_inner);
+		}
+	}
+
+	if (next_outer > curr_inner)
+		return print_lock_invalid_wait_context(curr, next);
+
+	return 0;
+}
+
 #else /* CONFIG_PROVE_LOCKING */
 
 static inline int
@@ -3967,13 +4078,20 @@ static inline int separate_irq_context(struct task_struct *curr,
 	return 0;
 }
 
+static inline int check_wait_context(struct task_struct *curr,
+				     struct held_lock *next)
+{
+	return 0;
+}
+
 #endif /* CONFIG_PROVE_LOCKING */
 
 /*
  * Initialize a lock instance's lock-class mapping info:
  */
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
-		      struct lock_class_key *key, int subclass)
+void lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
+			    struct lock_class_key *key, int subclass,
+			    short inner, short outer)
 {
 	int i;
 
@@ -3994,6 +4112,9 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 
 	lock->name = name;
 
+	lock->wait_type_outer = outer;
+	lock->wait_type_inner = inner;
+
 	/*
 	 * No key, no joy, we need to hash something.
 	 */
@@ -4027,7 +4148,7 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
 		raw_local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(lockdep_init_map);
+EXPORT_SYMBOL_GPL(lockdep_init_map_waits);
 
 struct lock_class_key __lockdep_no_validate__;
 EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -4128,7 +4249,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 
 	class_idx = class - lock_classes;
 
-	if (depth) {
+	if (depth) { /* we're holding locks */
 		hlock = curr->held_locks + depth - 1;
 		if (hlock->class_idx == class_idx && nest_lock) {
 			if (!references)
@@ -4170,6 +4291,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 #endif
 	hlock->pin_count = pin_count;
 
+	if (check_wait_context(curr, hlock))
+		return 0;
+
 	/* Initialize the lock usage bit */
 	if (!mark_usage(curr, hlock, check))
 		return 0;
@@ -4405,7 +4529,9 @@ __lock_set_class(struct lockdep_map *lock, const char *name,
 		return 0;
 	}
 
-	lockdep_init_map(lock, name, key, 0);
+	lockdep_init_map_waits(lock, name, key, 0,
+			       lock->wait_type_inner,
+			       lock->wait_type_outer);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes;
 
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 771d4ca..a7276aa 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -85,7 +85,7 @@ void debug_mutex_init(struct mutex *lock, const char *name,
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 	lock->magic = lock;
 }
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e6f437b..f11b9bd 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -328,7 +328,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	 * Make sure we are not reinitializing a held semaphore:
 	 */
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
-	lockdep_init_map(&sem->dep_map, name, key, 0);
+	lockdep_init_map_wait(&sem->dep_map, name, key, 0, LD_WAIT_SLEEP);
 #endif
 #ifdef CONFIG_DEBUG_RWSEMS
 	sem->magic = sem;
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 472dd46..b9d9308 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -14,14 +14,14 @@
 #include <linux/export.h>
 
 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
-			  struct lock_class_key *key)
+			  struct lock_class_key *key, short inner)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/*
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, inner);
 #endif
 	lock->raw_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 	lock->magic = SPINLOCK_MAGIC;
@@ -39,7 +39,7 @@ void __rwlock_init(rwlock_t *lock, const char *name,
 	 * Make sure we are not reinitializing a held lock:
 	 */
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
-	lockdep_init_map(&lock->dep_map, name, key, 0);
+	lockdep_init_map_wait(&lock->dep_map, name, key, 0, LD_WAIT_CONFIG);
 #endif
 	lock->raw_lock = (arch_rwlock_t) __ARCH_RW_LOCK_UNLOCKED;
 	lock->magic = RWLOCK_MAGIC;
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 6c4b862..8d3eb2f 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -227,18 +227,30 @@ core_initcall(rcu_set_runtime_mode);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
-struct lockdep_map rcu_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock", &rcu_lock_key);
+struct lockdep_map rcu_lock_map = {
+	.name = "rcu_read_lock",
+	.key = &rcu_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* XXX PREEMPT_RCU ? */
+};
 EXPORT_SYMBOL_GPL(rcu_lock_map);
 
 static struct lock_class_key rcu_bh_lock_key;
-struct lockdep_map rcu_bh_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_bh", &rcu_bh_lock_key);
+struct lockdep_map rcu_bh_lock_map = {
+	.name = "rcu_read_lock_bh",
+	.key = &rcu_bh_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_CONFIG, /* PREEMPT_LOCK also makes BH preemptible */
+};
 EXPORT_SYMBOL_GPL(rcu_bh_lock_map);
 
 static struct lock_class_key rcu_sched_lock_key;
-struct lockdep_map rcu_sched_lock_map =
-	STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
+struct lockdep_map rcu_sched_lock_map = {
+	.name = "rcu_read_lock_sched",
+	.key = &rcu_sched_lock_key,
+	.wait_type_outer = LD_WAIT_FREE,
+	.wait_type_inner = LD_WAIT_SPIN,
+};
 EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 
 static struct lock_class_key rcu_callback_key;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 69def4a..70813e3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1086,6 +1086,23 @@ config PROVE_LOCKING
 
 	 For more details, see Documentation/locking/lockdep-design.rst.
 
+config PROVE_RAW_LOCK_NESTING
+	bool "Enable raw_spinlock - spinlock nesting checks"
+	depends on PROVE_LOCKING
+	default n
+	help
+	 Enable the raw_spinlock vs. spinlock nesting checks which ensure
+	 that the lock nesting rules for PREEMPT_RT enabled kernels are
+	 not violated.
+
+	 NOTE: There are known nesting problems. So if you enable this
+	 option expect lockdep splats until these problems have been fully
+	 addressed, which is work in progress. This config switch allows
+	 identifying and analyzing these problems. It will be removed and the
+	 check permanently enabled once the main issues have been fixed.
+
+	 If unsure, select N.
+
 config LOCK_STAT
 	bool "Lock usage statistics"
 	depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] completion: Use simple wait queues
  2020-03-21 11:26   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Peter Zijlstra (Intel),
	Greg Kroah-Hartman, Davidlohr Bueso, Joel Fernandes (Google),
	Linus Torvalds, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     a5c6234e10280b3ec65e2410ce34904a2580e5f8
Gitweb:        https://git.kernel.org/tip/a5c6234e10280b3ec65e2410ce34904a2580e5f8
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:26:00 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00

completion: Use simple wait queues

completion uses a wait_queue_head_t to enqueue waiters.

wait_queue_head_t contains a spinlock_t to protect the list of waiters
which excludes it from being used in truly atomic context on a PREEMPT_RT
enabled kernel.

The spinlock in the wait queue head cannot be replaced by a raw_spinlock
because:

  - wait queues can have custom wakeup callbacks, which acquire other
    spinlock_t locks and have potentially long execution times

  - wake_up() walks an unbounded number of list entries during the wake up
    and may wake an unbounded number of waiters.

For simplicity and performance reasons complete() should be usable on
PREEMPT_RT enabled kernels.

completions do not use custom wakeup callbacks and are usually single
waiter, except for a few corner cases.

Replace the wait queue in the completion with a simple wait queue (swait),
which uses a raw_spinlock_t for protecting the waiter list and therefore is
safe to use inside truly atomic regions on PREEMPT_RT.

There is no semantic or functional change:

  - completions use the exclusive wait mode which is what swait provides

  - complete() wakes one exclusive waiter

  - complete_all() wakes all waiters while holding the lock which protects
    the wait queue against newly incoming waiters. The conversion to swait
    preserves this behaviour.

complete_all() might cause unbounded latencies with a large number of
waiters being woken at once, but most complete_all() usage sites are either
in testing or initialization code or have only a really small number of
concurrent waiters, which for now does not cause a latency problem. Keep it
simple for now.
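
For illustration, a sketch (not part of this patch; the driver names are
made up) of what the conversion enables on PREEMPT_RT:

  static irqreturn_t mydev_irq(int irq, void *data)
  {
          struct mydev *md = data;

          /* safe in truly atomic context now: raw_spinlock_t inside */
          complete(&md->done);
          return IRQ_HANDLED;
  }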

The fixup of the warning check in the USB gadget driver is just a
straightforward conversion of the lockless waiter check from one waitqueue
type to the other.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200321113242.317954042@linutronix.de
---
 drivers/usb/gadget/function/f_fs.c |  2 +-
 include/linux/completion.h         |  8 +++---
 kernel/sched/completion.c          | 36 +++++++++++++++--------------
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 5719176..234177d 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1703,7 +1703,7 @@ static void ffs_data_put(struct ffs_data *ffs)
 		pr_info("%s(): freeing\n", __func__);
 		ffs_data_clear(ffs);
 		BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-		       waitqueue_active(&ffs->ep0req_completion.wait) ||
+		       swait_active(&ffs->ep0req_completion.wait) ||
 		       waitqueue_active(&ffs->wait));
 		destroy_workqueue(ffs->io_completion_wq);
 		kfree(ffs->dev_name);
diff --git a/include/linux/completion.h b/include/linux/completion.h
index 519e949..bf8e770 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,7 +9,7 @@
  * See kernel/sched/completion.c for details.
  */
 
-#include <linux/wait.h>
+#include <linux/swait.h>
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,7 +25,7 @@
  */
 struct completion {
 	unsigned int done;
-	wait_queue_head_t wait;
+	struct swait_queue_head wait;
 };
 
 #define init_completion_map(x, m) __init_completion(x)
@@ -34,7 +34,7 @@ static inline void complete_acquire(struct completion *x) {}
 static inline void complete_release(struct completion *x) {}
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -85,7 +85,7 @@ static inline void complete_release(struct completion *x) {}
 static inline void __init_completion(struct completion *x)
 {
 	x->done = 0;
-	init_waitqueue_head(&x->wait);
+	init_swait_queue_head(&x->wait);
 }
 
 /**
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index a1ad5b7..f15e961 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -29,12 +29,12 @@ void complete(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 
 	if (x->done != UINT_MAX)
 		x->done++;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 1);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete);
 
@@ -58,10 +58,12 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	WARN_ON(irqs_disabled());
+
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	__wake_up_locked(&x->wait, TASK_NORMAL, 0);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	swake_up_all_locked(&x->wait);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 }
 EXPORT_SYMBOL(complete_all);
 
@@ -70,20 +72,20 @@ do_wait_for_common(struct completion *x,
 		   long (*action)(long), long timeout, int state)
 {
 	if (!x->done) {
-		DECLARE_WAITQUEUE(wait, current);
+		DECLARE_SWAITQUEUE(wait);
 
-		__add_wait_queue_entry_tail_exclusive(&x->wait, &wait);
 		do {
 			if (signal_pending_state(state, current)) {
 				timeout = -ERESTARTSYS;
 				break;
 			}
+			__prepare_to_swait(&x->wait, &wait);
 			__set_current_state(state);
-			spin_unlock_irq(&x->wait.lock);
+			raw_spin_unlock_irq(&x->wait.lock);
 			timeout = action(timeout);
-			spin_lock_irq(&x->wait.lock);
+			raw_spin_lock_irq(&x->wait.lock);
 		} while (!x->done && timeout);
-		__remove_wait_queue(&x->wait, &wait);
+		__finish_swait(&x->wait, &wait);
 		if (!x->done)
 			return timeout;
 	}
@@ -100,9 +102,9 @@ __wait_for_common(struct completion *x,
 
 	complete_acquire(x);
 
-	spin_lock_irq(&x->wait.lock);
+	raw_spin_lock_irq(&x->wait.lock);
 	timeout = do_wait_for_common(x, action, timeout, state);
-	spin_unlock_irq(&x->wait.lock);
+	raw_spin_unlock_irq(&x->wait.lock);
 
 	complete_release(x);
 
@@ -291,12 +293,12 @@ bool try_wait_for_completion(struct completion *x)
 	if (!READ_ONCE(x->done))
 		return false;
 
-	spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	if (!x->done)
 		ret = false;
 	else if (x->done != UINT_MAX)
 		x->done--;
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(try_wait_for_completion);
@@ -322,8 +324,8 @@ bool completion_done(struct completion *x)
 	 * otherwise we can end up freeing the completion before complete()
 	 * is done referencing it.
 	 */
-	spin_lock_irqsave(&x->wait.lock, flags);
-	spin_unlock_irqrestore(&x->wait.lock, flags);
+	raw_spin_lock_irqsave(&x->wait.lock, flags);
+	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
 	return true;
 }
 EXPORT_SYMBOL(completion_done);

^ permalink raw reply related	[flat|nested] 195+ messages in thread
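
For orientation, a minimal sketch of the interface whose internals the
conversion above swaps over to swait. The names (my_dev, my_dev_irq,
my_dev_io) are hypothetical, not from the series:

  #include <linux/completion.h>
  #include <linux/interrupt.h>

  struct my_dev {
  	struct completion done;
  };

  static irqreturn_t my_dev_irq(int irq, void *data)
  {
  	struct my_dev *dev = data;

  	/* With this patch, complete() takes a raw_spinlock_t and wakes
  	 * at most one swait waiter, legal from hard IRQ context. */
  	complete(&dev->done);
  	return IRQ_HANDLED;
  }

  static int my_dev_io(struct my_dev *dev)
  {
  	reinit_completion(&dev->done);
  	/* ...kick off the hardware operation here... */
  	return wait_for_completion_interruptible(&dev->done);
  }

Callers do not change at all; only the wait queue type inside
struct completion does.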

* [tip: locking/core] timekeeping: Split jiffies seqlock
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     e5d4d1756b07d9490a0269a9e68c1e05ee1feb9b
Gitweb:        https://git.kernel.org/tip/e5d4d1756b07d9490a0269a9e68c1e05ee1feb9b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:58 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:23 +01:00

timekeeping: Split jiffies seqlock

seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt and therefore
non-preemptible context even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.120587764@linutronix.de
---
 kernel/time/jiffies.c     |  7 ++++---
 kernel/time/tick-common.c | 10 ++++++----
 kernel/time/tick-sched.c  | 19 ++++++++++++-------
 kernel/time/timekeeping.c |  6 ++++--
 kernel/time/timekeeping.h |  3 ++-
 5 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index d23b434..eddcf49 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -58,7 +58,8 @@ static struct clocksource clocksource_jiffies = {
 	.max_cycles	= 10,
 };
 
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
+__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock);
+__cacheline_aligned_in_smp seqcount_t jiffies_seq;
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
@@ -67,9 +68,9 @@ u64 get_jiffies_64(void)
 	u64 ret;
 
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		ret = jiffies_64;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	return ret;
 }
 EXPORT_SYMBOL(get_jiffies_64);
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 7e5d352..6c9c342 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -84,13 +84,15 @@ int tick_is_oneshot_available(void)
 static void tick_periodic(int cpu)
 {
 	if (tick_do_timer_cpu == cpu) {
-		write_seqlock(&jiffies_lock);
+		raw_spin_lock(&jiffies_lock);
+		write_seqcount_begin(&jiffies_seq);
 
 		/* Keep track of the next tick event */
 		tick_next_period = ktime_add(tick_next_period, tick_period);
 
 		do_timer(1);
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		update_wall_time();
 	}
 
@@ -162,9 +164,9 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
 		ktime_t next;
 
 		do {
-			seq = read_seqbegin(&jiffies_lock);
+			seq = read_seqcount_begin(&jiffies_seq);
 			next = tick_next_period;
-		} while (read_seqretry(&jiffies_lock, seq));
+		} while (read_seqcount_retry(&jiffies_seq, seq));
 
 		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT);
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a792d21..4be756b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -65,7 +65,8 @@ static void tick_do_update_jiffies64(ktime_t now)
 		return;
 
 	/* Reevaluate with jiffies_lock held */
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 
 	delta = ktime_sub(now, last_jiffies_update);
 	if (delta >= tick_period) {
@@ -91,10 +92,12 @@ static void tick_do_update_jiffies64(ktime_t now)
 		/* Keep the tick_next_period variable up to date */
 		tick_next_period = ktime_add(last_jiffies_update, tick_period);
 	} else {
-		write_sequnlock(&jiffies_lock);
+		write_seqcount_end(&jiffies_seq);
+		raw_spin_unlock(&jiffies_lock);
 		return;
 	}
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
 
@@ -105,12 +108,14 @@ static ktime_t tick_init_jiffy_update(void)
 {
 	ktime_t period;
 
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
 	if (last_jiffies_update == 0)
 		last_jiffies_update = tick_next_period;
 	period = last_jiffies_update;
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	return period;
 }
 
@@ -676,10 +681,10 @@ static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
-		seq = read_seqbegin(&jiffies_lock);
+		seq = read_seqcount_begin(&jiffies_seq);
 		basemono = last_jiffies_update;
 		basejiff = jiffies;
-	} while (read_seqretry(&jiffies_lock, seq));
+	} while (read_seqcount_retry(&jiffies_seq, seq));
 	ts->last_jiffies = basejiff;
 	ts->timer_expires_base = basemono;
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ca69290..856280d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2397,8 +2397,10 @@ EXPORT_SYMBOL(hardpps);
  */
 void xtime_update(unsigned long ticks)
 {
-	write_seqlock(&jiffies_lock);
+	raw_spin_lock(&jiffies_lock);
+	write_seqcount_begin(&jiffies_seq);
 	do_timer(ticks);
-	write_sequnlock(&jiffies_lock);
+	write_seqcount_end(&jiffies_seq);
+	raw_spin_unlock(&jiffies_lock);
 	update_wall_time();
 }
diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h
index 141ab3a..099737f 100644
--- a/kernel/time/timekeeping.h
+++ b/kernel/time/timekeeping.h
@@ -25,7 +25,8 @@ static inline void sched_clock_resume(void) { }
 extern void do_timer(unsigned long ticks);
 extern void update_wall_time(void);
 
-extern seqlock_t jiffies_lock;
+extern raw_spinlock_t jiffies_lock;
+extern seqcount_t jiffies_seq;
 
 #define CS_NAME_LEN	32
 

^ permalink raw reply related	[flat|nested] 195+ messages in thread
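
The open-coded split generalizes beyond jiffies. A sketch of the same
writer/reader pairing for a hypothetical variable (data_lock, data_seq
and shared_data are made-up names mirroring the hunks above):

  #include <linux/seqlock.h>
  #include <linux/spinlock.h>
  #include <linux/types.h>

  static DEFINE_RAW_SPINLOCK(data_lock);	/* serializes writers */
  static seqcount_t data_seq;			/* zero init, as for jiffies_seq */
  static u64 shared_data;

  /* Writer: may run in hard interrupt context, even on PREEMPT_RT */
  static void data_update(u64 val)
  {
  	raw_spin_lock(&data_lock);
  	write_seqcount_begin(&data_seq);
  	shared_data = val;
  	write_seqcount_end(&data_seq);
  	raw_spin_unlock(&data_lock);
  }

  /* Reader: lockless; retries if it raced with a writer */
  static u64 data_read(void)
  {
  	unsigned int seq;
  	u64 val;

  	do {
  		seq = read_seqcount_begin(&data_seq);
  		val = shared_data;
  	} while (read_seqcount_retry(&data_seq, seq));

  	return val;
  }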

* [tip: locking/core] Documentation: Add lock ordering and nesting documentation
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     919e9e6395cfac23b7e71ed88930367459055daf
Gitweb:        https://git.kernel.org/tip/919e9e6395cfac23b7e71ed88930367459055daf
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:57 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:23 +01:00

Documentation: Add lock ordering and nesting documentation

The kernel provides a variety of locking primitives. The nesting of these
lock types and the implications of them on RT enabled kernels is nowhere
documented.

Add initial documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.026561244@linutronix.de
---
 Documentation/locking/index.rst     |   1 +-
 Documentation/locking/locktypes.rst | 299 +++++++++++++++++++++++++++-
 2 files changed, 300 insertions(+)
 create mode 100644 Documentation/locking/locktypes.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 626a463..5d6800a 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -7,6 +7,7 @@ locking
 .. toctree::
     :maxdepth: 1
 
+    locktypes
     lockdep-design
     lockstat
     locktorture
diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
new file mode 100644
index 0000000..f0aa911
--- /dev/null
+++ b/Documentation/locking/locktypes.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _kernel_hacking_locktypes:
+
+==========================
+Lock types and their rules
+==========================
+
+Introduction
+============
+
+The kernel provides a variety of locking primitives which can be divided
+into two categories:
+
+ - Sleeping locks
+ - Spinning locks
+
+This document conceptually describes these lock types and provides rules
+for their nesting, including the rules for use under PREEMPT_RT.
+
+
+Lock categories
+===============
+
+Sleeping locks
+--------------
+
+Sleeping locks can only be acquired in preemptible task context.
+
+Although implementations allow try_lock() from other contexts, it is
+necessary to carefully evaluate the safety of unlock() as well as of
+try_lock().  Furthermore, it is also necessary to evaluate the debugging
+versions of these primitives.  In short, don't acquire sleeping locks from
+other contexts unless there is no other option.
+
+Sleeping lock types:
+
+ - mutex
+ - rt_mutex
+ - semaphore
+ - rw_semaphore
+ - ww_mutex
+ - percpu_rw_semaphore
+
+On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks
+--------------
+
+ - raw_spinlock_t
+ - bit spinlocks
+
+On non-PREEMPT_RT kernels, these lock types are also spinning locks:
+
+ - spinlock_t
+ - rwlock_t
+
+Spinning locks implicitly disable preemption and the lock / unlock functions
+can have suffixes which apply further protections:
+
+ ===================  ====================================================
+ _bh()                Disable / enable bottom halves (soft interrupts)
+ _irq()               Disable / enable interrupts
+ _irqsave/restore()   Save and disable / restore interrupt disabled state
+ ===================  ====================================================
+
+
+rtmutex
+=======
+
+RT-mutexes are mutexes with support for priority inheritance (PI).
+
+PI has limitations on non-PREEMPT_RT kernels due to preemption- and
+interrupt-disabled sections.
+
+PI clearly cannot preempt preemption-disabled or interrupt-disabled
+regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
+execute most such regions of code in preemptible task context, especially
+interrupt handlers and soft interrupts.  This conversion allows spinlock_t
+and rwlock_t to be implemented via RT-mutexes.
+
+
+raw_spinlock_t and spinlock_t
+=============================
+
+raw_spinlock_t
+--------------
+
+raw_spinlock_t is a strict spinning lock implementation in all kernels,
+including PREEMPT_RT kernels.  Use raw_spinlock_t only in truly critical
+core code, low-level interrupt handling and places where disabling
+preemption or interrupts is required, for example, to safely access
+hardware state.  raw_spinlock_t can sometimes also be used when the
+critical section is tiny, thus avoiding RT-mutex overhead.
+
+spinlock_t
+----------
+
+The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+
+On a non-PREEMPT_RT kernel, spinlock_t is mapped to raw_spinlock_t and
+has exactly the same semantics.
+
+spinlock_t and PREEMPT_RT
+-------------------------
+
+On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+implementation based on rt_mutex which changes the semantics:
+
+ - Preemption is not disabled
+
+ - The hard interrupt related suffixes for spin_lock / spin_unlock
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
+   interrupt disabled state
+
+ - The soft interrupt related suffix (_bh()) still disables softirq
+   handlers.
+
+   Non-PREEMPT_RT kernels disable preemption to get this effect.
+
+   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
+   preemption disabled. The lock disables softirq handlers and also
+   prevents reentrancy due to task preemption.
+
+PREEMPT_RT kernels preserve all other spinlock_t semantics:
+
+ - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
+   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
+   disable migration, which ensures that pointers to per-CPU variables
+   remain valid even if the task is preempted.
+
+ - Task state is preserved across spinlock acquisition, ensuring that the
+   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
+   kernels leave task state untouched.  However, PREEMPT_RT must change
+   task state if the task blocks during acquisition.  Therefore, it saves
+   the current task state before blocking and the corresponding lock wakeup
+   restores it.
+
+   Other types of wakeups would normally unconditionally set the task state
+   to RUNNING, but that does not work here because the task must remain
+   blocked until the lock becomes available.  Therefore, when a non-lock
+   wakeup attempts to awaken a task blocked waiting for a spinlock, it
+   instead sets the saved state to RUNNING.  Then, when the lock
+   acquisition completes, the lock wakeup sets the task state to the saved
+   state, in this case setting it to RUNNING.
+
+rwlock_t
+========
+
+rwlock_t is a multiple readers and single writer lock mechanism.
+
+Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
+suffix rules of spinlock_t apply accordingly. The implementation is fair,
+thus preventing writer starvation.
+
+rwlock_t and PREEMPT_RT
+-----------------------
+
+PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
+implementation, thus changing semantics:
+
+ - All the spinlock_t changes also apply to rwlock_t.
+
+ - Because an rwlock_t writer cannot grant its priority to multiple
+   readers, a preempted low-priority reader will continue holding its lock,
+   thus starving even high-priority writers.  In contrast, because readers
+   can grant their priority to a writer, a preempted low-priority writer
+   will have its priority boosted until it releases the lock, thus
+   preventing that writer from starving readers.
+
+
+PREEMPT_RT caveats
+==================
+
+spinlock_t and rwlock_t
+-----------------------
+
+These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
+have a few implications.  For example, on a non-PREEMPT_RT kernel the
+following code sequence works as expected::
+
+   local_irq_disable();
+   spin_lock(&lock);
+
+and is fully equivalent to::
+
+   spin_lock_irq(&lock);
+
+The same applies to rwlock_t and the _irqsave() suffix variants.
+
+On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
+a fully preemptible context.  Instead, use spin_lock_irq() or
+spin_lock_irqsave() and their unlock counterparts.  In cases where the
+interrupt disabling and locking must remain separate, PREEMPT_RT offers a
+local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
+allowing things like per-CPU irq-disabled locks to be acquired.  However,
+this approach should be used only where absolutely necessary.
+
+
+raw_spinlock_t
+--------------
+
+Acquiring a raw_spinlock_t disables preemption and possibly also
+interrupts, so the critical section must avoid acquiring a regular
+spinlock_t or rwlock_t; for example, the critical section must avoid
+allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
+works perfectly::
+
+  raw_spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+But this code fails on PREEMPT_RT kernels because the memory allocator is
+fully preemptible and therefore cannot be invoked from truly atomic
+contexts.  However, it is perfectly fine to invoke the memory allocator
+while holding normal non-raw spinlocks because they do not disable
+preemption on PREEMPT_RT kernels::
+
+  spin_lock(&lock);
+  p = kmalloc(sizeof(*p), GFP_ATOMIC);
+
+
+bit spinlocks
+-------------
+
+Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
+substituted by an RT-mutex based implementation for obvious reasons.
+
+The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
+caveats vs. raw_spinlock_t apply.
+
+Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT,
+but this requires conditional (#ifdef'ed) code changes at the usage site.
+In contrast, the spinlock_t substitution is handled entirely by the
+compiler: the conditionals are restricted to header files and the core
+implementation of the locking primitives, so the usage sites do not
+require any changes.
+
+
+Lock type nesting rules
+=======================
+
+The most basic rules are:
+
+  - Lock types of the same lock category (sleeping, spinning) can nest
+    arbitrarily as long as they respect the general lock ordering rules to
+    prevent deadlocks.
+
+  - Sleeping lock types cannot nest inside spinning lock types.
+
+  - Spinning lock types can nest inside sleeping lock types.
+
+These rules apply in general independent of CONFIG_PREEMPT_RT.
+
+Because PREEMPT_RT changes the lock category of spinlock_t and rwlock_t
+from spinning to sleeping, there are obviously restrictions on how these
+types can nest with raw_spinlock_t.
+
+This results in the following nest ordering:
+
+  1) Sleeping locks
+  2) spinlock_t and rwlock_t
+  3) raw_spinlock_t and bit spinlocks
+
+Lockdep is aware of these constraints to ensure that they are respected.
+
+
+Owner semantics
+===============
+
+Most lock types in the Linux kernel have strict owner semantics, i.e. the
+context (task) which acquires a lock has to release it.
+
+There are two exceptions:
+
+  - semaphores
+  - rwsems
+
+semaphores have no owner semantics for historical reasons, and as such
+trylock and release operations can be called from any context. They are
+often used for both serialization and waiting purposes. That's generally
+discouraged and should be replaced by separate serialization and wait
+mechanisms, such as mutexes and completions.
+
+rwsems have grown interfaces which allow non owner release for special
+purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
+substitutes all locking primitives except semaphores with RT-mutex based
+implementations to provide priority inheritance for all lock types except
+the truly spinning ones. Priority inheritance on ownerless locks is
+obviously impossible.
+
+For now, the rwsem non-owner release excludes code which utilizes it from
+being used on PREEMPT_RT enabled kernels. In some cases this can be
+mitigated by disabling portions of the code, in other cases the complete
+functionality has to be disabled until a workable solution has been found.

^ permalink raw reply related	[flat|nested] 195+ messages in thread
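
To make the nest ordering above concrete, a sketch with three
hypothetical locks, sleeping lock outermost and raw_spinlock_t
innermost:

  #include <linux/mutex.h>
  #include <linux/spinlock.h>

  static DEFINE_MUTEX(cfg_lock);		/* 1) sleeping lock        */
  static DEFINE_SPINLOCK(state_lock);	/* 2) sleeps on PREEMPT_RT */
  static DEFINE_RAW_SPINLOCK(hw_lock);	/* 3) always spins         */

  static void good_nesting(void)
  {
  	mutex_lock(&cfg_lock);
  	spin_lock(&state_lock);
  	raw_spin_lock(&hw_lock);
  	/* ...touch hardware state... */
  	raw_spin_unlock(&hw_lock);
  	spin_unlock(&state_lock);
  	mutex_unlock(&cfg_lock);
  }

  static void bad_nesting(void)
  {
  	raw_spin_lock(&hw_lock);
  	/*
  	 * BUG: sleeping lock inside a spinning lock. Preemption is
  	 * disabled here and mutex_lock() may sleep; lockdep reports it.
  	 */
  	mutex_lock(&cfg_lock);
  	mutex_unlock(&cfg_lock);
  	raw_spin_unlock(&hw_lock);
  }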

* [tip: locking/core] powerpc/ps3: Convert half completion to rcuwait
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (3 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Peter Zijlstra (Intel)
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Peter Zijlstra (Intel) @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     e21fee5368f46e211bc1f3cf118f2b122d644132
Gitweb:        https://git.kernel.org/tip/e21fee5368f46e211bc1f3cf118f2b122d644132
Author:        Peter Zijlstra (Intel) <peterz@infradead.org>
AuthorDate:    Sat, 21 Mar 2020 12:25:56 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:22 +01:00

powerpc/ps3: Convert half completion to rcuwait

The PS3 notification interrupt and kthread use a hacked-up completion to
communicate. Since we want to change the completion implementation and this
usage is abuse anyway, replace it with a simple rcuwait, since there is only
ever the one waiter.

AFAICT the kthread uses TASK_INTERRUPTIBLE only to avoid increasing loadavg;
kthreads cannot receive signals by default, and this one doesn't look
different. Use TASK_IDLE instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.930037873@linutronix.de
---
 arch/powerpc/platforms/ps3/device-init.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/device-init.c b/arch/powerpc/platforms/ps3/device-init.c
index 2735ec9..e87360a 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/reboot.h>
+#include <linux/rcuwait.h>
 
 #include <asm/firmware.h>
 #include <asm/lv1call.h>
@@ -670,7 +671,8 @@ struct ps3_notification_device {
 	spinlock_t lock;
 	u64 tag;
 	u64 lv1_status;
-	struct completion done;
+	struct rcuwait wait;
+	bool done;
 };
 
 enum ps3_notify_type {
@@ -712,7 +714,8 @@ static irqreturn_t ps3_notification_interrupt(int irq, void *data)
 		pr_debug("%s:%u: completed, status 0x%llx\n", __func__,
 			 __LINE__, status);
 		dev->lv1_status = status;
-		complete(&dev->done);
+		dev->done = true;
+		rcuwait_wake_up(&dev->wait);
 	}
 	spin_unlock(&dev->lock);
 	return IRQ_HANDLED;
@@ -725,12 +728,12 @@ static int ps3_notification_read_write(struct ps3_notification_device *dev,
 	unsigned long flags;
 	int res;
 
-	init_completion(&dev->done);
 	spin_lock_irqsave(&dev->lock, flags);
 	res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 					&dev->tag)
 		    : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
 				       &dev->tag);
+	dev->done = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
 	if (res) {
 		pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
@@ -738,14 +741,10 @@ static int ps3_notification_read_write(struct ps3_notification_device *dev,
 	}
 	pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
 
-	res = wait_event_interruptible(dev->done.wait,
-				       dev->done.done || kthread_should_stop());
+	rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), TASK_IDLE);
+
 	if (kthread_should_stop())
 		res = -EINTR;
-	if (res) {
-		pr_debug("%s:%u: interrupted %s\n", __func__, __LINE__, op);
-		return res;
-	}
 
 	if (dev->lv1_status) {
 		pr_err("%s:%u: %s not completed, status 0x%llx\n", __func__,
@@ -810,6 +809,7 @@ static int ps3_probe_thread(void *data)
 	}
 
 	spin_lock_init(&dev.lock);
+	rcuwait_init(&dev.wait);
 
 	res = request_irq(irq, ps3_notification_interrupt, 0,
 			  "ps3_notification", &dev);

^ permalink raw reply related	[flat|nested] 195+ messages in thread
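
For reference, a minimal sketch of the rcuwait pattern the patch adopts:
a single waiter, with the wakeup issued under the lock that serializes
the condition. my_notify and its helpers are made-up names:

  #include <linux/rcuwait.h>
  #include <linux/spinlock.h>

  struct my_notify {
  	spinlock_t lock;	/* serializes 'done' against the waker */
  	bool done;
  	struct rcuwait wait;	/* at most one task ever waits */
  };

  static void my_notify_init(struct my_notify *n)
  {
  	spin_lock_init(&n->lock);
  	n->done = false;
  	rcuwait_init(&n->wait);
  }

  /* Waker side, e.g. from an interrupt handler */
  static void my_notify_fire(struct my_notify *n)
  {
  	spin_lock(&n->lock);
  	n->done = true;
  	rcuwait_wake_up(&n->wait);
  	spin_unlock(&n->lock);
  }

  /* Waiter side: TASK_IDLE keeps the kthread out of loadavg */
  static void my_notify_wait(struct my_notify *n)
  {
  	rcuwait_wait_event(&n->wait, n->done, TASK_IDLE);
  }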

* [tip: locking/core] microblaze: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (3 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     d964ea7014a9d0d6312ccd5f47088a792371ad0b
Gitweb:        https://git.kernel.org/tip/d964ea7014a9d0d6312ccd5f47088a792371ad0b
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:54 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:22 +01:00

microblaze: Remove mm.h from asm/uaccess.h

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/microblaze/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.719022171@linutronix.de
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h b/arch/microblaze/include/asm/uaccess.h
index a1f206b..4916d5f 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include <linux/kernel.h>
-#include <linux/mm.h>
 
 #include <asm/mmu.h>
 #include <asm/page.h>

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] rcuwait: Add @state argument to rcuwait_wait_event()
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Peter Zijlstra (Intel)
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Peter Zijlstra (Intel) @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Thomas Gleixner, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     80fbaf1c3f2926c834f8ca915441dfe27ce5487e
Gitweb:        https://git.kernel.org/tip/80fbaf1c3f2926c834f8ca915441dfe27ce5487e
Author:        Peter Zijlstra (Intel) <peterz@infradead.org>
AuthorDate:    Sat, 21 Mar 2020 12:25:55 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:22 +01:00

rcuwait: Add @state argument to rcuwait_wait_event()

Extend rcuwait_wait_event() with a state variable so that it is not
restricted to UNINTERRUPTIBLE waits.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.824030968@linutronix.de
---
 include/linux/rcuwait.h       | 12 ++++++++++--
 kernel/locking/percpu-rwsem.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 75c97e4..2ffe1ee 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -3,6 +3,7 @@
 #define _LINUX_RCUWAIT_H_
 
 #include <linux/rcupdate.h>
+#include <linux/sched/signal.h>
 
 /*
  * rcuwait provides a way of blocking and waking up a single
@@ -30,23 +31,30 @@ extern void rcuwait_wake_up(struct rcuwait *w);
  * The caller is responsible for locking around rcuwait_wait_event(),
  * such that writes to @task are properly serialized.
  */
-#define rcuwait_wait_event(w, condition)				\
+#define rcuwait_wait_event(w, condition, state)				\
 ({									\
+	int __ret = 0;							\
 	rcu_assign_pointer((w)->task, current);				\
 	for (;;) {							\
 		/*							\
 		 * Implicit barrier (A) pairs with (B) in		\
 		 * rcuwait_wake_up().					\
 		 */							\
-		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		set_current_state(state);				\
 		if (condition)						\
 			break;						\
 									\
+		if (signal_pending_state(state, current)) {		\
+			__ret = -EINTR;					\
+			break;						\
+		}							\
+									\
 		schedule();						\
 	}								\
 									\
 	WRITE_ONCE((w)->task, NULL);					\
 	__set_current_state(TASK_RUNNING);				\
+	__ret;								\
 })
 
 #endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 183a3aa..a008a1b 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -234,7 +234,7 @@ void percpu_down_write(struct percpu_rw_semaphore *sem)
 	 */
 
 	/* Wait for all active readers to complete. */
-	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem), TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 

^ permalink raw reply related	[flat|nested] 195+ messages in thread
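
A short sketch of what the new @state argument enables, namely an
interruptible wait that reports -EINTR. Names are made up; rcuwait is
zero-initialized and serialization of my_cond is assumed to be handled
elsewhere:

  #include <linux/rcuwait.h>

  static struct rcuwait my_wait;
  static bool my_cond;

  static int my_wait_interruptible(void)
  {
  	/* Returns 0 once my_cond is true, or -EINTR if a signal
  	 * arrives first. */
  	return rcuwait_wait_event(&my_wait, READ_ONCE(my_cond),
  				  TASK_INTERRUPTIBLE);
  }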

* [tip: locking/core] hexagon: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     3f332aa0a7659789984f05f691a18df23b961fee
Gitweb:        https://git.kernel.org/tip/3f332aa0a7659789984f05f691a18df23b961fee
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:52 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:21 +01:00

hexagon: Remove mm.h from asm/uaccess.h

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/hexagon/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.531525286@linutronix.de
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h b/arch/hexagon/include/asm/uaccess.h
index 00cb38f..c1019a7 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include <linux/mm.h>
 #include <asm/sections.h>
 
 /*

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] ia64: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     6f28b46c4f93b4b4632e8f598c4f796244851a58
Gitweb:        https://git.kernel.org/tip/6f28b46c4f93b4b4632e8f598c4f796244851a58
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:53 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:22 +01:00

ia64: Remove mm.h from asm/uaccess.h

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/ia64/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.624070289@linutronix.de
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c      | 1 +
 arch/ia64/mm/ioremap.c          | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad..5c7e79e 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include <linux/compiler.h>
 #include <linux/page-flags.h>
-#include <linux/mm.h>
 
 #include <asm/intrinsics.h>
 #include <asm/pgtable.h>
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f3..743aaf5 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
 	machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0..55fd3eb 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/efi.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <asm/io.h>
 #include <asm/meminit.h>

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] nds32: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     43ea9d1a533a08dea4a2a9312d6aa2583313ef23
Gitweb:        https://git.kernel.org/tip/43ea9d1a533a08dea4a2a9312d6aa2583313ef23
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:50 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:21 +01:00

nds32: Remove mm.h from asm/uaccess.h

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/nds32/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.339289758@linutronix.de
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9..3a9219f 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <asm/errno.h>
 #include <asm/memory.h>
 #include <asm/types.h>
-#include <linux/mm.h>
 
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] csky: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kbuild test robot, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     c5eedbae2f2b5d340e4f69d25b03ac5878e0efc7
Gitweb:        https://git.kernel.org/tip/c5eedbae2f2b5d340e4f69d25b03ac5878e0efc7
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:51 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:21 +01:00

csky: Remove mm.h from asm/uaccess.h

The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC      kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|                  from include/linux/mm.h:567,
|                  from arch/csky/include/asm/uaccess.h:,
|                  from include/linux/uaccess.h:11,
|                  from include/linux/sched/task.h:11,
|                  from include/linux/sched/signal.h:9,
|                  from include/linux/rcuwait.h:6,
|                  from include/linux/percpu-rwsem.h:8,
|                  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113241.434999165@linutronix.de
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c34..abefa12 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/mm.h>
 #include <linux/string.h>
 #include <linux/version.h>
 #include <asm/segment.h>

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] orinoco_usb: Use the regular completion interfaces
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Greg Kroah-Hartman, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     9fe114ce0371223d2a0490f0aa52b8f108d92f37
Gitweb:        https://git.kernel.org/tip/9fe114ce0371223d2a0490f0aa52b8f108d92f37
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:48 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:20 +01:00

orinoco_usb: Use the regular completion interfaces

The completion usage in this driver is interesting:

  - it uses a magic complete function which according to the comment was
    implemented by invoking complete() four times in a row because
    complete_all() was not exported at that time.

  - it uses an open coded wait/poll which checks completion:done. Only one wait
    side (device removal) uses the regular wait_for_completion() interface.

The rationale behind this is to prevent wait_for_completion() from consuming
completion::done, which would prevent all waiters from being woken. This is
not necessary with complete_all(), as that sets completion::done to UINT_MAX,
which is left unmodified by the woken waiters.

Replace the magic complete function with complete_all() and convert the
open coded wait/poll to regular completion interfaces.

This changes the wait to exclusive wait mode. But that does not make any
difference because the wakers use complete_all() which ignores the
exclusive mode.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200321113241.150783464@linutronix.de
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 21 ++----------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index e753f43..0e42de2 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -365,17 +365,6 @@ static struct request_context *ezusb_alloc_ctx(struct ezusb_priv *upriv,
 	return ctx;
 }
 
-
-/* Hopefully the real complete_all will soon be exported, in the mean
- * while this should work. */
-static inline void ezusb_complete_all(struct completion *comp)
-{
-	complete(comp);
-	complete(comp);
-	complete(comp);
-	complete(comp);
-}
-
 static void ezusb_ctx_complete(struct request_context *ctx)
 {
 	struct ezusb_priv *upriv = ctx->upriv;
@@ -409,7 +398,7 @@ static void ezusb_ctx_complete(struct request_context *ctx)
 
 			netif_wake_queue(dev);
 		}
-		ezusb_complete_all(&ctx->done);
+		complete_all(&ctx->done);
 		ezusb_request_context_put(ctx);
 		break;
 
@@ -419,7 +408,7 @@ static void ezusb_ctx_complete(struct request_context *ctx)
 			/* This is normal, as all request contexts get flushed
 			 * when the device is disconnected */
 			err("Called, CTX not terminating, but device gone");
-			ezusb_complete_all(&ctx->done);
+			complete_all(&ctx->done);
 			ezusb_request_context_put(ctx);
 			break;
 		}
@@ -690,11 +679,11 @@ static void ezusb_req_ctx_wait(struct ezusb_priv *upriv,
 			 * get the chance to run themselves. So we make sure
 			 * that we don't sleep for ever */
 			int msecs = DEF_TIMEOUT * (1000 / HZ);
-			while (!ctx->done.done && msecs--)
+
+			while (!try_wait_for_completion(&ctx->done) && msecs--)
 				udelay(1000);
 		} else {
-			wait_event_interruptible(ctx->done.wait,
-						 ctx->done.done);
+			wait_for_completion(&ctx->done);
 		}
 		break;
 	default:

^ permalink raw reply related	[flat|nested] 195+ messages in thread
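
As a refresher on the distinction the changelog relies on, a tiny sketch
(hypothetical names) of why complete_all() is the right tool when
several tasks may wait:

  #include <linux/completion.h>

  static DECLARE_COMPLETION(shutdown_done);

  /* completion::done becomes UINT_MAX: current and future waiters all
   * proceed, and waking consumes nothing. */
  static void announce_shutdown(void)
  {
  	complete_all(&shutdown_done);
  }

  /* Any number of tasks may call this, before or after the wakeup. */
  static void wait_for_shutdown(void)
  {
  	wait_for_completion(&shutdown_done);
  }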

* [tip: locking/core] acpi: Remove header dependency
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (3 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Peter Zijlstra
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Thomas Gleixner, Andy Shevchenko, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     df23e2be3d240b67222375062ce873f5ec84854d
Gitweb:        https://git.kernel.org/tip/df23e2be3d240b67222375062ce873f5ec84854d
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Sat, 21 Mar 2020 12:25:49 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:21 +01:00

acpi: Remove header dependency

In order to avoid future header hell, remove the inclusion of
proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
struct.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lkml.kernel.org/r/20200321113241.246190285@linutronix.de
---
 drivers/platform/x86/dell-smo8800.c                      | 1 +
 drivers/platform/x86/wmi.c                               | 1 +
 drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c | 1 +
 include/acpi/acpi_bus.h                                  | 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/dell-smo8800.c b/drivers/platform/x86/dell-smo8800.c
index bfcc1d1..b531fe8 100644
--- a/drivers/platform/x86/dell-smo8800.c
+++ b/drivers/platform/x86/dell-smo8800.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/miscdevice.h>
 #include <linux/uaccess.h>
+#include <linux/fs.h>
 
 struct smo8800_device {
 	u32 irq;                     /* acpi device irq */
diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
index dc2e966..941739d 100644
--- a/drivers/platform/x86/wmi.c
+++ b/drivers/platform/x86/wmi.c
@@ -29,6 +29,7 @@
 #include <linux/uaccess.h>
 #include <linux/uuid.h>
 #include <linux/wmi.h>
+#include <linux/fs.h>
 #include <uapi/linux/wmi.h>
 
 ACPI_MODULE_NAME("wmi");
diff --git a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
index 7130e90..a478cff 100644
--- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
+++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
@@ -19,6 +19,7 @@
 #include <linux/acpi.h>
 #include <linux/uaccess.h>
 #include <linux/miscdevice.h>
+#include <linux/fs.h>
 #include "acpi_thermal_rel.h"
 
 static acpi_handle acpi_thermal_rel_handle;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 0c23fd0..a92bea7 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, const char *uid, s64 hrv);
 
 #ifdef CONFIG_ACPI
 
-#include <linux/proc_fs.h>
+struct proc_dir_entry;
 
 #define ACPI_BUS_FILE_ROOT	"acpi"
 extern struct proc_dir_entry *acpi_root_dir;

^ permalink raw reply related	[flat|nested] 195+ messages in thread
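
The technique is general. A sketch of trading an include for a forward
declaration (my_subsys.h is a hypothetical header):

  /* my_subsys.h: a pointer member needs only the type name,
   * not the type's layout, so no #include <linux/proc_fs.h>. */
  #ifndef _MY_SUBSYS_H
  #define _MY_SUBSYS_H

  struct proc_dir_entry;		/* forward declaration */

  struct my_subsys {
  	struct proc_dir_entry *proc_root;
  };

  #endif /* _MY_SUBSYS_H */

Only the translation units that actually dereference proc_root need to
include <linux/proc_fs.h> themselves, which keeps the header graph flat.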

* [tip: locking/core] usb: gadget: Use completion interface instead of open coding it
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Greg Kroah-Hartman, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     c1d51dd505577b189bf33867a9c20015ca7efb46
Gitweb:        https://git.kernel.org/tip/c1d51dd505577b189bf33867a9c20015ca7efb46
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:47 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:20 +01:00

usb: gadget: Use completion interface instead of open coding it

ep_io() uses an on-stack completion and open codes the waiting with:

  wait_event_interruptible (done.wait, done.done);
and
  wait_event (done.wait, done.done);

This waits in non-exclusive mode for complete(), but there is no reason to
do so because the completion can only be waited for by the task itself and
complete() wakes exactly one exclusive waiter.

Replace the open coded implementation with the corresponding
wait_for_completion*() functions.

No functional change.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lkml.kernel.org/r/20200321113241.043380271@linutronix.de
---
 drivers/usb/gadget/legacy/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/legacy/inode.c b/drivers/usb/gadget/legacy/inode.c
index b47938d..4c3aff9 100644
--- a/drivers/usb/gadget/legacy/inode.c
+++ b/drivers/usb/gadget/legacy/inode.c
@@ -344,7 +344,7 @@ ep_io (struct ep_data *epdata, void *buf, unsigned len)
 	spin_unlock_irq (&epdata->dev->lock);
 
 	if (likely (value == 0)) {
-		value = wait_event_interruptible (done.wait, done.done);
+		value = wait_for_completion_interruptible(&done);
 		if (value != 0) {
 			spin_lock_irq (&epdata->dev->lock);
 			if (likely (epdata->ep != NULL)) {
@@ -353,7 +353,7 @@ ep_io (struct ep_data *epdata, void *buf, unsigned len)
 				usb_ep_dequeue (epdata->ep, epdata->req);
 				spin_unlock_irq (&epdata->dev->lock);
 
-				wait_event (done.wait, done.done);
+				wait_for_completion(&done);
 				if (epdata->status == -ECONNRESET)
 					epdata->status = -EINTR;
 			} else {

^ permalink raw reply related	[flat|nested] 195+ messages in thread
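
For comparison, a sketch of the on-stack completion pattern ep_io() now
uses, including the mandatory second wait after a cancellation. my_req,
submit_req() and cancel_req() are made-up stand-ins for the driver's
request machinery:

  #include <linux/completion.h>

  struct my_req {
  	struct completion *done;
  };

  void submit_req(struct my_req *req);	/* calls complete(req->done) */
  void cancel_req(struct my_req *req);

  static int my_req_io(struct my_req *req)
  {
  	DECLARE_COMPLETION_ONSTACK(done);
  	int ret;

  	req->done = &done;
  	submit_req(req);

  	ret = wait_for_completion_interruptible(&done);
  	if (ret) {
  		cancel_req(req);
  		/* Must wait again: 'done' lives on this stack frame and
  		 * cannot be abandoned until the cancel path completes. */
  		wait_for_completion(&done);
  	}
  	return ret;
  }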

* [tip: locking/core] pci/switchtec: Replace completion wait queue usage for poll
  2020-03-21 11:25   ` Thomas Gleixner
                     ` (2 preceding siblings ...)
  (?)
@ 2020-03-21 15:53   ` tip-bot2 for Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2020-03-21 15:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	Logan Gunthorpe, Bjorn Helgaas, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     deaa0a8a74d86573f190e21ae9a444ea5e3bceee
Gitweb:        https://git.kernel.org/tip/deaa0a8a74d86573f190e21ae9a444ea5e3bceee
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Sat, 21 Mar 2020 12:25:46 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 21 Mar 2020 16:00:20 +01:00

pci/switchtec: Replace completion wait queue usage for poll

The poll callback is using the completion wait queue and sticks it into
poll_wait() to wake up pollers after a command has completed.

This works to some extent, but cannot provide EPOLLEXCLUSIVE support
because the waker side uses complete_all() which unconditionally wakes up
all waiters. complete_all() is required because completions internally use
exclusive wait and complete() only wakes up one waiter by default.

This mixes conceptually different mechanisms and relies on internal
implementation details of completions, which in turn puts constraints on
changing the internal implementation of completions.

Replace it with a regular wait queue and store the state in struct
switchtec_user.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113240.936097534@linutronix.de
---
 drivers/pci/switch/switchtec.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 81dc7ac..e69cac8 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -52,10 +52,11 @@ struct switchtec_user {
 
 	enum mrpc_state state;
 
-	struct completion comp;
+	wait_queue_head_t cmd_comp;
 	struct kref kref;
 	struct list_head list;
 
+	bool cmd_done;
 	u32 cmd;
 	u32 status;
 	u32 return_code;
@@ -77,7 +78,7 @@ static struct switchtec_user *stuser_create(struct switchtec_dev *stdev)
 	stuser->stdev = stdev;
 	kref_init(&stuser->kref);
 	INIT_LIST_HEAD(&stuser->list);
-	init_completion(&stuser->comp);
+	init_waitqueue_head(&stuser->cmd_comp);
 	stuser->event_cnt = atomic_read(&stdev->event_cnt);
 
 	dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser);
@@ -175,7 +176,7 @@ static int mrpc_queue_cmd(struct switchtec_user *stuser)
 	kref_get(&stuser->kref);
 	stuser->read_len = sizeof(stuser->data);
 	stuser_set_state(stuser, MRPC_QUEUED);
-	reinit_completion(&stuser->comp);
+	stuser->cmd_done = false;
 	list_add_tail(&stuser->list, &stdev->mrpc_queue);
 
 	mrpc_cmd_submit(stdev);
@@ -222,7 +223,8 @@ static void mrpc_complete_cmd(struct switchtec_dev *stdev)
 		memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data,
 			      stuser->read_len);
 out:
-	complete_all(&stuser->comp);
+	stuser->cmd_done = true;
+	wake_up_interruptible(&stuser->cmd_comp);
 	list_del_init(&stuser->list);
 	stuser_put(stuser);
 	stdev->mrpc_busy = 0;
@@ -529,10 +531,11 @@ static ssize_t switchtec_dev_read(struct file *filp, char __user *data,
 	mutex_unlock(&stdev->mrpc_mutex);
 
 	if (filp->f_flags & O_NONBLOCK) {
-		if (!try_wait_for_completion(&stuser->comp))
+		if (!stuser->cmd_done)
 			return -EAGAIN;
 	} else {
-		rc = wait_for_completion_interruptible(&stuser->comp);
+		rc = wait_event_interruptible(stuser->cmd_comp,
+					      stuser->cmd_done);
 		if (rc < 0)
 			return rc;
 	}
@@ -580,7 +583,7 @@ static __poll_t switchtec_dev_poll(struct file *filp, poll_table *wait)
 	struct switchtec_dev *stdev = stuser->stdev;
 	__poll_t ret = 0;
 
-	poll_wait(filp, &stuser->comp.wait, wait);
+	poll_wait(filp, &stuser->cmd_comp, wait);
 	poll_wait(filp, &stdev->event_wq, wait);
 
 	if (lock_mutex_and_test_alive(stdev))
@@ -588,7 +591,7 @@ static __poll_t switchtec_dev_poll(struct file *filp, poll_table *wait)
 
 	mutex_unlock(&stdev->mrpc_mutex);
 
-	if (try_wait_for_completion(&stuser->comp))
+	if (stuser->cmd_done)
 		ret |= EPOLLIN | EPOLLRDNORM;
 
 	if (stuser->event_cnt != atomic_read(&stdev->event_cnt))
@@ -1272,7 +1275,8 @@ static void stdev_kill(struct switchtec_dev *stdev)
 
 	/* Wake up and kill any users waiting on an MRPC request */
 	list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) {
-		complete_all(&stuser->comp);
+		stuser->cmd_done = true;
+		wake_up_interruptible(&stuser->cmd_comp);
 		list_del_init(&stuser->list);
 		stuser_put(stuser);
 	}

^ permalink raw reply related	[flat|nested] 195+ messages in thread
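
The replacement pattern in a nutshell: a wait queue head plus a done
flag serving both blocking reads and poll(). This is a sketch; my_user
and the function names are made up:

  #include <linux/fs.h>
  #include <linux/poll.h>
  #include <linux/wait.h>

  struct my_user {
  	wait_queue_head_t cmd_comp;
  	bool cmd_done;
  };

  /* Completion side: set the flag, then wake readers and pollers */
  static void my_cmd_finish(struct my_user *u)
  {
  	u->cmd_done = true;
  	wake_up_interruptible(&u->cmd_comp);
  }

  /* poll() callback: register the queue, report current readiness */
  static __poll_t my_poll(struct file *filp, poll_table *wait)
  {
  	struct my_user *u = filp->private_data;
  	__poll_t ret = 0;

  	poll_wait(filp, &u->cmd_comp, wait);
  	if (u->cmd_done)
  		ret |= EPOLLIN | EPOLLRDNORM;

  	return ret;
  }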

* Re: [tip: locking/core] lockdep: Annotate irq_work
  2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
@ 2020-03-21 16:40     ` Frederic Weisbecker
  2020-03-21 18:12       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 195+ messages in thread
From: Frederic Weisbecker @ 2020-03-21 16:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On Sat, Mar 21, 2020 at 03:53:45PM -0000, tip-bot2 for Sebastian Andrzej Siewior wrote:
> The following commit has been merged into the locking/core branch of tip:
> 
> Commit-ID:     49915ac35ca7b07c54295a72d905be5064afb89e
> Gitweb:        https://git.kernel.org/tip/49915ac35ca7b07c54295a72d905be5064afb89e
> Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> AuthorDate:    Sat, 21 Mar 2020 12:26:03 +01:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00
> 
> lockdep: Annotate irq_work
> 
> Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in
> hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be
> invoked in softirq context on PREEMPT_RT.
> 
> Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
> context so lockdep knows that these can safely acquire a spinlock_t.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
> ---
>  include/linux/irq_work.h |  2 ++
>  include/linux/irqflags.h | 13 +++++++++++++
>  kernel/irq_work.c        |  2 ++
>  kernel/rcu/tree.c        |  1 +
>  kernel/time/tick-sched.c |  1 +
>  5 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index 02da997..3b752e8 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -18,6 +18,8 @@
>  
>  /* Doesn't want IPI, wait for tick: */
>  #define IRQ_WORK_LAZY		BIT(2)
> +/* Run hard IRQ context, even on RT */
> +#define IRQ_WORK_HARD_IRQ	BIT(3)
>  
>  #define IRQ_WORK_CLAIMED	(IRQ_WORK_PENDING | IRQ_WORK_BUSY)
>  
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 9c17f9c..f23f540 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -69,6 +69,17 @@ do {						\
>  			current->irq_config = 0;	\
>  	  } while (0)
>  
> +# define lockdep_irq_work_enter(__work)					\
> +	  do {								\
> +		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
> +			current->irq_config = 1;			\

So, irq_config == 1 means we are in a softirq? Are there other values for
irq_config? If so, shouldn't there be enums or something?
I can't find the patch that describes this.

> +	  } while (0)
> +# define lockdep_irq_work_exit(__work)					\
> +	  do {								\
> +		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
> +			current->irq_config = 0;			\
> +	  } while (0)
> +
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
>  # define trace_hardirqs_off()		do { } while (0)
> @@ -83,6 +94,8 @@ do {						\
>  # define lockdep_softirq_exit()		do { } while (0)
>  # define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
>  # define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
> +# define lockdep_irq_work_enter(__work)		do { } while (0)
> +# define lockdep_irq_work_exit(__work)		do { } while (0)
>  #endif
>  
>  #if defined(CONFIG_IRQSOFF_TRACER) || \
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 828cc30..48b5d1b 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -153,7 +153,9 @@ static void irq_work_run_list(struct llist_head *list)
>  		 */
>  		flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
>  
> +		lockdep_irq_work_enter(work);
>  		work->func(work);
> +		lockdep_irq_work_exit(work);
>  		/*
>  		 * Clear the BUSY bit and return to the free state if
>  		 * no-one else claimed it meanwhile.
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d91c915..5066d1d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1113,6 +1113,7 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
>  		    !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq &&
>  		    (rnp->ffmask & rdp->grpmask)) {
>  			init_irq_work(&rdp->rcu_iw, rcu_iw_handler);
> +			atomic_set(&rdp->rcu_iw.flags, IRQ_WORK_HARD_IRQ);
>  			rdp->rcu_iw_pending = true;
>  			rdp->rcu_iw_gp_seq = rnp->gp_seq;
>  			irq_work_queue_on(&rdp->rcu_iw, rdp->cpu);
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 4be756b..3e2dc9b 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct irq_work *work)
>  
>  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
>  	.func = nohz_full_kick_func,
> +	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
>  };

I get why these need to be in hardirq, but some basic explanation for
ordinary mortals as to why these two specifically, and not all the others
(and there are many), would have been nice.

Thanks.

>  
>  /*

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [tip: locking/core] lockdep: Add hrtimer context tracing bits
  2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
@ 2020-03-21 16:46     ` Frederic Weisbecker
  0 siblings, 0 replies; 195+ messages in thread
From: Frederic Weisbecker @ 2020-03-21 16:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Sebastian Andrzej Siewior, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On Sat, Mar 21, 2020 at 03:53:45PM -0000, tip-bot2 for Sebastian Andrzej Siewior wrote:
> The following commit has been merged into the locking/core branch of tip:
> 
> Commit-ID:     40db173965c05a1d803451240ed41707d5bd978d
> Gitweb:        https://git.kernel.org/tip/40db173965c05a1d803451240ed41707d5bd978d
> Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> AuthorDate:    Sat, 21 Mar 2020 12:26:02 +01:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Sat, 21 Mar 2020 16:00:24 +01:00
> 
> lockdep: Add hrtimer context tracing bits
> 
> Set current->irq_config = 1 for hrtimers which are not marked to expire in
> hard interrupt context during hrtimer_init(). These timers will expire in
> softirq context on PREEMPT_RT.
> 
> Setting this allows lockdep to differentiate these timers. If a timer is
> marked to expire in hard interrupt context then the timer callback must
> not acquire a regular spinlock; it has to use a raw_spinlock in the
> expiry callback.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20200321113242.534508206@linutronix.de
> ---
>  include/linux/irqflags.h | 15 +++++++++++++++
>  include/linux/sched.h    |  1 +
>  kernel/locking/lockdep.c |  2 +-
>  kernel/time/hrtimer.c    |  6 +++++-
>  4 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index fdaf286..9c17f9c 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -56,6 +56,19 @@ do {						\
>  do {						\
>  	current->softirq_context--;		\
>  } while (0)
> +
> +# define lockdep_hrtimer_enter(__hrtimer)		\
> +	  do {						\
> +		  if (!__hrtimer->is_hard)		\
> +			current->irq_config = 1;	\
> +	  } while (0)
> +
> +# define lockdep_hrtimer_exit(__hrtimer)		\
> +	  do {						\
> +		  if (!__hrtimer->is_hard)		\
> +			current->irq_config = 0;	\
> +	  } while (0)
> +
>  #else
>  # define trace_hardirqs_on()		do { } while (0)
>  # define trace_hardirqs_off()		do { } while (0)
> @@ -68,6 +81,8 @@ do {						\
>  # define trace_hardirq_exit()		do { } while (0)
>  # define lockdep_softirq_enter()	do { } while (0)
>  # define lockdep_softirq_exit()		do { } while (0)
> +# define lockdep_hrtimer_enter(__hrtimer)		do { } while (0)
> +# define lockdep_hrtimer_exit(__hrtimer)		do { } while (0)
>  #endif
>  
>  #if defined(CONFIG_IRQSOFF_TRACER) || \
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4d3b9ec..933914c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -983,6 +983,7 @@ struct task_struct {
>  	unsigned int			softirq_enable_event;
>  	int				softirqs_enabled;
>  	int				softirq_context;
> +	int				irq_config;

There really needs to be some explanation/comment/symbolic names to clarify
what this field is about and the meaning of the values it can take.
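
Assuming the semantics hinted at in the irq_work patch (irq_config == 1
meaning sleeping locks are okay), even a comment on the field would already
help. A sketch, not code from the series:

	/*
	 * irq_config == 0: hard interrupt context (or equivalent);
	 *                  only raw spinlocks may be acquired.
	 * irq_config == 1: runs threaded / in softirq on PREEMPT_RT;
	 *                  sleeping locks such as spinlock_t are fine.
	 */
	int				irq_config;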

Thanks.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep
  2020-03-21 11:25 ` Thomas Gleixner
@ 2020-03-21 17:19   ` Davidlohr Bueso
  0 siblings, 0 replies; 195+ messages in thread
From: Davidlohr Bueso @ 2020-03-21 17:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

On Sat, 21 Mar 2020, Thomas Gleixner wrote:

>This is the third and hopefully final version of this work. The second one
>can be found here:

Would you rather I send in a separate series with the kvm changes, or
should I just send a v2 with the fixes here again?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 00/20] Lock ordering documentation and annotation for lockdep
  2020-03-21 17:19   ` Davidlohr Bueso
@ 2020-03-21 17:45     ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-21 17:45 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Logan Gunthorpe,
	Bjorn Helgaas, Kurt Schwemmer, linux-pci, Greg Kroah-Hartman,
	Felipe Balbi, linux-usb, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

Davidlohr Bueso <dave@stgolabs.net> writes:

> On Sat, 21 Mar 2020, Thomas Gleixner wrote:
>
>>This is the third and hopefully final version of this work. The second one
>>can be found here:
>
> Would you rather I send in a separate series with the kvm changes, or
> should I just send a v2 with the fixes here again?

Send a separate series please. These nested threads are hard to follow.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [tip: locking/core] lockdep: Annotate irq_work
  2020-03-21 16:40     ` Frederic Weisbecker
@ 2020-03-21 18:12       ` Sebastian Andrzej Siewior
  2020-03-22  2:33         ` Frederic Weisbecker
  0 siblings, 1 reply; 195+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-21 18:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, linux-tip-commits, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On 2020-03-21 17:40:58 [+0100], Frederic Weisbecker wrote:
> > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> > index 9c17f9c..f23f540 100644
> > --- a/include/linux/irqflags.h
> > +++ b/include/linux/irqflags.h
> > @@ -69,6 +69,17 @@ do {						\
> >  			current->irq_config = 0;	\
> >  	  } while (0)
> >  
> > +# define lockdep_irq_work_enter(__work)					\
> > +	  do {								\
> > +		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
> > +			current->irq_config = 1;			\
> 
> So, irq_config == 1 means we are in a softirq? Are there other values for
> irq_config? In which case there should be enums or something?
> I can't find the patch that describes this.

0 means as-is; 1 means the context is threaded, so sleeping locks are okay.

> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct irq_work *work)
> >  
> >  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
> >  	.func = nohz_full_kick_func,
> > +	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
> >  };
> 
> I get why these need to be in hardirq but some basic explanations for
> ordinary mortals as to why those two specifically and not all the others
> (and there are many) would have been nice.

Is the documentation patch in this series any good?

> Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [tip: locking/core] lockdep: Annotate irq_work
  2020-03-21 18:12       ` Sebastian Andrzej Siewior
@ 2020-03-22  2:33         ` Frederic Weisbecker
  2020-03-22  2:39           ` Frederic Weisbecker
  2020-03-22 12:27           ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 195+ messages in thread
From: Frederic Weisbecker @ 2020-03-22  2:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-tip-commits, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On Sat, Mar 21, 2020 at 07:12:49PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-03-21 17:40:58 [+0100], Frederic Weisbecker wrote:
> > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> > > index 9c17f9c..f23f540 100644
> > > --- a/include/linux/irqflags.h
> > > +++ b/include/linux/irqflags.h
> > > @@ -69,6 +69,17 @@ do {						\
> > >  			current->irq_config = 0;	\
> > >  	  } while (0)
> > >  
> > > +# define lockdep_irq_work_enter(__work)					\
> > > +	  do {								\
> > > +		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
> > > +			current->irq_config = 1;			\
> > 
> > So, irq_config == 1 means we are in a softirq? Are there other values for
> > irq_config? In which case there should be enums or something?
> > I can't find the patch that describes this.
> 
> 0 means as-is, 1 means threaded / sleeping locks are okay.

So that's the kind of comment we need :-)

Also how about current->irq_locking instead?

And something like:

enum {
    IRQ_LOCKING_NO_SLEEP,
    IRQ_LOCKING_CAN_SLEEP
};

> 
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct irq_work *work)
> > >  
> > >  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
> > >  	.func = nohz_full_kick_func,
> > > +	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
> > >  };
> > 
> > I get why these need to be in hardirq but some basic explanations for
> > ordinary mortals as to why those two specifically and not all the others
> > (and there are many) would have been nice.
> 
> Is the documentation patch in this series any good?

That describes the general rules, but it doesn't say anything about the
details of this patch, especially why the RCU and nohz_full irq works in
particular are special here and why it's fine for the others to execute in
softirq.

Thanks.

> > Thanks.
> 
> Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [tip: locking/core] lockdep: Annotate irq_work
  2020-03-22  2:33         ` Frederic Weisbecker
@ 2020-03-22  2:39           ` Frederic Weisbecker
  2020-03-22 12:27           ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 195+ messages in thread
From: Frederic Weisbecker @ 2020-03-22  2:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, linux-tip-commits, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On Sun, Mar 22, 2020 at 03:33:30AM +0100, Frederic Weisbecker wrote:
> On Sat, Mar 21, 2020 at 07:12:49PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-03-21 17:40:58 [+0100], Frederic Weisbecker wrote:
> > > > diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> > > > index 9c17f9c..f23f540 100644
> > > > --- a/include/linux/irqflags.h
> > > > +++ b/include/linux/irqflags.h
> > > > @@ -69,6 +69,17 @@ do {						\
> > > >  			current->irq_config = 0;	\
> > > >  	  } while (0)
> > > >  
> > > > +# define lockdep_irq_work_enter(__work)					\
> > > > +	  do {								\
> > > > +		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
> > > > +			current->irq_config = 1;			\
> > > 
> > > So, irq_config == 1 means we are in a softirq? Are there other values for
> > > irq_config? In which case there should be enums or something?
> > > I can't find the patch that describes this.
> > 
> > 0 means as-is, 1 means threaded / sleeping locks are okay.
> 
> So that's the kind of comment we need :-)
> 
> Also how about current->irq_locking instead?
> 
> And something like:
> 
> enum {
>     IRQ_LOCKING_NO_SLEEP,
>     IRQ_LOCKING_CAN_SLEEP
> }

Or current->irq_preemptible

enum {
    IRQ_NEVER_PREEMPTIBLE,
    IRQ_MAYBE_PREEMPTIBLE
};
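
Whichever naming wins, the annotation macro would then read along these
lines (a sketch using the names proposed above, not code from the series):

# define lockdep_irq_work_enter(__work)					\
	  do {								\
		  if (!(atomic_read(&__work->flags) & IRQ_WORK_HARD_IRQ))\
			current->irq_preemptible = IRQ_MAYBE_PREEMPTIBLE;\
	  } while (0)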


^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 05/20] acpi: Remove header dependency
@ 2020-03-22  7:02     ` Rafael J. Wysocki
  0 siblings, 0 replies; 195+ messages in thread
From: Rafael J. Wysocki @ 2020-03-22  7:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: open list:ULTRA-WIDEBAND (UWB) SUBSYSTEM:,
	linux-ia64, Peter Zijlstra, Linux PCI, Sebastian Siewior,
	Oleg Nesterov, Guo Ren, Joel Fernandes, Vincent Chen,
	Ingo Molnar, Davidlohr Bueso, kbuild test robot, Brian Cain,
	Jonathan Corbet, Paul E . McKenney, linux-hexagon,
	Rafael J. Wysocki, linux-csky, ACPI Devel Maling List,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Randy Dunlap,
	Arnd Bergmann, Linux PM, linuxppc-dev, Greentime Hu,
	Bjorn Helgaas, Kurt Schwemmer, Platform Driver, Kalle Valo,
	Felipe Balbi, Michal Simek, Tony Luck, Nick Hu, Geoff Levand,
	Greg Kroah-Hartman, Linus Torvalds,
	open list:NETWORKING DRIVERS (WIRELESS),
	LKML, Davidlohr Bueso, netdev, Logan Gunthorpe, David S. Miller,
	Andy Shevchenko

On Sat, Mar 21, 2020 at 12:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> In order to avoid future header hell, remove the inclusion of
> proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
> struct.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andy Shevchenko <andy@infradead.org>
> Cc: platform-driver-x86@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: linux-pm@vger.kernel.org
> Cc: Len Brown <lenb@kernel.org>
> Cc: linux-acpi@vger.kernel.org

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  drivers/platform/x86/dell-smo8800.c                      |    1 +
>  drivers/platform/x86/wmi.c                               |    1 +
>  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |    1 +
>  include/acpi/acpi_bus.h                                  |    2 +-
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> --- a/drivers/platform/x86/dell-smo8800.c
> +++ b/drivers/platform/x86/dell-smo8800.c
> @@ -16,6 +16,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/miscdevice.h>
>  #include <linux/uaccess.h>
> +#include <linux/fs.h>
>
>  struct smo8800_device {
>         u32 irq;                     /* acpi device irq */
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -29,6 +29,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/uuid.h>
>  #include <linux/wmi.h>
> +#include <linux/fs.h>
>  #include <uapi/linux/wmi.h>
>
>  ACPI_MODULE_NAME("wmi");
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -19,6 +19,7 @@
>  #include <linux/acpi.h>
>  #include <linux/uaccess.h>
>  #include <linux/miscdevice.h>
> +#include <linux/fs.h>
>  #include "acpi_thermal_rel.h"
>
>  static acpi_handle acpi_thermal_rel_handle;
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
>
>  #ifdef CONFIG_ACPI
>
> -#include <linux/proc_fs.h>
> +struct proc_dir_entry;
>
>  #define ACPI_BUS_FILE_ROOT     "acpi"
>  extern struct proc_dir_entry *acpi_root_dir;
>
>
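
The underlying pattern generalizes: a header that only stores or passes
pointers to a type needs a forward declaration, not the type's header.
A minimal sketch with made-up names:

/* foo.h: only pointers to struct bar are used, so forward-declare it
 * instead of including bar.h. */
struct bar;

struct foo {
	struct bar *priv;	/* pointer only: bar's size is not needed */
};

void foo_attach(struct foo *f, struct bar *b);

/* foo.c: include bar.h here, where struct bar members are actually
 * dereferenced. */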

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [tip: locking/core] lockdep: Annotate irq_work
  2020-03-22  2:33         ` Frederic Weisbecker
  2020-03-22  2:39           ` Frederic Weisbecker
@ 2020-03-22 12:27           ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 195+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-22 12:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, linux-tip-commits, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86

On 2020-03-22 03:33:30 [+0100], Frederic Weisbecker wrote:
> > > > @@ -245,6 +245,7 @@ static void nohz_full_kick_func(struct irq_work *work)
> > > >  
> > > >  static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
> > > >  	.func = nohz_full_kick_func,
> > > > +	.flags = ATOMIC_INIT(IRQ_WORK_HARD_IRQ),
> > > >  };
> > > 
> > > I get why these need to be in hardirq but some basic explanations for
> > > ordinary mortals as to why those two specifically and not all the others
> > > (and there are many) would have been nice.
> > 
> > Is the documentation patch in this series any good?
> 
> That describes the general rules but it doesn't tell anything about the
> details of this patch. Especially why RCU and nohz_full irq works in particular
> are special here and why it's fine for others to execute in softirq.

Hmm. You need to know the details of the code. RCU is used in hardirq
context and (carefully) uses raw_spinlock_t and so on.
If my memory serves me well, in regard to the "nohz kick" part here
NOHZ_FULL needs to observe whether the CPU is idle or a task is running. If
this is invoked as part of softirq, which is threaded on PREEMPT_RT, then
it will never observe an idle CPU because there is always a task RUNNING:
the softirq thread doing this callback.

> Thanks.

Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 04/20] orinoco_usb: Use the regular completion interfaces
@ 2020-03-22 14:42     ` Kalle Valo
  0 siblings, 0 replies; 195+ messages in thread
From: Kalle Valo @ 2020-03-22 14:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Randy Dunlap, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, Oleg Nesterov, Guo Ren,
	Joel Fernandes, Vincent Chen, Ingo Molnar, Jonathan Corbet,
	Davidlohr Bueso, Paul E . McKenney, Brian Cain, linux-acpi,
	linux-hexagon, Rafael J. Wysocki, linux-csky, Linus Torvalds,
	Darren Hart, Zhang Rui, Len Brown, Fenghua Yu, Arnd Bergmann,
	linux-pm, kbuild test robot, linuxppc-dev, Greentime Hu,
	Bjorn Helgaas, Kurt Schwemmer, platform-driver-x86, Felipe Balbi,
	Michal Simek, Tony Luck, Nick Hu, Geoff Levand,
	Greg Kroah-Hartman, linux-usb, linux-wireless, LKML,
	Davidlohr Bueso, netdev, Logan Gunthorpe, David S. Miller,
	Andy Shevchenko

Thomas Gleixner <tglx@linutronix.de> writes:

> From: Thomas Gleixner <tglx@linutronix.de>
>
> The completion usage in this driver is interesting:
>
>   - it uses a magic complete function which according to the comment was
>     implemented by invoking complete() four times in a row because
>     complete_all() was not exported at that time.
>
>   - it uses an open coded wait/poll which checks completion:done. Only one wait
>     side (device removal) uses the regular wait_for_completion() interface.
>
> The rationale behind this is to prevent that wait_for_completion() consumes
> completion::done which would prevent that all waiters are woken. This is not
> necessary with complete_all() as that sets completion::done to UINT_MAX which
> is left unmodified by the woken waiters.
>
> Replace the magic complete function with complete_all() and convert the
> open coded wait/poll to regular completion interfaces.
>
> This changes the wait to exclusive wait mode. But that does not make any
> difference because the wakers use complete_all() which ignores the
> exclusive mode.
>
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-usb@vger.kernel.org
> ---
> V2: New patch to avoid conversion to swait functions later.
> ---
>  drivers/net/wireless/intersil/orinoco/orinoco_usb.c |   21 ++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)

I assume this is going via some tree other than wireless-drivers, so:

Acked-by: Kalle Valo <kvalo@codeaurora.org>

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
  2020-03-21 11:25   ` Thomas Gleixner
@ 2020-03-23  2:55     ` Paul E. McKenney
  0 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-23  2:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The kernel provides a variety of locking primitives. The nesting of these
> lock types and the implications of them on RT enabled kernels is nowhere
> documented.
> 
> Add initial documentation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> V3: Addressed review comments from Paul, Jonathan, Davidlohr
> V2: Addressed review comments from Randy
> ---
>  Documentation/locking/index.rst     |    1 
>  Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 300 insertions(+)
>  create mode 100644 Documentation/locking/locktypes.rst
> 
> --- a/Documentation/locking/index.rst
> +++ b/Documentation/locking/index.rst
> @@ -7,6 +7,7 @@ locking
>  .. toctree::
>      :maxdepth: 1
>  
> +    locktypes
>      lockdep-design
>      lockstat
>      locktorture
> --- /dev/null
> +++ b/Documentation/locking/locktypes.rst
> @@ -0,0 +1,299 @@

[ . . . Adding your example execution sequences . . . ]

> +PREEMPT_RT kernels preserve all other spinlock_t semantics:
> +
> + - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
> +   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
> +   disable migration, which ensures that pointers to per-CPU variables
> +   remain valid even if the task is preempted.
> +
> + - Task state is preserved across spinlock acquisition, ensuring that the
> +   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
> +   kernels leave task state untouched.  However, PREEMPT_RT must change
> +   task state if the task blocks during acquisition.  Therefore, it saves
> +   the current task state before blocking and the corresponding lock wakeup
> +   restores it.
> +
> +   Other types of wakeups would normally unconditionally set the task state
> +   to RUNNING, but that does not work here because the task must remain
> +   blocked until the lock becomes available.  Therefore, when a non-lock
> +   wakeup attempts to awaken a task blocked waiting for a spinlock, it
> +   instead sets the saved state to RUNNING.  Then, when the lock
> +   acquisition completes, the lock wakeup sets the task state to the saved
> +   state, in this case setting it to RUNNING.

In the normal case where the task sleeps through the entire lock
acquisition, the sequence of events is as follows:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                               lock wakeup
                                 state = real_state == UNINTERRUPTIBLE

This sequence of events can occur when the task acquires spinlocks
on its way to sleeping, for example, in a call to wait_event().

The non-lock wakeup can occur when a wakeup races with this wait_event(),
which can result in the following sequence of events:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                             non lock wakeup
                                 real_state = RUNNING

                               lock wakeup
                                 state = real_state == RUNNING

Without this real_state subterfuge, the wakeup might be lost.

[ . . . and continuing where I left off earlier . . . ]

> +bit spinlocks
> +-------------
> +
> +Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
> +substituted by an RT-mutex based implementation for obvious reasons.
> +
> +The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
> +caveats vs. raw_spinlock_t apply.
> +
> +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
> +this requires conditional (#ifdef'ed) code changes at the usage site while
> +the spinlock_t substitution is simply done by the compiler and the
> +conditionals are restricted to header files and core implementation of the
> +locking primitives and the usage sites do not require any changes.

PREEMPT_RT cannot substitute bit spinlocks because a single bit is
too small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.
In contrast, usage-site changes are not needed for the spinlock_t
substitution.  Instead, conditionals in header files and the core locking
implementation enable the compiler to do the substitution transparently.
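
To make the usage-site #ifdef pattern concrete, a sketch (illustrative
names, not from the series):

#include <linux/bit_spinlock.h>
#include <linux/spinlock.h>

struct bucket {
	unsigned long	flags;		/* bit 0 doubles as the lock bit */
#ifdef CONFIG_PREEMPT_RT
	spinlock_t	lock;		/* substituted lock on RT; needs
					 * spin_lock_init() before use */
#endif
};

static inline void bucket_lock(struct bucket *b)
{
#ifdef CONFIG_PREEMPT_RT
	spin_lock(&b->lock);
#else
	bit_spin_lock(0, &b->flags);
#endif
}

static inline void bucket_unlock(struct bucket *b)
{
#ifdef CONFIG_PREEMPT_RT
	spin_unlock(&b->lock);
#else
	bit_spin_unlock(0, &b->flags);
#endif
}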


> +Lock type nesting rules
> +=======================
> +
> +The most basic rules are:
> +
> +  - Lock types of the same lock category (sleeping, spinning) can nest
> +    arbitrarily as long as they respect the general lock ordering rules to
> +    prevent deadlocks.

  - Lock types in the same category (sleeping, spinning) can nest
     arbitrarily as long as they respect the general deadlock-avoidance
     ordering rules.

[ Give or take lockdep eventually complaining about too-deep nesting,
  but that is probably not worth mentioning here.  Leave that caveat
  to the lockdep documentation. ]

> +  - Sleeping lock types cannot nest inside spinning lock types.
> +
> +  - Spinning lock types can nest inside sleeping lock types.
> +
> +These rules apply in general independent of CONFIG_PREEMPT_RT.

These constraints apply both in CONFIG_PREEMPT_RT and otherwise.

> +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
> +spinning to sleeping this has obviously restrictions how they can nest with
> +raw_spinlock_t.
> +
> +This results in the following nest ordering:

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired
while holding a raw spinlock.  This results in the following nesting
ordering:

> +  1) Sleeping locks
> +  2) spinlock_t and rwlock_t
> +  3) raw_spinlock_t and bit spinlocks
> +
> +Lockdep is aware of these constraints to ensure that they are respected.

Lockdep will complain if these constraints are violated, both in
CONFIG_PREEMPT_RT and otherwise.
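
A sketch of the resulting do's and don'ts (illustrative code, not from the
document itself):

static DEFINE_MUTEX(m);		/* sleeping lock */
static DEFINE_SPINLOCK(s);	/* spinlock_t */
static DEFINE_RAW_SPINLOCK(r);	/* raw_spinlock_t */

static void valid_nesting(void)
{
	mutex_lock(&m);		/* 1) sleeping lock outermost */
	spin_lock(&s);		/* 2) spinlock_t inside it: fine */
	raw_spin_lock(&r);	/* 3) raw_spinlock_t innermost: fine */
	raw_spin_unlock(&r);
	spin_unlock(&s);
	mutex_unlock(&m);
}

static void invalid_nesting(void)
{
	raw_spin_lock(&r);
	spin_lock(&s);		/* BAD: on PREEMPT_RT spinlock_t sleeps,
				 * so this nests a sleeping lock inside a
				 * raw spinlock; lockdep will complain. */
	spin_unlock(&s);
	raw_spin_unlock(&r);
}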


> +Owner semantics
> +===============
> +
> +Most lock types in the Linux kernel have strict owner semantics, i.e. the
> +context (task) which acquires a lock has to release it.

The aforementioned lock types have strict owner semantics: The context
(task) that acquired the lock must release it.

> +There are two exceptions:
> +
> +  - semaphores
> +  - rwsems
> +
> +semaphores have no owner semantics for historical reason, and as such
> +trylock and release operations can be called from any context. They are
> +often used for both serialization and waiting purposes. That's generally
> +discouraged and should be replaced by separate serialization and wait
> +mechanisms, such as mutexes and completions.

semaphores lack owner semantics for historical reasons, so their trylock
and release operations may be called from any context. They are often
used for both serialization and waiting, but new use cases should
instead use separate serialization and wait mechanisms, such as mutexes
and completions.
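
For the waiting half, the recommended replacement looks roughly like this
(illustrative names, assuming a semaphore that was only used to wait):

#include <linux/completion.h>

static DECLARE_COMPLETION(fw_loaded);

static int wait_for_firmware(void)
{
	/* was: down_interruptible(&fw_sem); */
	return wait_for_completion_interruptible(&fw_loaded);
}

static void firmware_ready(void)	/* may be called from any context */
{
	/* was: up(&fw_sem); */
	complete_all(&fw_loaded);
}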

> +rwsems have grown interfaces which allow non owner release for special
> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> +substitutes all locking primitives except semaphores with RT-mutex based
> +implementations to provide priority inheritance for all lock types except
> +the truly spinning ones. Priority inheritance on ownerless locks is
> +obviously impossible.
> +
> +For now the rwsem non-owner release excludes code which utilizes it from
> +being used on PREEMPT_RT enabled kernels. In some cases this can be
> +mitigated by disabling portions of the code, in other cases the complete
> +functionality has to be disabled until a workable solution has been found.

rwsems have grown special-purpose interfaces that allow non-owner release.
This non-owner release prevents PREEMPT_RT from substituting RT-mutex
implementations, for example, by defeating priority inheritance.
After all, if the lock has no owner, whose priority should be boosted?
As a result, PREEMPT_RT does not currently support the rwsem non-owner
release, which means that code relying on it must be disabled until a
workable solution presents itself.

[ Note: Not as confident as I would like to be in the above. ]

							Thanx, Paul

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-23  2:55     ` Paul E. McKenney
  0 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-23  2:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless

On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The kernel provides a variety of locking primitives. The nesting of these
> lock types and the implications of them on RT enabled kernels is nowhere
> documented.
> 
> Add initial documentation.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E . McKenney" <paulmck@kernel.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> V3: Addressed review comments from Paul, Jonathan, Davidlohr
> V2: Addressed review comments from Randy
> ---
>  Documentation/locking/index.rst     |    1 
>  Documentation/locking/locktypes.rst |  299 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 300 insertions(+)
>  create mode 100644 Documentation/locking/locktypes.rst
> 
> --- a/Documentation/locking/index.rst
> +++ b/Documentation/locking/index.rst
> @@ -7,6 +7,7 @@ locking
>  .. toctree::
>      :maxdepth: 1
>  
> +    locktypes
>      lockdep-design
>      lockstat
>      locktorture
> --- /dev/null
> +++ b/Documentation/locking/locktypes.rst
> @@ -0,0 +1,299 @@

[ . . . Adding your example execution sequences . . . ]

> +PREEMPT_RT kernels preserve all other spinlock_t semantics:
> +
> + - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
> +   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
> +   disable migration, which ensures that pointers to per-CPU variables
> +   remain valid even if the task is preempted.
> +
> + - Task state is preserved across spinlock acquisition, ensuring that the
> +   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
> +   kernels leave task state untouched.  However, PREEMPT_RT must change
> +   task state if the task blocks during acquisition.  Therefore, it saves
> +   the current task state before blocking and the corresponding lock wakeup
> +   restores it.
> +
> +   Other types of wakeups would normally unconditionally set the task state
> +   to RUNNING, but that does not work here because the task must remain
> +   blocked until the lock becomes available.  Therefore, when a non-lock
> +   wakeup attempts to awaken a task blocked waiting for a spinlock, it
> +   instead sets the saved state to RUNNING.  Then, when the lock
> +   acquisition completes, the lock wakeup sets the task state to the saved
> +   state, in this case setting it to RUNNING.

In the normal case where the task sleeps through the entire lock
acquisition, the sequence of events is as follows:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                               lock wakeup
                                 state = real_state == UNINTERRUPTIBLE

This sequence of events can occur when the task acquires spinlocks
on its way to sleeping, for example, in a call to wait_event().

The non-lock wakeup can occur when a wakeup races with this wait_event(),
which can result in the following sequence of events:

     state = UNINTERRUPTIBLE
     lock()
       block()
         real_state = state
         state = SLEEPONLOCK

                             non lock wakeup
                                 real_state = RUNNING

                               lock wakeup
                                 state = real_state == RUNNING

Without this real_state subterfuge, the wakeup might be lost.

[ . . . and continuing where I left off earlier . . . ]

> +bit spinlocks
> +-------------
> +
> +Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
> +substituted by an RT-mutex based implementation for obvious reasons.
> +
> +The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
> +caveats vs. raw_spinlock_t apply.
> +
> +Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
> +this requires conditional (#ifdef'ed) code changes at the usage site while
> +the spinlock_t substitution is simply done by the compiler and the
> +conditionals are restricted to header files and core implementation of the
> +locking primitives and the usage sites do not require any changes.

PREEMPT_RT cannot substitute bit spinlocks because a single bit is
too small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.
In contrast, usage-site changes are not needed for the spinlock_t
substitution.  Instead, conditionals in header files and the core locking
implemementation enable the compiler to do the substitution transparently.


> +Lock type nesting rules
> +=======================
> +
> +The most basic rules are:
> +
> +  - Lock types of the same lock category (sleeping, spinning) can nest
> +    arbitrarily as long as they respect the general lock ordering rules to
> +    prevent deadlocks.

  - Lock types in the same category (sleeping, spinning) can nest
     arbitrarily as long as they respect the general deadlock-avoidance
     ordering rules.

[ Give or take lockdep eventually complaining about too-deep nesting,
  but that is probably not worth mentioning here.  Leave that caveat
  to the lockdep documentation. ]

> +  - Sleeping lock types cannot nest inside spinning lock types.
> +
> +  - Spinning lock types can nest inside sleeping lock types.
> +
> +These rules apply in general independent of CONFIG_PREEMPT_RT.

These constraints apply both in CONFIG_PREEMPT_RT and otherwise.

> +As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
> +spinning to sleeping this has obviously restrictions how they can nest with
> +raw_spinlock_t.
> +
> +This results in the following nest ordering:

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired
while holding a raw spinlock.  This results in the following nesting
ordering:

> +  1) Sleeping locks
> +  2) spinlock_t and rwlock_t
> +  3) raw_spinlock_t and bit spinlocks
> +
> +Lockdep is aware of these constraints to ensure that they are respected.

Lockdep will complain if these constraints are violated, both in
CONFIG_PREEMPT_RT and otherwise.
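
For example, a sketch of legal and illegal nesting (the functions are
illustrative; CONFIG_PROVE_RAW_LOCK_NESTING is what catches the bad
case even on non-PREEMPT_RT configurations):

     static DEFINE_MUTEX(m);        /* 1) sleeping lock  */
     static DEFINE_SPINLOCK(s);     /* 2) spinlock_t     */
     static DEFINE_RAW_SPINLOCK(r); /* 3) raw_spinlock_t */

     static void good_nesting(void)
     {
             mutex_lock(&m);
             spin_lock(&s);
             raw_spin_lock(&r);     /* 1) -> 2) -> 3): fine */
             raw_spin_unlock(&r);
             spin_unlock(&s);
             mutex_unlock(&m);
     }

     static void bad_nesting(void)
     {
             raw_spin_lock(&r);
             spin_lock(&s);         /* 3) -> 2): sleeps on PREEMPT_RT */
             spin_unlock(&s);
             raw_spin_unlock(&r);
     }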


> +Owner semantics
> +===============
> +
> +Most lock types in the Linux kernel have strict owner semantics, i.e. the
> +context (task) which acquires a lock has to release it.

The aforementioned lock types have strict owner semantics: The context
(task) that acquired the lock must release it.

> +There are two exceptions:
> +
> +  - semaphores
> +  - rwsems
> +
> +semaphores have no owner semantics for historical reason, and as such
> +trylock and release operations can be called from any context. They are
> +often used for both serialization and waiting purposes. That's generally
> +discouraged and should be replaced by separate serialization and wait
> +mechanisms, such as mutexes and completions.

semaphores lack owner semantics for historical reasons, so their trylock
and release operations may be called from any context. They are often
used for both serialization and waiting, but new use cases should
instead use separate serialization and wait mechanisms, such as mutexes
and completions.
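
To make the recommendation concrete, a sketch contrasting the two
(illustrative names; the semaphore would be sema_init()'d to zero
during setup):

     static struct semaphore done_sem;   /* sema_init(&done_sem, 0) */
     static DECLARE_COMPLETION(done);

     /* Discouraged: semaphore used purely as a wait mechanism */
     static void sem_wait_side(void)   { down(&done_sem); }
     static void sem_signal_side(void) { up(&done_sem);   }

     /* Preferred: a completion names the intent explicitly */
     static void compl_wait_side(void)   { wait_for_completion(&done); }
     static void compl_signal_side(void) { complete(&done);            }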

> +rwsems have grown interfaces which allow non owner release for special
> +purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
> +substitutes all locking primitives except semaphores with RT-mutex based
> +implementations to provide priority inheritance for all lock types except
> +the truly spinning ones. Priority inheritance on ownerless locks is
> +obviously impossible.
> +
> +For now the rwsem non-owner release excludes code which utilizes it from
> +being used on PREEMPT_RT enabled kernels. In same cases this can be
> +mitigated by disabling portions of the code, in other cases the complete
> +functionality has to be disabled until a workable solution has been found.

rwsems have grown special-purpose interfaces that allow non-owner release.
This non-owner release prevents PREEMPT_RT from substituting RT-mutex
implementations, for example, by defeating priority inheritance.
After all, if the lock has no owner, whose priority should be boosted?
As a result, PREEMPT_RT does not currently support rwsem, which in turn
means that code using it must be disabled until a workable solution
presents itself.

[ Note: Not as confident as I would like to be in the above. ]

							Thanx, Paul

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [PATCH] completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  2020-03-21 11:26   ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-23 15:20     ` Sebastian Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: Sebastian Siewior @ 2020-03-23 15:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Davidlohr Bueso,
	Greg Kroah-Hartman, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Felipe Balbi, linux-usb, Kalle Valo,
	David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap

The warning was intended to spot complete_all() users from hardirq
context on PREEMPT_RT. As-is, the warning also triggers in interrupt
handlers, which are threaded on PREEMPT_RT; that was not intended.

Use lockdep_assert_RT_in_threaded_ctx() which triggers in non-preemptive
context on PREEMPT_RT.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/lockdep.h   | 15 +++++++++++++++
 kernel/sched/completion.c |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 425b4ceb7cd07..206774ac69460 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -711,6 +711,21 @@ do {									\
 # define lockdep_assert_in_irq() do { } while (0)
 #endif
 
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+
+# define lockdep_assert_RT_in_threaded_ctx() do {			\
+		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
+			  current->hardirq_context &&			\
+			  !(current->hardirq_threaded || current->irq_config),	\
+			  "Not in threaded context on PREEMPT_RT as expected\n");	\
+} while (0)
+
+#else
+
+# define lockdep_assert_RT_in_threaded_ctx() do { } while (0)
+
+#endif
+
 #ifdef CONFIG_LOCKDEP
 void lockdep_rcu_suspicious(const char *file, const int line, const char *s);
 #else
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index f15e96164ff1e..a778554f9dad7 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -58,7 +58,7 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	WARN_ON(irqs_disabled());
+	lockdep_assert_RT_in_threaded_ctx();
 
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [tip: locking/core] completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  2020-03-23 15:20     ` Sebastian Siewior
                       ` (2 preceding siblings ...)
  (?)
@ 2020-03-23 17:50     ` tip-bot2 for Sebastian Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Sebastian Siewior @ 2020-03-23 17:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kernel test robot, Peter Zijlstra, Sebastian Andrzej Siewior, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     8bf6c677ddb9c922423ea3bf494fe7c508bfbb8c
Gitweb:        https://git.kernel.org/tip/8bf6c677ddb9c922423ea3bf494fe7c508bfbb8c
Author:        Sebastian Siewior <bigeasy@linutronix.de>
AuthorDate:    Mon, 23 Mar 2020 16:20:19 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 23 Mar 2020 18:40:25 +01:00

completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()

The warning was intended to spot complete_all() users from hardirq
context on PREEMPT_RT. As-is, the warning also triggers in interrupt
handlers, which are threaded on PREEMPT_RT; that was not intended.

Use lockdep_assert_RT_in_threaded_ctx() which triggers in non-preemptive
context on PREEMPT_RT.

Fixes: a5c6234e1028 ("completion: Use simple wait queues")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200323152019.4qjwluldohuh3by5@linutronix.de
---
 include/linux/lockdep.h   | 15 +++++++++++++++
 kernel/sched/completion.c |  2 +-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 425b4ce..206774a 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -711,6 +711,21 @@ do {									\
 # define lockdep_assert_in_irq() do { } while (0)
 #endif
 
+#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
+
+# define lockdep_assert_RT_in_threaded_ctx() do {			\
+		WARN_ONCE(debug_locks && !current->lockdep_recursion &&	\
+			  current->hardirq_context &&			\
+			  !(current->hardirq_threaded || current->irq_config),	\
+			  "Not in threaded context on PREEMPT_RT as expected\n");	\
+} while (0)
+
+#else
+
+# define lockdep_assert_RT_in_threaded_ctx() do { } while (0)
+
+#endif
+
 #ifdef CONFIG_LOCKDEP
 void lockdep_rcu_suspicious(const char *file, const int line, const char *s);
 #else
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index f15e961..a778554 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -58,7 +58,7 @@ void complete_all(struct completion *x)
 {
 	unsigned long flags;
 
-	WARN_ON(irqs_disabled());
+	lockdep_assert_RT_in_threaded_ctx();
 
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* RE: [patch V3 08/20] hexagon: Remove mm.h from asm/uaccess.h
  2020-03-21 11:25   ` Thomas Gleixner
  (?)
  (?)
@ 2020-03-23 21:46     ` Brian Cain
  -1 siblings, 0 replies; 195+ messages in thread
From: Brian Cain @ 2020-03-23 21:46 UTC (permalink / raw)
  To: 'Thomas Gleixner', 'LKML'
  Cc: 'Peter Zijlstra', 'Ingo Molnar',
	'Sebastian Siewior', 'Linus Torvalds',
	'Joel Fernandes', 'Oleg Nesterov',
	'Davidlohr Bueso', 'kbuild test robot',
	linux-hexagon, 'Logan Gunthorpe', 'Bjorn Helgaas',
	'Kurt Schwemmer', linux-pci, 'Greg Kroah-Hartman',
	'Felipe Balbi', linux-usb, 'Kalle Valo',
	'David S. Miller',
	linux-wireless, netdev, 'Darren Hart',
	'Andy Shevchenko',
	platform-driver-x86, 'Zhang Rui',
	'Rafael J. Wysocki', linux-pm, 'Len Brown',
	linux-acpi, 'Nick Hu', 'Greentime Hu',
	'Vincent Chen', 'Guo Ren',
	linux-csky, 'Tony Luck', 'Fenghua Yu',
	linux-ia64, 'Michal Simek', 'Michael Ellerman',
	'Arnd Bergmann', 'Geoff Levand',
	linuxppc-dev, 'Paul E . McKenney',
	'Jonathan Corbet', 'Randy Dunlap',
	'Davidlohr Bueso'

> -----Original Message-----
> From: Thomas Gleixner <tglx@linutronix.de>
...
> Subject: [patch V3 08/20] hexagon: Remove mm.h from asm/uaccess.h
> 
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The defconfig compiles without linux/mm.h. With mm.h included the include
> chain leads to:
> |   CC      kernel/locking/percpu-rwsem.o
> | In file included from include/linux/huge_mm.h:8,
> |                  from include/linux/mm.h:567,
> |                  from arch/hexagon/include/asm/uaccess.h:,
> |                  from include/linux/uaccess.h:11,
> |                  from include/linux/sched/task.h:11,
> |                  from include/linux/sched/signal.h:9,
> |                  from include/linux/rcuwait.h:6,
> |                  from include/linux/percpu-rwsem.h:8,
> |                  from kernel/locking/percpu-rwsem.c:6:
> | include/linux/fs.h:1422:29: error: array type has incomplete element type
> 'struct percpu_rw_semaphore'
> |  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];
> 
> once rcuwait.h includes linux/sched/signal.h.
> 
> Remove the linux/mm.h include.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Brian Cain <bcain@codeaurora.org>
> Cc: linux-hexagon@vger.kernel.org
> ---
> V3: New patch
> ---
>  arch/hexagon/include/asm/uaccess.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/hexagon/include/asm/uaccess.h
> b/arch/hexagon/include/asm/uaccess.h
> index 00cb38faad0c4..c1019a736ff13 100644
> --- a/arch/hexagon/include/asm/uaccess.h
> +++ b/arch/hexagon/include/asm/uaccess.h
> @@ -10,7 +10,6 @@
>  /*
>   * User space memory access functions
>   */
> -#include <linux/mm.h>
>  #include <asm/sections.h>
> 
>  /*
> --
> 2.26.0.rc2
> 

Acked-by: Brian Cain <bcain@codeaurora.org>

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
  2020-03-23  2:55     ` Paul E. McKenney
                         ` (2 preceding siblings ...)
  (?)
@ 2020-03-24 23:13       ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-24 23:13 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> In the normal case where the task sleeps through the entire lock
> acquisition, the sequence of events is as follows:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                                lock wakeup
>                                  state = real_state == UNINTERRUPTIBLE
>
> This sequence of events can occur when the task acquires spinlocks
> on its way to sleeping, for example, in a call to wait_event().
>
> The non-lock wakeup can occur when a wakeup races with this wait_event(),
> which can result in the following sequence of events:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                              non lock wakeup
>                                  real_state = RUNNING
>
>                                lock wakeup
>                                  state = real_state == RUNNING
>
> Without this real_state subterfuge, the wakeup might be lost.

I added this with a few modifications which reflect the actual
implementation. Conceptually the same.
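
In C terms the dance looks roughly like this. A minimal sketch only;
the names are illustrative and not the real kernel identifiers:

  #define TASK_RUNNING          0x0000
  #define TASK_INTERRUPTIBLE    0x0001
  #define TASK_UNINTERRUPTIBLE  0x0002

  struct task {
          int state;
          int saved_state;   /* state preserved across the lock sleep */
  };

  void schedule(void);       /* stand-in for the scheduler */

  static void rtlock_block(struct task *tsk)
  {
          /* Save whatever state the task had before blocking ... */
          tsk->saved_state = tsk->state;
          tsk->state = TASK_UNINTERRUPTIBLE;
          schedule();
  }

  static void rtlock_wakeup(struct task *tsk)
  {
          /* ... and restore it on lock wakeup. If a regular wakeup
           * raced in meanwhile, saved_state is TASK_RUNNING and that
           * wakeup is not lost. */
          tsk->state = tsk->saved_state;
  }

  static void regular_wakeup(struct task *tsk)
  {
          /* A non-lock wakeup targeting a task blocked on an RT lock
           * redirects to the saved state instead of the real state. */
          tsk->saved_state = TASK_RUNNING;
  }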

> rwsems have grown special-purpose interfaces that allow non-owner release.
> This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> implementations, for example, by defeating priority inheritance.
> After all, if the lock has no owner, whose priority should be boosted?
> As a result, PREEMPT_RT does not currently support rwsem, which in turn
> means that code using it must therefore be disabled until a workable
> solution presents itself.
>
> [ Note: Not as confident as I would like to be in the above. ]

I'm not confident either, especially not after looking at the actual
code.

In fact I feel really stupid because the rw_semaphore reader non-owner
restriction on RT simply does not exist anymore and my history-biased
memory tricked me.

The first rw_semaphore implementation of RT was simple and restricted
the reader side to a single reader to support PI on both the reader and
the writer side. That obviously did not scale well and made mmap_sem-heavy
use cases pretty unhappy.

The short interlude with multi-reader boosting turned out to be a failed
experiment - Steven might still disagree though :)

At some point we gave up and I myself (sic!) reimplemented the RT
variant of rw_semaphore with a reader biased mechanism.

The reader never holds the underlying rt_mutex across the read side
critical section. It merely increments the reader count and drops it on
release.

The only time a reader takes the rt_mutex is when it blocks on a
writer. Writers hold the rt_mutex across the write side critical section
to allow incoming readers to boost them. Once the writer releases the
rw_semaphore it unlocks the rt_mutex which is then handed off to the
readers. They increment the reader count and then drop the rt_mutex
before continuing in the read side critical section.
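
Condensed into a sketch (simplified on purpose: no memory ordering, no
wait queue handling, and writer_active()/wait_for_readers() are made-up
helpers, not kernel interfaces):

  /* Reader-biased rwsem on top of an rt_mutex -- illustrative only. */
  struct rt_rwsem {
          atomic_t        readers;   /* active reader count */
          struct rt_mutex rtmutex;   /* held across write side sections */
  };

  static void rt_down_read(struct rt_rwsem *sem)
  {
          if (!writer_active(sem)) {          /* made-up helper */
                  atomic_inc(&sem->readers);  /* fast path */
                  return;
          }
          /* Block on the writer's rt_mutex so the writer is boosted,
           * then drop it again before the read side critical section. */
          rt_mutex_lock(&sem->rtmutex);
          atomic_inc(&sem->readers);
          rt_mutex_unlock(&sem->rtmutex);
  }

  static void rt_up_read(struct rt_rwsem *sem)
  {
          atomic_dec(&sem->readers);          /* no rt_mutex involved */
  }

  static void rt_down_write(struct rt_rwsem *sem)
  {
          rt_mutex_lock(&sem->rtmutex);       /* readers can boost us now */
          wait_for_readers(sem);              /* made-up helper */
  }

  static void rt_up_write(struct rt_rwsem *sem)
  {
          rt_mutex_unlock(&sem->rtmutex);     /* hand off to waiting readers */
  }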

So while I changed the implementation, it obviously did not occur to me
that this also lifted the non-owner release restriction. Nobody else
noticed either. So we kept dragging this along in both memory and
implementation. Both will be fixed now :)

The owner semantics of down/up_read() are only enforced by lockdep. That
applies to both RT and !RT. The up/down_read_non_owner() variants are
just there to tell lockdep about it.
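
In other words, something like the classic I/O completion pattern below
is legitimate. my_sem and the work function are made up for
illustration, but down/up_read_non_owner() are the real interfaces:

  #include <linux/rwsem.h>
  #include <linux/workqueue.h>

  static DECLARE_RWSEM(my_sem);               /* made up for illustration */

  static void my_io_done(struct work_struct *work)
  {
          /* Released by a context which never acquired it ... */
          up_read_non_owner(&my_sem);
  }
  static DECLARE_WORK(my_io_work, my_io_done);

  static void my_io_start(void)
  {
          /* ... acquired here and handed off to the completion work.
           * The _non_owner variants just tell lockdep what happens. */
          down_read_non_owner(&my_sem);
          schedule_work(&my_io_work);
  }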

So, I picked up your other suggestions with slight modifications and
adjusted the owner, semaphore and rw_semaphore docs accordingly.

Please have a close look at the patch below (applies on tip core/locking).

Thanks,

        tglx, who is searching for a brown paperbag

8<----------

 Documentation/locking/locktypes.rst |  148 +++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 50 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+semaphore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+semaphores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the semaphore implementation. That's impossible
+due to the counting semaphore semantics which have no concept of owners.
+The lack of an owner conflicts with priority inheritance. After all, an
+unknown owner cannot be boosted. As a consequence, blocking on semaphores
+can be subject to priority inversion.
+
+
+rw_semaphore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_semaphore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_semaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+CONFIG_PREEMPT_RT and otherwise.


^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-24 23:13       ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-24 23:13 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> In the normal case where the task sleeps through the entire lock
> acquisition, the sequence of events is as follows:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                                lock wakeup
>                                  state = real_state == UNINTERRUPTIBLE
>
> This sequence of events can occur when the task acquires spinlocks
> on its way to sleeping, for example, in a call to wait_event().
>
> The non-lock wakeup can occur when a wakeup races with this wait_event(),
> which can result in the following sequence of events:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                              non lock wakeup
>                                  real_state = RUNNING
>
>                                lock wakeup
>                                  state = real_state == RUNNING
>
> Without this real_state subterfuge, the wakeup might be lost.

I added this with a few modifications which reflect the actual
implementation. Conceptually the same.

> rwsems have grown special-purpose interfaces that allow non-owner release.
> This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> implementations, for example, by defeating priority inheritance.
> After all, if the lock has no owner, whose priority should be boosted?
> As a result, PREEMPT_RT does not currently support rwsem, which in turn
> means that code using it must therefore be disabled until a workable
> solution presents itself.
>
> [ Note: Not as confident as I would like to be in the above. ]

I'm not confident either especially not after looking at the actual
code.

In fact I feel really stupid because the rw_semaphore reader non-owner
restriction on RT simply does not exist anymore and my history biased
memory tricked me.

The first rw_semaphore implementation of RT was simple and restricted
the reader side to a single reader to support PI on both the reader and
the writer side. That obviosuly did not scale well and made mmap_sem
heavy use cases pretty unhappy.

The short interlude with multi-reader boosting turned out to be a failed
experiment - Steven might still disagree though :)

At some point we gave up and I myself (sic!) reimplemented the RT
variant of rw_semaphore with a reader biased mechanism.

The reader never holds the underlying rt_mutex accross the read side
critical section. It merily increments the reader count and drops it on
release.

The only time a reader takes the rt_mutex is when it blocks on a
writer. Writers hold the rt_mutex across the write side critical section
to allow incoming readers to boost them. Once the writer releases the
rw_semaphore it unlocks the rt_mutex which is then handed off to the
readers. They increment the reader count and then drop the rt_mutex
before continuing in the read side critical section.

So while I changed the implementation it did obviously not occur to me
that this also lifted the non-owner release restriction. Nobody else
noticed either. So we kept dragging this along in both memory and
implementation. Both will be fixed now :)

The owner semantics of down/up_read() are only enforced by lockdep. That
applies to both RT and !RT. The up/down_read_non_owner() variants are
just there to tell lockdep about it.

So, I picked up your other suggestions with slight modifications and
adjusted the owner, semaphore and rw_semaphore docs accordingly.

Please have a close look at the patch below (applies on tip core/locking).

Thanks,

        tglx, who is searching a brown paperbag

8<----------

 Documentation/locking/locktypes.rst |  148 +++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 50 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+sempahore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+sempahores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the sempahore implementation. That's impossible
+due to the counting semaphore semantics which have no concept of owners.
+The lack of an owner conflicts with priority inheritance. After all an
+unknown owner cannot be boosted. As a consequence blocking on semaphores
+can be subject to priority inversion.
+
+
+rw_sempahore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_sempahore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_sempahore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_sempaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implemementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+CONFIG_PREEMPT_RT and otherwise.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-24 23:13       ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-24 23:13 UTC (permalink / raw)
  To: paulmck
  Cc: linux-usb, linux-ia64, Peter Zijlstra, linux-pci,
	Sebastian Siewior, Oleg Nesterov, Guo Ren, Joel Fernandes,
	Vincent Chen, Ingo Molnar, Davidlohr Bueso, linux-acpi,
	Brian Cain, Jonathan Corbet, linux-hexagon, Rafael J. Wysocki,
	linux-csky, Linus Torvalds, Darren Hart, Zhang Rui, Len Brown,
	Fenghua Yu, Arnd Bergmann, linux-pm, linuxppc-dev, Greentime Hu,
	Bjorn Helgaas, Kurt Schwemmer, platform-driver-x86, Kalle Valo,
	kbuild test robot, Felipe Balbi, Michal Simek, Tony Luck,
	Nick Hu, Geoff Levand, netdev, Randy Dunlap, linux-wireless,
	LKML, Davidlohr Bueso, Greg Kroah-Hartman, Logan Gunthorpe,
	David S. Miller, Andy Shevchenko

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> In the normal case where the task sleeps through the entire lock
> acquisition, the sequence of events is as follows:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                                lock wakeup
>                                  state = real_state == UNINTERRUPTIBLE
>
> This sequence of events can occur when the task acquires spinlocks
> on its way to sleeping, for example, in a call to wait_event().
>
> The non-lock wakeup can occur when a wakeup races with this wait_event(),
> which can result in the following sequence of events:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                              non lock wakeup
>                                  real_state = RUNNING
>
>                                lock wakeup
>                                  state = real_state == RUNNING
>
> Without this real_state subterfuge, the wakeup might be lost.

I added this with a few modifications which reflect the actual
implementation. Conceptually the same.

> rwsems have grown special-purpose interfaces that allow non-owner release.
> This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> implementations, for example, by defeating priority inheritance.
> After all, if the lock has no owner, whose priority should be boosted?
> As a result, PREEMPT_RT does not currently support rwsem, which in turn
> means that code using it must therefore be disabled until a workable
> solution presents itself.
>
> [ Note: Not as confident as I would like to be in the above. ]

I'm not confident either especially not after looking at the actual
code.

In fact I feel really stupid because the rw_semaphore reader non-owner
restriction on RT simply does not exist anymore and my history biased
memory tricked me.

The first rw_semaphore implementation of RT was simple and restricted
the reader side to a single reader to support PI on both the reader and
the writer side. That obviosuly did not scale well and made mmap_sem
heavy use cases pretty unhappy.

The short interlude with multi-reader boosting turned out to be a failed
experiment - Steven might still disagree though :)

At some point we gave up and I myself (sic!) reimplemented the RT
variant of rw_semaphore with a reader biased mechanism.

The reader never holds the underlying rt_mutex accross the read side
critical section. It merily increments the reader count and drops it on
release.

The only time a reader takes the rt_mutex is when it blocks on a
writer. Writers hold the rt_mutex across the write side critical section
to allow incoming readers to boost them. Once the writer releases the
rw_semaphore it unlocks the rt_mutex which is then handed off to the
readers. They increment the reader count and then drop the rt_mutex
before continuing in the read side critical section.

So while I changed the implementation it did obviously not occur to me
that this also lifted the non-owner release restriction. Nobody else
noticed either. So we kept dragging this along in both memory and
implementation. Both will be fixed now :)

The owner semantics of down/up_read() are only enforced by lockdep. That
applies to both RT and !RT. The up/down_read_non_owner() variants are
just there to tell lockdep about it.

So, I picked up your other suggestions with slight modifications and
adjusted the owner, semaphore and rw_semaphore docs accordingly.

Please have a close look at the patch below (applies on tip core/locking).

Thanks,

        tglx, who is searching a brown paperbag

8<----------

 Documentation/locking/locktypes.rst |  148 +++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 50 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+sempahore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+sempahores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the sempahore implementation. That's impossible
+due to the counting semaphore semantics which have no concept of owners.
+The lack of an owner conflicts with priority inheritance. After all an
+unknown owner cannot be boosted. As a consequence blocking on semaphores
+can be subject to priority inversion.
+
+
+rw_sempahore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_sempahore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_sempahore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_sempaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implemementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+CONFIG_PREEMPT_RT and otherwise.


^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-24 23:13       ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-24 23:13 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> In the normal case where the task sleeps through the entire lock
> acquisition, the sequence of events is as follows:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                                lock wakeup
>                                  state = real_state = UNINTERRUPTIBLE
>
> This sequence of events can occur when the task acquires spinlocks
> on its way to sleeping, for example, in a call to wait_event().
>
> The non-lock wakeup can occur when a wakeup races with this wait_event(),
> which can result in the following sequence of events:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                              non lock wakeup
>                                  real_state = RUNNING
>
>                                lock wakeup
>                                  state = real_state = RUNNING
>
> Without this real_state subterfuge, the wakeup might be lost.

I added this with a few modifications which reflect the actual
implementation. Conceptually the same.

> rwsems have grown special-purpose interfaces that allow non-owner release.
> This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> implementations, for example, by defeating priority inheritance.
> After all, if the lock has no owner, whose priority should be boosted?
> As a result, PREEMPT_RT does not currently support rwsem, which in turn
> means that code using it must therefore be disabled until a workable
> solution presents itself.
>
> [ Note: Not as confident as I would like to be in the above. ]

I'm not confident either especially not after looking at the actual
code.

In fact I feel really stupid because the rw_semaphore reader non-owner
restriction on RT simply does not exist anymore and my history biased
memory tricked me.

The first rw_semaphore implementation of RT was simple and restricted
the reader side to a single reader to support PI on both the reader and
the writer side. That obviosuly did not scale well and made mmap_sem
heavy use cases pretty unhappy.

The short interlude with multi-reader boosting turned out to be a failed
experiment - Steven might still disagree though :)

At some point we gave up and I myself (sic!) reimplemented the RT
variant of rw_semaphore with a reader biased mechanism.

The reader never holds the underlying rt_mutex accross the read side
critical section. It merily increments the reader count and drops it on
release.

The only time a reader takes the rt_mutex is when it blocks on a
writer. Writers hold the rt_mutex across the write side critical section
to allow incoming readers to boost them. Once the writer releases the
rw_semaphore it unlocks the rt_mutex which is then handed off to the
readers. They increment the reader count and then drop the rt_mutex
before continuing in the read side critical section.

So while I changed the implementation it did obviously not occur to me
that this also lifted the non-owner release restriction. Nobody else
noticed either. So we kept dragging this along in both memory and
implementation. Both will be fixed now :)

The owner semantics of down/up_read() are only enforced by lockdep. That
applies to both RT and !RT. The up/down_read_non_owner() variants are
just there to tell lockdep about it.

So, I picked up your other suggestions with slight modifications and
adjusted the owner, semaphore and rw_semaphore docs accordingly.

Please have a close look at the patch below (applies on tip core/locking).

Thanks,

        tglx, who is searching a brown paperbag

8<----------

 Documentation/locking/locktypes.rst |  148 +++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 50 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ==========  ==========================
 
+Owner semantics
+=======+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 ===@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+sempahore
+====+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+sempahores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the sempahore implementation. That's impossible
+due to the counting semaphore semantics which have no concept of owners.
+The lack of an owner conflicts with priority inheritance. After all an
+unknown owner cannot be boosted. As a consequence blocking on semaphores
+can be subject to priority inversion.
+
+
+rw_sempahore
+======
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_sempahore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_sempahore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_sempaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 ============== 
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ====
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implemementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-=======-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+CONFIG_PREEMPT_RT and otherwise.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
@ 2020-03-24 23:13       ` Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-24 23:13 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, net

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> In the normal case where the task sleeps through the entire lock
> acquisition, the sequence of events is as follows:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                                lock wakeup
>                                  state = real_state == UNINTERRUPTIBLE
>
> This sequence of events can occur when the task acquires spinlocks
> on its way to sleeping, for example, in a call to wait_event().
>
> The non-lock wakeup can occur when a wakeup races with this wait_event(),
> which can result in the following sequence of events:
>
>      state = UNINTERRUPTIBLE
>      lock()
>        block()
>          real_state = state
>          state = SLEEPONLOCK
>
>                              non lock wakeup
>                                  real_state = RUNNING
>
>                                lock wakeup
>                                  state = real_state == RUNNING
>
> Without this real_state subterfuge, the wakeup might be lost.

I added this with a few modifications which reflect the actual
implementation. Conceptually the same.

> rwsems have grown special-purpose interfaces that allow non-owner release.
> This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> implementations, for example, by defeating priority inheritance.
> After all, if the lock has no owner, whose priority should be boosted?
> As a result, PREEMPT_RT does not currently support rwsem, which in turn
> means that code using it must therefore be disabled until a workable
> solution presents itself.
>
> [ Note: Not as confident as I would like to be in the above. ]

I'm not confident either especially not after looking at the actual
code.

In fact I feel really stupid because the rw_semaphore reader non-owner
restriction on RT simply does not exist anymore and my history biased
memory tricked me.

The first rw_semaphore implementation of RT was simple and restricted
the reader side to a single reader to support PI on both the reader and
the writer side. That obviosuly did not scale well and made mmap_sem
heavy use cases pretty unhappy.

The short interlude with multi-reader boosting turned out to be a failed
experiment - Steven might still disagree though :)

At some point we gave up and I myself (sic!) reimplemented the RT
variant of rw_semaphore with a reader biased mechanism.

The reader never holds the underlying rt_mutex accross the read side
critical section. It merily increments the reader count and drops it on
release.

The only time a reader takes the rt_mutex is when it blocks on a
writer. Writers hold the rt_mutex across the write side critical section
to allow incoming readers to boost them. Once the writer releases the
rw_semaphore it unlocks the rt_mutex which is then handed off to the
readers. They increment the reader count and then drop the rt_mutex
before continuing in the read side critical section.

So while I changed the implementation it did obviously not occur to me
that this also lifted the non-owner release restriction. Nobody else
noticed either. So we kept dragging this along in both memory and
implementation. Both will be fixed now :)

The owner semantics of down/up_read() are only enforced by lockdep. That
applies to both RT and !RT. The up/down_read_non_owner() variants are
just there to tell lockdep about it.

So, I picked up your other suggestions with slight modifications and
adjusted the owner, semaphore and rw_semaphore docs accordingly.

Please have a close look at the patch below (applies on tip core/locking).

Thanks,

        tglx, who is searching a brown paperbag

8<----------

 Documentation/locking/locktypes.rst |  148 +++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 50 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+sempahore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+sempahores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the sempahore implementation. That's impossible
+due to the counting semaphore semantics which have no concept of owners.
+The lack of an owner conflicts with priority inheritance. After all an
+unknown owner cannot be boosted. As a consequence blocking on semaphores
+can be subject to priority inversion.
+
+
+rw_sempahore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_sempahore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_sempahore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_sempaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+CONFIG_PREEMPT_RT and otherwise.
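
To make the ownerless-semaphore problem above concrete, here is a
minimal two-task sketch (hypothetical demo code, not part of the patch;
demo_do_work() is a stand-in):

    #include <linux/kthread.h>
    #include <linux/semaphore.h>

    static DEFINE_SEMAPHORE(demo_sem);      /* counting semaphore, count 1 */

    static int demo_low_prio(void *arg)
    {
            down(&demo_sem);
            /*
             * If a medium-priority task preempts us here, the blocked
             * high-priority waiter below cannot help itself: the
             * semaphore stores only a count, not an owner, so there
             * is no task whose priority could be boosted.
             */
            demo_do_work();                 /* hypothetical */
            up(&demo_sem);
            return 0;
    }

    static int demo_high_prio(void *arg)
    {
            down(&demo_sem);        /* may block unboundedly: inversion */
            up(&demo_sem);
            return 0;
    }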

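Likewise, a minimal sketch of the nesting order above (the structure is
hypothetical, the lock calls are the stock kernel API):

    #include <linux/mutex.h>
    #include <linux/spinlock.h>

    struct demo {
            struct mutex    cfg_lock;       /* 1) sleeping lock   */
            spinlock_t      state_lock;     /* 2) spinlock_t      */
            raw_spinlock_t  hw_lock;        /* 3) raw_spinlock_t  */
    };

    static void demo_update(struct demo *d)
    {
            mutex_lock(&d->cfg_lock);
            spin_lock(&d->state_lock);      /* OK: 2) inside 1) */
            raw_spin_lock(&d->hw_lock);     /* OK: 3) inside 2) */

            raw_spin_unlock(&d->hw_lock);
            spin_unlock(&d->state_lock);
            mutex_unlock(&d->cfg_lock);
    }

    /*
     * Reversing the order, e.g. taking cfg_lock or (on PREEMPT_RT)
     * state_lock while holding hw_lock, violates the rules above and
     * makes lockdep complain.
     */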

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 13/20] Documentation: Add lock ordering and nesting documentation
  2020-03-24 23:13       ` Thomas Gleixner
@ 2020-03-25  0:28         ` Paul E. McKenney
  -1 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-25  0:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On Wed, Mar 25, 2020 at 12:13:34AM +0100, Thomas Gleixner wrote:
> Paul,
> 
> "Paul E. McKenney" <paulmck@kernel.org> writes:
> > On Sat, Mar 21, 2020 at 12:25:57PM +0100, Thomas Gleixner wrote:
> > In the normal case where the task sleeps through the entire lock
> > acquisition, the sequence of events is as follows:
> >
> >      state = UNINTERRUPTIBLE
> >      lock()
> >        block()
> >          real_state = state
> >          state = SLEEPONLOCK
> >
> >                                lock wakeup
> >                                  state = real_state == UNINTERRUPTIBLE
> >
> > This sequence of events can occur when the task acquires spinlocks
> > on its way to sleeping, for example, in a call to wait_event().
> >
> > The non-lock wakeup can occur when a wakeup races with this wait_event(),
> > which can result in the following sequence of events:
> >
> >      state = UNINTERRUPTIBLE
> >      lock()
> >        block()
> >          real_state = state
> >          state = SLEEPONLOCK
> >
> >                              non lock wakeup
> >                                  real_state = RUNNING
> >
> >                                lock wakeup
> >                                  state = real_state == RUNNING
> >
> > Without this real_state subterfuge, the wakeup might be lost.
> 
> I added this with a few modifications which reflect the actual
> implementation. Conceptually the same.

Looks good!

> > rwsems have grown special-purpose interfaces that allow non-owner release.
> > This non-owner release prevents PREEMPT_RT from substituting RT-mutex
> > implementations, for example, by defeating priority inheritance.
> > After all, if the lock has no owner, whose priority should be boosted?
> > As a result, PREEMPT_RT does not currently support rwsem, which in turn
> > means that code using it must therefore be disabled until a workable
> > solution presents itself.
> >
> > [ Note: Not as confident as I would like to be in the above. ]
> 
> I'm not confident either especially not after looking at the actual
> code.
> 
> In fact I feel really stupid because the rw_semaphore reader non-owner
> restriction on RT simply does not exist anymore and my history biased
> memory tricked me.

I guess I am glad that it is not just me.  ;-)

> The first rw_semaphore implementation of RT was simple and restricted
> the reader side to a single reader to support PI on both the reader and
> the writer side. That obviously did not scale well and made mmap_sem
> heavy use cases pretty unhappy.
> 
> The short interlude with multi-reader boosting turned out to be a failed
> experiment - Steven might still disagree though :)
> 
> At some point we gave up and I myself (sic!) reimplemented the RT
> variant of rw_semaphore with a reader biased mechanism.
> 
> The reader never holds the underlying rt_mutex across the read side
> critical section. It merely increments the reader count and drops it on
> release.
> 
> The only time a reader takes the rt_mutex is when it blocks on a
> writer. Writers hold the rt_mutex across the write side critical section
> to allow incoming readers to boost them. Once the writer releases the
> rw_semaphore it unlocks the rt_mutex which is then handed off to the
> readers. They increment the reader count and then drop the rt_mutex
> before continuing in the read side critical section.
> 
> So while I changed the implementation, it obviously did not occur to me
> that this also lifted the non-owner release restriction. Nobody else
> noticed either. So we kept dragging this along in both memory and
> implementation. Both will be fixed now :)
> 
> The owner semantics of down/up_read() are only enforced by lockdep. That
> applies to both RT and !RT. The up/down_read_non_owner() variants are
> just there to tell lockdep about it.
> 
> So, I picked up your other suggestions with slight modifications and
> adjusted the owner, semaphore and rw_semaphore docs accordingly.
> 
> Please have a close look at the patch below (applies on tip core/locking).
> 
> Thanks,
> 
>         tglx, who is searching a brown paperbag

Sorry, used all the ones here over the past few days.  :-/

Please see below for a wordsmithing patch to be applied on top of
or merged into the patch in your email.

							Thanx, Paul

------------------------------------------------------------------------

commit e38c64ce8db45e2b0a19082f1e1f988c3b25fb81
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Mar 24 17:23:36 2020 -0700

    Documentation: Wordsmith lock ordering and nesting documentation
    
    This commit is strictly wordsmithing with no (intended) semantic
    changes.
    
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index ca7bf84..8eb52e9 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -94,7 +94,7 @@ interrupt handlers and soft interrupts.  This conversion allows spinlock_t
 and rwlock_t to be implemented via RT-mutexes.
 
 
-sempahore
+semaphore
 =========
 
 semaphore is a counting semaphore implementation.
@@ -103,17 +103,17 @@ Semaphores are often used for both serialization and waiting, but new use
 cases should instead use separate serialization and wait mechanisms, such
 as mutexes and completions.
 
-sempahores and PREEMPT_RT
+semaphores and PREEMPT_RT
 ----------------------------
 
-PREEMPT_RT does not change the sempahore implementation. That's impossible
-due to the counting semaphore semantics which have no concept of owners.
-The lack of an owner conflicts with priority inheritance. After all an
-unknown owner cannot be boosted. As a consequence blocking on semaphores
-can be subject to priority inversion.
+PREEMPT_RT does not change the semaphore implementation because counting
+semaphores have no concept of owners, thus preventing PREEMPT_RT from
+providing priority inheritance for semaphores.  After all, an unknown
+owner cannot be boosted. As a consequence, blocking on semaphores can
+result in priority inversion.
 
 
-rw_sempahore
+rw_semaphore
 ============
 
 rw_semaphore is a multiple readers and single writer lock mechanism.
@@ -125,13 +125,13 @@ rw_semaphore complies by default with the strict owner semantics, but there
 exist special-purpose interfaces that allow non-owner release for readers.
 These work independent of the kernel configuration.
 
-rw_sempahore and PREEMPT_RT
+rw_semaphore and PREEMPT_RT
 ---------------------------
 
-PREEMPT_RT kernels map rw_sempahore to a separate rt_mutex-based
+PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
 implementation, thus changing the fairness:
 
- Because an rw_sempaphore writer cannot grant its priority to multiple
+ Because an rw_semaphore writer cannot grant its priority to multiple
  readers, a preempted low-priority reader will continue holding its lock,
  thus starving even high-priority writers.  In contrast, because readers
  can grant their priority to a writer, a preempted low-priority writer will
@@ -158,7 +158,7 @@ critical section is tiny, thus avoiding RT-mutex overhead.
 spinlock_t
 ----------
 
-The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+The semantics of spinlock_t change with the state of PREEMPT_RT.
 
 On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
 and has exactly the same semantics.
@@ -196,7 +196,7 @@ PREEMPT_RT kernels preserve all other spinlock_t semantics:
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it::
+   restores it, as shown below::
 
     task->state = TASK_INTERRUPTIBLE
      lock()
@@ -333,7 +333,7 @@ The most basic rules are:
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These constraints apply both in CONFIG_PREEMPT_RT and otherwise.
+These constraints apply both in PREEMPT_RT and otherwise.
 
 The fact that PREEMPT_RT changes the lock category of spinlock_t and
 rwlock_t from spinning to sleeping means that they cannot be acquired while
@@ -344,4 +344,4 @@ holding a raw spinlock.  This results in the following nesting ordering:
   3) raw_spinlock_t and bit spinlocks
 
 Lockdep will complain if these constraints are violated, both in
-CONFIG_PREEMPT_RT and otherwise.
+PREEMPT_RT and otherwise.
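
For readers following along, a rough sketch of the reader-biased scheme
described above -- illustrative pseudocode only; the names are
hypothetical and the details do not match the actual kernel code:

    struct rt_rwsem {                       /* hypothetical */
            atomic_t        readers;        /* negative: writer active */
            struct rt_mutex rtmutex;
    };

    void demo_down_read(struct rt_rwsem *sem)
    {
            /* Fast path: no writer, just bump the reader count. */
            if (atomic_inc_unless_negative(&sem->readers))
                    return;

            /* A writer holds rtmutex: block on it so the writer
             * inherits this reader's priority ... */
            rt_mutex_lock(&sem->rtmutex);
            atomic_inc(&sem->readers);
            /* ... then drop it again: the rt_mutex is not held
             * across the read side critical section. */
            rt_mutex_unlock(&sem->rtmutex);
    }

    void demo_up_read(struct rt_rwsem *sem)
    {
            /* Only the count is dropped, which is why a non-owner
             * release of the read side is possible. */
            atomic_dec(&sem->readers);
    }

    void demo_down_write(struct rt_rwsem *sem)
    {
            /* Held across the whole write side critical section so
             * that incoming readers can boost the writer. */
            rt_mutex_lock(&sem->rtmutex);
            wait_for_readers_to_drain(sem); /* hypothetical helper */
    }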

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* Re: [patch V3 03/20] usb: gadget: Use completion interface instead of open coding it
  2020-03-21 11:25   ` Thomas Gleixner
@ 2020-03-25  8:37     ` Felipe Balbi
  -1 siblings, 0 replies; 195+ messages in thread
From: Felipe Balbi @ 2020-03-25  8:37 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, linux-usb, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

Thomas Gleixner <tglx@linutronix.de> writes:

> From: Thomas Gleixner <tglx@linutronix.de>
>
> ep_io() uses a completion on stack and open codes the waiting with:
>
>   wait_event_interruptible (done.wait, done.done);
> and
>   wait_event (done.wait, done.done);
>
> This waits in non-exclusive mode for complete(), but there is no reason to
> do so because the completion can only be waited for by the task itself and
> complete() wakes exactly one exclusive waiter.
>
> Replace the open coded implementation with the corresponding
> wait_for_completion*() functions.
>
> No functional change.
>
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: linux-usb@vger.kernel.org

Do you want to carry it via your tree? If so:

Acked-by: Felipe Balbi <balbi@kernel.org>

Otherwise, let me know and I'll pick this patch.

-- 
balbi
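
For context, a minimal sketch of the resulting pattern (hypothetical
demo function, not the actual ep_io() code; demo_submit() stands in for
queueing the request whose completion callback calls complete()):

    #include <linux/completion.h>

    static int demo_submit_and_wait(bool interruptible)
    {
            DECLARE_COMPLETION_ONSTACK(done);
            int ret;

            ret = demo_submit(&done);       /* hypothetical */
            if (ret)
                    return ret;

            if (interruptible)
                    /* was: wait_event_interruptible(done.wait, done.done) */
                    return wait_for_completion_interruptible(&done);

            /* was: wait_event(done.wait, done.done) */
            wait_for_completion(&done);
            return 0;
    }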


^ permalink raw reply	[flat|nested] 195+ messages in thread

* Documentation/locking/locktypes: Further clarifications and wordsmithing
  2020-03-25  0:28         ` Paul E. McKenney
                             ` (2 preceding siblings ...)
  (?)
@ 2020-03-25 12:27           ` Thomas Gleixner
  -1 siblings, 0 replies; 195+ messages in thread
From: Thomas Gleixner @ 2020-03-25 12:27 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

The documentation of rw_semaphores is wrong as it claims that the non-owner
reader release is not supported by RT. That's just history biased memory
distortion.

Split the 'Owner semantics' section up and add separate sections for
semaphore and rw_semaphore to reflect reality.

Aside from that, the following updates are done:

 - Add pseudo code to document the spinlock state preserving mechanism on
   PREEMPT_RT

 - Wordsmith the bitspinlock and lock nesting sections

Co-developed-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/locking/locktypes.rst |  150 +++++++++++++++++++++++-------------
 1 file changed, 99 insertions(+), 51 deletions(-)

--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ Spinning locks implicitly disable preemp
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.
 and rwlock_t to be implemented via RT-mutexes.
 
 
+semaphore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+semaphores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the semaphore implementation because counting
+semaphores have no concept of owners, thus preventing PREEMPT_RT from
+providing priority inheritance for semaphores.  After all, an unknown
+owner cannot be boosted. As a consequence, blocking on semaphores can
+result in priority inversion.
+
+
+rw_semaphore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These work independent of the kernel configuration.
+
+rw_semaphore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_semaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -102,7 +158,7 @@ critical section is tiny, thus avoiding
 spinlock_t
 ----------
 
-The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+The semantics of spinlock_t change with the state of PREEMPT_RT.
 
 On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
 and has exactly the same semantics.
@@ -140,7 +196,16 @@ On a PREEMPT_RT enabled kernel spinlock_
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it, as shown below::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ On a PREEMPT_RT enabled kernel spinlock_
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ while holding normal non-raw spinlocks b
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
-
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
+
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ Lock type nesting rules
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+PREEMPT_RT and otherwise.

^ permalink raw reply	[flat|nested] 195+ messages in thread
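
As a concrete illustration of the nesting rules at the end of the patch
above, a hypothetical example (the lock names are placeholders) of an
ordering that lockdep would flag on PREEMPT_RT, where spinlock_t is a
sleeping lock:

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(raw_lock);
static DEFINE_SPINLOCK(lock);

/* Invalid: spinlock_t (sleeping on RT) nested inside raw_spinlock_t. */
static void bad_nesting(void)
{
	raw_spin_lock(&raw_lock);
	spin_lock(&lock);		/* lockdep complains here */
	spin_unlock(&lock);
	raw_spin_unlock(&raw_lock);
}

/* Valid: raw_spinlock_t nests inside spinlock_t per the ordering above. */
static void good_nesting(void)
{
	spin_lock(&lock);
	raw_spin_lock(&raw_lock);
	raw_spin_unlock(&raw_lock);
	spin_unlock(&lock);
}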

* Re: Documentation/locking/locktypes: Further clarifications and wordsmithing
  2020-03-25 12:27           ` Thomas Gleixner
                               ` (2 preceding siblings ...)
  (?)
@ 2020-03-25 16:02             ` Sebastian Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: Sebastian Siewior @ 2020-03-25 16:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: paulmck, LKML, Peter Zijlstra, Ingo Molnar, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Jonathan Corbet,
	Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer,
	linux-pci, Greg Kroah-Hartman, Felipe Balbi, linux-usb,
	Kalle Valo, David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On 2020-03-25 13:27:49 [+0100], Thomas Gleixner wrote:
> The documentation of rw_semaphores is wrong as it claims that the non-owner
> reader release is not supported by RT. That's just history biased memory
> distortion.
> 
> Split the 'Owner semantics' section up and add separate sections for
> semaphore and rw_semaphore to reflect reality.
> 
> Aside from that, the following updates are done:
> 
>  - Add pseudo code to document the spinlock state preserving mechanism on
>    PREEMPT_RT
> 
>  - Wordsmith the bitspinlock and lock nesting sections
> 
> Co-developed-by: Paul McKenney <paulmck@kernel.org>
> Signed-off-by: Paul McKenney <paulmck@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> --- a/Documentation/locking/locktypes.rst
> +++ b/Documentation/locking/locktypes.rst
> +rw_semaphore
> +============
> +
> +rw_semaphore is a multiple readers and single writer lock mechanism.
> +
> +On non-PREEMPT_RT kernels the implementation is fair, thus preventing
> +writer starvation.
> +
> +rw_semaphore complies by default with the strict owner semantics, but there
> +exist special-purpose interfaces that allow non-owner release for readers.
> +These work independent of the kernel configuration.

This reads funny, could be my English. "This works independent …" maybe?

Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread
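
The special-purpose interfaces referred to above are
down_read_non_owner() and up_read_non_owner(). A minimal sketch of
their use, with placeholder context names, acquiring the read side in
one context and releasing it from another:

#include <linux/rwsem.h>

static DECLARE_RWSEM(sem);

/* Placeholder: runs in the submitting task. */
static void submit_ctx(void)
{
	/* Acquire for reading; the matching release happens elsewhere. */
	down_read_non_owner(&sem);
	/* ... hand the protected object off to another context ... */
}

/* Placeholder: runs in a different context, e.g. a completion path. */
static void complete_ctx(void)
{
	/* ... finished with the protected object ... */
	up_read_non_owner(&sem);	/* non-owner release, legal here */
}

Note that regular down_read()/up_read() must be paired within the
acquiring task; only the *_non_owner() variants allow release from
elsewhere.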

* Re: Documentation/locking/locktypes: Further clarifications and wordsmithing
  2020-03-25 16:02             ` Sebastian Siewior
@ 2020-03-25 16:39               ` Paul E. McKenney
  -1 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-25 16:39 UTC (permalink / raw)
  To: Sebastian Siewior
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On Wed, Mar 25, 2020 at 05:02:12PM +0100, Sebastian Siewior wrote:
> On 2020-03-25 13:27:49 [+0100], Thomas Gleixner wrote:
> > The documentation of rw_semaphores is wrong as it claims that the non-owner
> > reader release is not supported by RT. That's just history-biased memory
> > distortion.
> > 
> > Split the 'Owner semantics' section up and add separate sections for
> > semaphore and rw_semaphore to reflect reality.
> > 
> > Aside from that, the following updates are done:
> > 
> >  - Add pseudo code to document the spinlock state preserving mechanism on
> >    PREEMPT_RT
> > 
> >  - Wordsmith the bitspinlock and lock nesting sections
> > 
> > Co-developed-by: Paul McKenney <paulmck@kernel.org>
> > Signed-off-by: Paul McKenney <paulmck@kernel.org>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> > --- a/Documentation/locking/locktypes.rst
> > +++ b/Documentation/locking/locktypes.rst
> …
> > +rw_semaphore
> > +============
> > +
> > +rw_semaphore is a multiple readers and single writer lock mechanism.
> > +
> > +On non-PREEMPT_RT kernels the implementation is fair, thus preventing
> > +writer starvation.
> > +
> > +rw_semaphore complies by default with the strict owner semantics, but there
> > +exist special-purpose interfaces that allow non-owner release for readers.
> > +These work independent of the kernel configuration.
> 
> This reads funny, could be my English. "This works independent …" maybe?

The "These" refers to "interfaces", which is plural, so "These" rather
than "This".  But yes, it is a bit awkward, because you have to skip
back past "readers", "release", and "non-owner" to find the implied
subject of that last sentence.

So how about this instead, making the implied subject explicit?

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independent of the kernel configuration.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 195+ messages in thread
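For concreteness, the special-purpose interfaces this wording refers to are
the non-owner rwsem primitives down_read_non_owner() and up_read_non_owner()
from <linux/rwsem.h>. A minimal sketch of the read-side handoff they allow,
where the buffer semaphore and the work item are illustrative assumptions
rather than code from this thread:

  #include <linux/rwsem.h>
  #include <linux/workqueue.h>

  static DECLARE_RWSEM(buf_sem);

  /* Completion path: releases a reader lock it never acquired. */
  static void io_done_work(struct work_struct *work)
  {
  	up_read_non_owner(&buf_sem);
  }
  static DECLARE_WORK(io_done, io_done_work);

  /* Submission path: acquires the reader lock and hands it off. */
  static void start_io(void)
  {
  	down_read_non_owner(&buf_sem);
  	schedule_work(&io_done);
  }

Because the releasing task is not the acquiring owner, this pair bypasses
lockdep's owner tracking, which is why the documentation treats it as
special purpose and keeps strict owner semantics as the default.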

* Re: Documentation/locking/locktypes: Further clarifications and wordsmithing
  2020-03-25 16:39               ` Paul E. McKenney
@ 2020-03-25 16:54                 ` Sebastian Siewior
  -1 siblings, 0 replies; 195+ messages in thread
From: Sebastian Siewior @ 2020-03-25 16:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Randy Dunlap, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On 2020-03-25 09:39:19 [-0700], Paul E. McKenney wrote:
> > > --- a/Documentation/locking/locktypes.rst
> > > +++ b/Documentation/locking/locktypes.rst
> > …
> > > +rw_semaphore
> > > +============
> > > +
> > > +rw_semaphore is a multiple readers and single writer lock mechanism.
> > > +
> > > +On non-PREEMPT_RT kernels the implementation is fair, thus preventing
> > > +writer starvation.
> > > +
> > > +rw_semaphore complies by default with the strict owner semantics, but there
> > > +exist special-purpose interfaces that allow non-owner release for readers.
> > > +These work independent of the kernel configuration.
> > 
> > This reads funny, could be my English. "This works independent …" maybe?
> 
> The "These" refers to "interfaces", which is plural, so "These" rather
> than "This".  But yes, it is a bit awkward, because you have to skip
> back past "readers", "release", and "non-owner" to find the implied
> subject of that last sentence.
> 
> So how about this instead, making the implied subject explicit?
> 
> rw_semaphore complies by default with the strict owner semantics, but there
> exist special-purpose interfaces that allow non-owner release for readers.
> These interfaces work independent of the kernel configuration.

Yes, perfect. Thank you.

> 							Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread

* [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes
@ 2020-03-25 16:58             ` Randy Dunlap
  0 siblings, 0 replies; 195+ messages in thread
From: Randy Dunlap @ 2020-03-25 16:58 UTC (permalink / raw)
  To: Thomas Gleixner, paulmck
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Sebastian Siewior,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Jonathan Corbet, Logan Gunthorpe, Bjorn Helgaas, Kurt Schwemmer,
	linux-pci, Greg Kroah-Hartman, Felipe Balbi, linux-usb,
	Kalle Valo, David S. Miller, linux-wireless, netdev, Darren Hart,
	Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

From: Randy Dunlap <rdunlap@infradead.org>

Minor editorial fixes:
- add some hyphens in multi-word adjectives
- add some periods for consistency
- add "'" for possessive CPU's
- capitalize IRQ when it's an acronym and not part of a function name

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/locking/locktypes.rst |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- linux-next-20200325.orig/Documentation/locking/locktypes.rst
+++ linux-next-20200325/Documentation/locking/locktypes.rst
@@ -84,7 +84,7 @@ rtmutex
 
 RT-mutexes are mutexes with support for priority inheritance (PI).
 
-PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+PI has limitations on non-PREEMPT_RT-enabled kernels due to preemption and
 interrupt disabled sections.
 
 PI clearly cannot preempt preemption-disabled or interrupt-disabled
@@ -150,7 +150,7 @@ kernel configuration including PREEMPT_R
 
 raw_spinlock_t is a strict spinning lock implementation in all kernels,
 including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
-core code, low level interrupt handling and places where disabling
+core code, low-level interrupt handling and places where disabling
 preemption or interrupts is required, for example, to safely access
 hardware state.  raw_spinlock_t can sometimes also be used when the
 critical section is tiny, thus avoiding RT-mutex overhead.
@@ -160,20 +160,20 @@ spinlock_t
 
 The semantics of spinlock_t change with the state of PREEMPT_RT.
 
-On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
+On a non-PREEMPT_RT-enabled kernel spinlock_t is mapped to raw_spinlock_t
 and has exactly the same semantics.
 
 spinlock_t and PREEMPT_RT
 -------------------------
 
-On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
+On a PREEMPT_RT-enabled kernel spinlock_t is mapped to a separate
 implementation based on rt_mutex which changes the semantics:
 
- - Preemption is not disabled
+ - Preemption is not disabled.
 
  - The hard interrupt related suffixes for spin_lock / spin_unlock
-   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
-   interrupt disabled state
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
+   interrupt disabled state.
 
  - The soft interrupt related suffix (_bh()) still disables softirq
    handlers.
@@ -279,7 +279,7 @@ fully preemptible context.  Instead, use
 spin_lock_irqsave() and their unlock counterparts.  In cases where the
 interrupt disabling and locking must remain separate, PREEMPT_RT offers a
 local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
-allowing things like per-CPU irq-disabled locks to be acquired.  However,
+allowing things like per-CPU IRQ-disabled locks to be acquired.  However,
 this approach should be used only where absolutely necessary.
 
 


^ permalink raw reply	[flat|nested] 195+ messages in thread
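To make the local_lock paragraph in the last hunk concrete, here is a
minimal sketch assuming the local_lock API as it later landed in mainline
(<linux/local_lock.h>); the per-CPU structure and the counter are made-up
names for illustration:

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct bufs {
  	local_lock_t	lock;
  	int		count;
  };
  static DEFINE_PER_CPU(struct bufs, bufs) = {
  	.lock = INIT_LOCAL_LOCK(lock),
  };

  static void bump_count(void)
  {
  	unsigned long flags;

  	/*
  	 * !PREEMPT_RT: behaves like local_irq_save(), protecting the
  	 * per-CPU data against interrupts on this CPU.
  	 * PREEMPT_RT: acquires a per-CPU sleeping lock, which pins the
  	 * task to the CPU; interrupts stay enabled.
  	 */
  	local_lock_irqsave(&bufs.lock, flags);
  	this_cpu_inc(bufs.count);
  	local_unlock_irqrestore(&bufs.lock, flags);
  }

Either way the per-CPU data is serialized, which is what lets per-CPU
IRQ-disabled locks be acquired without open-coding preemption or interrupt
state.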

* Re: [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes
  2020-03-25 16:58             ` Randy Dunlap
@ 2020-03-26  2:40               ` Paul E. McKenney
  -1 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-26  2:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Linus Torvalds, Joel Fernandes, Oleg Nesterov,
	Davidlohr Bueso, Jonathan Corbet, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On Wed, Mar 25, 2020 at 09:58:14AM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Minor editorial fixes:
> - add some hyphens in multi-word adjectives
> - add some periods for consistency
> - add "'" for possessive CPU's
> - capitalize IRQ when it's an acronym and not part of a function name
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Some nits below, but with or without those suggested changes:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  Documentation/locking/locktypes.rst |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- linux-next-20200325.orig/Documentation/locking/locktypes.rst
> +++ linux-next-20200325/Documentation/locking/locktypes.rst
> @@ -84,7 +84,7 @@ rtmutex
>  
>  RT-mutexes are mutexes with support for priority inheritance (PI).
>  
> -PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +PI has limitations on non-PREEMPT_RT-enabled kernels due to preemption and

Or just drop the " enabled".

>  interrupt disabled sections.
>  
>  PI clearly cannot preempt preemption-disabled or interrupt-disabled
> @@ -150,7 +150,7 @@ kernel configuration including PREEMPT_R
>  
>  raw_spinlock_t is a strict spinning lock implementation in all kernels,
>  including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> -core code, low level interrupt handling and places where disabling
> +core code, low-level interrupt handling and places where disabling
>  preemption or interrupts is required, for example, to safely access
>  hardware state.  raw_spinlock_t can sometimes also be used when the
>  critical section is tiny, thus avoiding RT-mutex overhead.
> @@ -160,20 +160,20 @@ spinlock_t
>  
>  The semantics of spinlock_t change with the state of PREEMPT_RT.
>  
> -On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +On a non-PREEMPT_RT-enabled kernel spinlock_t is mapped to raw_spinlock_t

Ditto.

>  and has exactly the same semantics.
>  
>  spinlock_t and PREEMPT_RT
>  -------------------------
>  
> -On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +On a PREEMPT_RT-enabled kernel spinlock_t is mapped to a separate

And here as well.

>  implementation based on rt_mutex which changes the semantics:
>  
> - - Preemption is not disabled
> + - Preemption is not disabled.
>  
>   - The hard interrupt related suffixes for spin_lock / spin_unlock
> -   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> -   interrupt disabled state
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
> +   interrupt disabled state.
>  
>   - The soft interrupt related suffix (_bh()) still disables softirq
>     handlers.
> @@ -279,7 +279,7 @@ fully preemptible context.  Instead, use
>  spin_lock_irqsave() and their unlock counterparts.  In cases where the
>  interrupt disabling and locking must remain separate, PREEMPT_RT offers a
>  local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
> -allowing things like per-CPU irq-disabled locks to be acquired.  However,
> +allowing things like per-CPU IRQ-disabled locks to be acquired.  However,

Quite a bit of text in the kernel uses "irq", lower case.  Another
option is to spell out "interrupt".

>  this approach should be used only where absolutely necessary.
>  
>  
> 

^ permalink raw reply	[flat|nested] 195+ messages in thread
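The interrupt-state point in the hunk above is easy to get wrong, so a
short sketch may help; the lock and the structure here are hypothetical:

  #include <linux/spinlock.h>

  struct device_state {
  	int generation;
  };
  static DEFINE_SPINLOCK(dev_lock);

  static void update_state(struct device_state *st)
  {
  	unsigned long flags;

  	spin_lock_irqsave(&dev_lock, flags);
  	/*
  	 * !PREEMPT_RT: hard interrupts are disabled on this CPU and
  	 * 'flags' holds the prior interrupt state.
  	 * PREEMPT_RT: spinlock_t is a sleeping rt_mutex-based lock;
  	 * interrupts stay enabled and 'flags' carries no IRQ state.
  	 */
  	st->generation++;
  	spin_unlock_irqrestore(&dev_lock, flags);
  }

Code inside such a section therefore must not assume it actually runs with
interrupts disabled on a PREEMPT_RT kernel.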

* Re: [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes
@ 2020-03-26  2:40               ` Paul E. McKenney
  0 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-26  2:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Linus Torvalds, Joel Fernandes, Oleg Nesterov,
	Davidlohr Bueso, Jonathan Corbet, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless

On Wed, Mar 25, 2020 at 09:58:14AM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Minor editorial fixes:
> - add some hyphens in multi-word adjectives
> - add some periods for consistency
> - add "'" for possessive CPU's
> - capitalize IRQ when it's an acronym and not part of a function name
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Some nits below, but with or without those suggested changes:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  Documentation/locking/locktypes.rst |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- linux-next-20200325.orig/Documentation/locking/locktypes.rst
> +++ linux-next-20200325/Documentation/locking/locktypes.rst
> @@ -84,7 +84,7 @@ rtmutex
>  
>  RT-mutexes are mutexes with support for priority inheritance (PI).
>  
> -PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +PI has limitations on non-PREEMPT_RT-enabled kernels due to preemption and

Or just drop the " enabled".

>  interrupt disabled sections.
>  
>  PI clearly cannot preempt preemption-disabled or interrupt-disabled
> @@ -150,7 +150,7 @@ kernel configuration including PREEMPT_R
>  
>  raw_spinlock_t is a strict spinning lock implementation in all kernels,
>  including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> -core code, low level interrupt handling and places where disabling
> +core code, low-level interrupt handling and places where disabling
>  preemption or interrupts is required, for example, to safely access
>  hardware state.  raw_spinlock_t can sometimes also be used when the
>  critical section is tiny, thus avoiding RT-mutex overhead.
> @@ -160,20 +160,20 @@ spinlock_t
>  
>  The semantics of spinlock_t change with the state of PREEMPT_RT.
>  
> -On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +On a non-PREEMPT_RT-enabled kernel spinlock_t is mapped to raw_spinlock_t

Ditto.

>  and has exactly the same semantics.
>  
>  spinlock_t and PREEMPT_RT
>  -------------------------
>  
> -On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +On a PREEMPT_RT-enabled kernel spinlock_t is mapped to a separate

And here as well.

>  implementation based on rt_mutex which changes the semantics:
>  
> - - Preemption is not disabled
> + - Preemption is not disabled.
>  
>   - The hard interrupt related suffixes for spin_lock / spin_unlock
> -   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> -   interrupt disabled state
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
> +   interrupt disabled state.
>  
>   - The soft interrupt related suffix (_bh()) still disables softirq
>     handlers.
> @@ -279,7 +279,7 @@ fully preemptible context.  Instead, use
>  spin_lock_irqsave() and their unlock counterparts.  In cases where the
>  interrupt disabling and locking must remain separate, PREEMPT_RT offers a
>  local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
> -allowing things like per-CPU irq-disabled locks to be acquired.  However,
> +allowing things like per-CPU IRQ-disabled locks to be acquired.  However,

Quite a bit of text in the kernel uses "irq", lower case.  Another
option is to spell out "interrupt".

>  this approach should be used only where absolutely necessary.
>  
>  
> 

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes
@ 2020-03-26  2:40               ` Paul E. McKenney
  0 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-26  2:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-ia64, Peter Zijlstra, linux-pci, Sebastian Siewior,
	Oleg Nesterov, Guo Ren, Joel Fernandes, Vincent Chen,
	Thomas Gleixner, Davidlohr Bueso, linux-acpi, Brian Cain,
	Jonathan Corbet, linux-hexagon, Rafael J. Wysocki, linux-csky,
	Ingo Molnar, Linus Torvalds, Darren Hart, Zhang Rui, Len Brown,
	Fenghua Yu, Arnd Bergmann, linux-pm, linuxppc-dev, Greentime Hu,
	Bjorn Helgaas, Kurt Schwemmer, platform-driver-x86, Kalle Valo,
	kbuild test robot, Felipe Balbi, Michal Simek, Tony Luck,
	Nick Hu, Geoff Levand, Greg Kroah-Hartman, linux-usb,
	linux-wireless, LKML, Davidlohr Bueso, netdev, Logan Gunthorpe,
	David S. Miller, Andy Shevchenko

On Wed, Mar 25, 2020 at 09:58:14AM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Minor editorial fixes:
> - add some hyphens in multi-word adjectives
> - add some periods for consistency
> - add "'" for possessive CPU's
> - capitalize IRQ when it's an acronym and not part of a function name
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Some nits below, but with or without those suggested changes:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  Documentation/locking/locktypes.rst |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- linux-next-20200325.orig/Documentation/locking/locktypes.rst
> +++ linux-next-20200325/Documentation/locking/locktypes.rst
> @@ -84,7 +84,7 @@ rtmutex
>  
>  RT-mutexes are mutexes with support for priority inheritance (PI).
>  
> -PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +PI has limitations on non-PREEMPT_RT-enabled kernels due to preemption and

Or just drop the " enabled".

>  interrupt disabled sections.
>  
>  PI clearly cannot preempt preemption-disabled or interrupt-disabled
> @@ -150,7 +150,7 @@ kernel configuration including PREEMPT_R
>  
>  raw_spinlock_t is a strict spinning lock implementation in all kernels,
>  including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> -core code, low level interrupt handling and places where disabling
> +core code, low-level interrupt handling and places where disabling
>  preemption or interrupts is required, for example, to safely access
>  hardware state.  raw_spinlock_t can sometimes also be used when the
>  critical section is tiny, thus avoiding RT-mutex overhead.
> @@ -160,20 +160,20 @@ spinlock_t
>  
>  The semantics of spinlock_t change with the state of PREEMPT_RT.
>  
> -On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +On a non-PREEMPT_RT-enabled kernel spinlock_t is mapped to raw_spinlock_t

Ditto.

>  and has exactly the same semantics.
>  
>  spinlock_t and PREEMPT_RT
>  -------------------------
>  
> -On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +On a PREEMPT_RT-enabled kernel spinlock_t is mapped to a separate

And here as well.

>  implementation based on rt_mutex which changes the semantics:
>  
> - - Preemption is not disabled
> + - Preemption is not disabled.
>  
>   - The hard interrupt related suffixes for spin_lock / spin_unlock
> -   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> -   interrupt disabled state
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
> +   interrupt disabled state.
>  
>   - The soft interrupt related suffix (_bh()) still disables softirq
>     handlers.
> @@ -279,7 +279,7 @@ fully preemptible context.  Instead, use
>  spin_lock_irqsave() and their unlock counterparts.  In cases where the
>  interrupt disabling and locking must remain separate, PREEMPT_RT offers a
>  local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
> -allowing things like per-CPU irq-disabled locks to be acquired.  However,
> +allowing things like per-CPU IRQ-disabled locks to be acquired.  However,

Quite a bit of text in the kernel uses "irq", lower case.  Another
option is to spell out "interrupt".

>  this approach should be used only where absolutely necessary.
>  
>  
> 

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes
@ 2020-03-26  2:40               ` Paul E. McKenney
  0 siblings, 0 replies; 195+ messages in thread
From: Paul E. McKenney @ 2020-03-26  2:40 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Linus Torvalds, Joel Fernandes, Oleg Nesterov,
	Davidlohr Bueso, Jonathan Corbet, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Michael Ellerman, Arnd Bergmann,
	Geoff Levand, linuxppc-dev, Davidlohr Bueso

On Wed, Mar 25, 2020 at 09:58:14AM -0700, Randy Dunlap wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> 
> Minor editorial fixes:
> - add some hyphens in multi-word adjectives
> - add some periods for consistency
> - add "'" for possessive CPU's
> - capitalize IRQ when it's an acronym and not part of a function name
> 
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Some nits below, but with or without those suggested changes:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  Documentation/locking/locktypes.rst |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> --- linux-next-20200325.orig/Documentation/locking/locktypes.rst
> +++ linux-next-20200325/Documentation/locking/locktypes.rst
> @@ -84,7 +84,7 @@ rtmutex
>  
>  RT-mutexes are mutexes with support for priority inheritance (PI).
>  
> -PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
> +PI has limitations on non-PREEMPT_RT-enabled kernels due to preemption and

Or just drop the " enabled".

>  interrupt disabled sections.
>  
>  PI clearly cannot preempt preemption-disabled or interrupt-disabled
> @@ -150,7 +150,7 @@ kernel configuration including PREEMPT_R
>  
>  raw_spinlock_t is a strict spinning lock implementation in all kernels,
>  including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> -core code, low level interrupt handling and places where disabling
> +core code, low-level interrupt handling and places where disabling
>  preemption or interrupts is required, for example, to safely access
>  hardware state.  raw_spinlock_t can sometimes also be used when the
>  critical section is tiny, thus avoiding RT-mutex overhead.
> @@ -160,20 +160,20 @@ spinlock_t
>  
>  The semantics of spinlock_t change with the state of PREEMPT_RT.
>  
> -On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
> +On a non-PREEMPT_RT-enabled kernel spinlock_t is mapped to raw_spinlock_t

Ditto.

>  and has exactly the same semantics.
>  
>  spinlock_t and PREEMPT_RT
>  -------------------------
>  
> -On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
> +On a PREEMPT_RT-enabled kernel spinlock_t is mapped to a separate

And here as well.

>  implementation based on rt_mutex which changes the semantics:
>  
> - - Preemption is not disabled
> + - Preemption is not disabled.
>  
>   - The hard interrupt related suffixes for spin_lock / spin_unlock
> -   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
> -   interrupt disabled state
> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
> +   interrupt disabled state.
>  
>   - The soft interrupt related suffix (_bh()) still disables softirq
>     handlers.
> @@ -279,7 +279,7 @@ fully preemptible context.  Instead, use
>  spin_lock_irqsave() and their unlock counterparts.  In cases where the
>  interrupt disabling and locking must remain separate, PREEMPT_RT offers a
>  local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
> -allowing things like per-CPU irq-disabled locks to be acquired.  However,
> +allowing things like per-CPU IRQ-disabled locks to be acquired.  However,

Quite a bit of text in the kernel uses "irq", lower case.  Another
option is to spell out "interrupt".

>  this approach should be used only where absolutely necessary.
>  
>  
> 

^ permalink raw reply	[flat|nested] 195+ messages in thread
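
As a concrete illustration of the spinlock_t semantics the patch documents,
consider the following minimal sketch (hypothetical lock and function names,
not taken from the patch itself):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(data_lock);              /* spinlock_t */
static unsigned long shared_data;

static void update_shared(unsigned long val)
{
        unsigned long flags;

        /*
         * On a non-PREEMPT_RT kernel this disables interrupts on the
         * local CPU and saves the previous state in 'flags'.
         *
         * On a PREEMPT_RT kernel the lock maps to an rt_mutex-based
         * implementation: the CPU's interrupt-disabled state is left
         * untouched and only the lock itself provides exclusion.
         */
        spin_lock_irqsave(&data_lock, flags);
        shared_data = val;
        spin_unlock_irqrestore(&data_lock, flags);
}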

* Re: [patch V3 03/20] usb: gadget: Use completion interface instead of open coding it
  2020-03-25  8:37     ` Felipe Balbi
@ 2020-03-27 12:14       ` Sebastian Siewior
  0 siblings, 0 replies; 195+ messages in thread
From: Sebastian Siewior @ 2020-03-27 12:14 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Linus Torvalds, Joel Fernandes, Oleg Nesterov, Davidlohr Bueso,
	Greg Kroah-Hartman, linux-usb, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Kalle Valo, David S. Miller,
	linux-wireless, netdev, Darren Hart, Andy Shevchenko,
	platform-driver-x86, Zhang Rui, Rafael J. Wysocki, linux-pm,
	Len Brown, linux-acpi, kbuild test robot, Nick Hu, Greentime Hu,
	Vincent Chen, Guo Ren, linux-csky, Brian Cain, linux-hexagon,
	Tony Luck, Fenghua Yu, linux-ia64, Michal Simek,
	Michael Ellerman, Arnd Bergmann, Geoff Levand, linuxppc-dev,
	Paul E . McKenney, Jonathan Corbet, Randy Dunlap,
	Davidlohr Bueso

On 2020-03-25 10:37:57 [+0200], Felipe Balbi wrote:
> Do you want to carry it via your tree? If so:

We would like to do so.

> Acked-by: Felipe Balbi <balbi@kernel.org>

Thank you.

> Otherwise, let me know and I'll pick this patch.

Sebastian

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait
  2020-03-21 11:25   ` Thomas Gleixner
@ 2020-03-27 19:14     ` Geoff Levand
  0 siblings, 0 replies; 195+ messages in thread
From: Geoff Levand @ 2020-03-27 19:14 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Peter Zijlstra, Ingo Molnar, Sebastian Siewior, Linus Torvalds,
	Joel Fernandes, Oleg Nesterov, Davidlohr Bueso, Michael Ellerman,
	Arnd Bergmann, linuxppc-dev, Logan Gunthorpe, Bjorn Helgaas,
	Kurt Schwemmer, linux-pci, Greg Kroah-Hartman, Felipe Balbi,
	linux-usb, Kalle Valo, David S. Miller, linux-wireless, netdev,
	Darren Hart, Andy Shevchenko, platform-driver-x86, Zhang Rui,
	Rafael J. Wysocki, linux-pm, Len Brown, linux-acpi,
	kbuild test robot, Nick Hu, Greentime Hu, Vincent Chen, Guo Ren,
	linux-csky, Brian Cain, linux-hexagon, Tony Luck, Fenghua Yu,
	linux-ia64, Michal Simek, Paul E . McKenney, Jonathan Corbet,
	Randy Dunlap, Davidlohr Bueso

Hi,

On 3/21/20 4:25 AM, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The PS3 notification interrupt and kthread use a hacked-up completion to
> communicate. Since we want to change the completion implementation and
> this is abuse anyway, replace it with a simple rcuwait since there is only
> ever the one waiter.
> 
> AFAICT the kthread uses TASK_INTERRUPTIBLE to not increase loadavg; kthreads
> cannot receive signals by default and this one doesn't look different. Use
> TASK_IDLE instead.

I tested the patch set applied against v5.6-rc7 on the PS3 and it worked
as expected.

Tested-by: Geoff Levand <geoff@infradead.org>


^ permalink raw reply	[flat|nested] 195+ messages in thread
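
For reference, the rcuwait pattern described in the quoted changelog looks
roughly like the sketch below.  The names are hypothetical, not the actual
PS3 driver code, and rcuwait_wait_event() is used with the @state argument
added earlier in this series:

#include <linux/atomic.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/rcuwait.h>
#include <linux/sched.h>

static struct rcuwait notify_wait;      /* rcuwait_init() at probe time */
static atomic_t notify_pending = ATOMIC_INIT(0);

static irqreturn_t notify_interrupt(int irq, void *data)
{
        atomic_set(&notify_pending, 1);
        /* Wake the one kthread waiter; valid from hard interrupt context. */
        rcuwait_wake_up(&notify_wait);
        return IRQ_HANDLED;
}

static int notify_kthread(void *data)
{
        while (!kthread_should_stop()) {
                /*
                 * TASK_IDLE is an uninterruptible sleep that neither
                 * receives signals nor contributes to loadavg, which is
                 * what the changelog argues the kthread really wants.
                 */
                rcuwait_wait_event(&notify_wait,
                                   atomic_xchg(&notify_pending, 0) ||
                                   kthread_should_stop(),
                                   TASK_IDLE);
                /* ... process the notification ... */
        }
        return 0;
}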

* [tip: locking/core] Documentation/locking/locktypes: Minor copy editor fixes
  2020-03-25 16:58             ` Randy Dunlap
@ 2020-03-28 11:52             ` tip-bot2 for Randy Dunlap
  0 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Randy Dunlap @ 2020-03-28 11:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Randy Dunlap, Thomas Gleixner, Paul E. McKenney, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     51e69e6551a8c6fffe0185ba305bb4e2d7223616
Gitweb:        https://git.kernel.org/tip/51e69e6551a8c6fffe0185ba305bb4e2d7223616
Author:        Randy Dunlap <rdunlap@infradead.org>
AuthorDate:    Wed, 25 Mar 2020 09:58:14 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 28 Mar 2020 12:47:34 +01:00

Documentation/locking/locktypes: Minor copy editor fixes

Minor editorial fixes:
- remove 'enabled' from PREEMPT_RT enabled kernels for consistency
- add some periods for consistency
- add "'" for possessive CPU's
- spell out interrupts

[ tglx: Picked up Paul's suggestions ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/ac615f36-0b44-408d-aeab-d76e4241add4@infradead.org

---
 Documentation/locking/locktypes.rst | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index 1c18bb8..09f45ce 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -84,7 +84,7 @@ rtmutex
 
 RT-mutexes are mutexes with support for priority inheritance (PI).
 
-PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
+PI has limitations on non-PREEMPT_RT kernels due to preemption and
 interrupt disabled sections.
 
 PI clearly cannot preempt preemption-disabled or interrupt-disabled
@@ -150,7 +150,7 @@ kernel configuration including PREEMPT_RT enabled kernels.
 
 raw_spinlock_t is a strict spinning lock implementation in all kernels,
 including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
-core code, low level interrupt handling and places where disabling
+core code, low-level interrupt handling and places where disabling
 preemption or interrupts is required, for example, to safely access
 hardware state.  raw_spinlock_t can sometimes also be used when the
 critical section is tiny, thus avoiding RT-mutex overhead.
@@ -160,20 +160,20 @@ spinlock_t
 
 The semantics of spinlock_t change with the state of PREEMPT_RT.
 
-On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
-and has exactly the same semantics.
+On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
+exactly the same semantics.
 
 spinlock_t and PREEMPT_RT
 -------------------------
 
-On a PREEMPT_RT enabled kernel spinlock_t is mapped to a separate
-implementation based on rt_mutex which changes the semantics:
+On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
+based on rt_mutex which changes the semantics:
 
- - Preemption is not disabled
+ - Preemption is not disabled.
 
  - The hard interrupt related suffixes for spin_lock / spin_unlock
-   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
-   interrupt disabled state
+   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
+   interrupt disabled state.
 
  - The soft interrupt related suffix (_bh()) still disables softirq
    handlers.
@@ -279,8 +279,8 @@ fully preemptible context.  Instead, use spin_lock_irq() or
 spin_lock_irqsave() and their unlock counterparts.  In cases where the
 interrupt disabling and locking must remain separate, PREEMPT_RT offers a
 local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
-allowing things like per-CPU irq-disabled locks to be acquired.  However,
-this approach should be used only where absolutely necessary.
+allowing things like per-CPU interrupt disabled locks to be acquired.
+However, this approach should be used only where absolutely necessary.
 
 
 raw_spinlock_t

^ permalink raw reply related	[flat|nested] 195+ messages in thread
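
The local_lock mechanism referenced in the final hunk can be sketched as
follows.  This uses the local_lock_t interface as it later landed in
mainline; at the time of this thread local_lock was a PREEMPT_RT tree
mechanism, so treat the exact spelling as an assumption:

#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct cpu_stats {
        local_lock_t    lock;
        u64             count;
};

static DEFINE_PER_CPU(struct cpu_stats, cpu_stats) = {
        .lock = INIT_LOCAL_LOCK(lock),
};

static void stats_inc(void)
{
        /*
         * On non-PREEMPT_RT kernels this disables interrupts on the
         * local CPU.  On PREEMPT_RT it acquires a per-CPU spinlock_t
         * instead, which pins the task to the CPU so that the per-CPU
         * data stays consistent without hard interrupt disabling.
         */
        local_lock_irq(&cpu_stats.lock);
        this_cpu_add(cpu_stats.count, 1);
        local_unlock_irq(&cpu_stats.lock);
}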

* [tip: locking/core] Documentation/locking/locktypes: Further clarifications and wordsmithing
  2020-03-25 12:27           ` Thomas Gleixner
@ 2020-03-28 11:52           ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 195+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-03-28 11:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Paul McKenney, Thomas Gleixner, Sebastian Andrzej Siewior, x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     7ecc6aa522e1b812a2eacc31066945e920b0fde4
Gitweb:        https://git.kernel.org/tip/7ecc6aa522e1b812a2eacc31066945e920b0fde4
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 25 Mar 2020 13:27:49 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 28 Mar 2020 12:47:34 +01:00

Documentation/locking/locktypes: Further clarifications and wordsmithing

The documentation of rw_semaphores is wrong as it claims that the non-owner
reader release is not supported by RT. That's just history-biased memory
distortion.

Split the 'Owner semantics' section up and add separate sections for
semaphore and rw_semaphore to reflect reality.

Aside of that the following updates are done:

 - Add pseudo code to document the spinlock state preserving mechanism on
   PREEMPT_RT

 - Wordsmith the bitspinlock and lock nesting sections

Co-developed-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/87wo78y5yy.fsf@nanos.tec.linutronix.de

---
 Documentation/locking/locktypes.rst | 148 +++++++++++++++++----------
 1 file changed, 98 insertions(+), 50 deletions(-)

diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index f0aa911..1c18bb8 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -67,6 +67,17 @@ can have suffixes which apply further protections:
  _irqsave/restore()   Save and disable / restore interrupt disabled state
  ===================  ====================================================
 
+Owner semantics
+===============
+
+The aforementioned lock types except semaphores have strict owner
+semantics:
+
+  The context (task) that acquired the lock must release it.
+
+rw_semaphores have a special interface which allows non-owner release for
+readers.
+
 
 rtmutex
 =======
@@ -83,6 +94,51 @@ interrupt handlers and soft interrupts.  This conversion allows spinlock_t
 and rwlock_t to be implemented via RT-mutexes.
 
 
+semaphore
+=========
+
+semaphore is a counting semaphore implementation.
+
+Semaphores are often used for both serialization and waiting, but new use
+cases should instead use separate serialization and wait mechanisms, such
+as mutexes and completions.
+
+semaphores and PREEMPT_RT
+----------------------------
+
+PREEMPT_RT does not change the semaphore implementation because counting
+semaphores have no concept of owners, thus preventing PREEMPT_RT from
+providing priority inheritance for semaphores.  After all, an unknown
+owner cannot be boosted. As a consequence, blocking on semaphores can
+result in priority inversion.
+
+
+rw_semaphore
+============
+
+rw_semaphore is a multiple readers and single writer lock mechanism.
+
+On non-PREEMPT_RT kernels the implementation is fair, thus preventing
+writer starvation.
+
+rw_semaphore complies by default with the strict owner semantics, but there
+exist special-purpose interfaces that allow non-owner release for readers.
+These interfaces work independent of the kernel configuration.
+
+rw_semaphore and PREEMPT_RT
+---------------------------
+
+PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
+implementation, thus changing the fairness:
+
+ Because an rw_semaphore writer cannot grant its priority to multiple
+ readers, a preempted low-priority reader will continue holding its lock,
+ thus starving even high-priority writers.  In contrast, because readers
+ can grant their priority to a writer, a preempted low-priority writer will
+ have its priority boosted until it releases the lock, thus preventing that
+ writer from starving readers.
+
+
 raw_spinlock_t and spinlock_t
 =============================
 
@@ -102,7 +158,7 @@ critical section is tiny, thus avoiding RT-mutex overhead.
 spinlock_t
 ----------
 
-The semantics of spinlock_t change with the state of CONFIG_PREEMPT_RT.
+The semantics of spinlock_t change with the state of PREEMPT_RT.
 
 On a non PREEMPT_RT enabled kernel spinlock_t is mapped to raw_spinlock_t
 and has exactly the same semantics.
@@ -140,7 +196,16 @@ PREEMPT_RT kernels preserve all other spinlock_t semantics:
    kernels leave task state untouched.  However, PREEMPT_RT must change
    task state if the task blocks during acquisition.  Therefore, it saves
    the current task state before blocking and the corresponding lock wakeup
-   restores it.
+   restores it, as shown below::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					lock wakeup
+					  task->state = task->saved_state
 
    Other types of wakeups would normally unconditionally set the task state
    to RUNNING, but that does not work here because the task must remain
@@ -148,7 +213,22 @@ PREEMPT_RT kernels preserve all other spinlock_t semantics:
    wakeup attempts to awaken a task blocked waiting for a spinlock, it
    instead sets the saved state to RUNNING.  Then, when the lock
    acquisition completes, the lock wakeup sets the task state to the saved
-   state, in this case setting it to RUNNING.
+   state, in this case setting it to RUNNING::
+
+    task->state = TASK_INTERRUPTIBLE
+     lock()
+       block()
+         task->saved_state = task->state
+	 task->state = TASK_UNINTERRUPTIBLE
+	 schedule()
+					non lock wakeup
+					  task->saved_state = TASK_RUNNING
+
+					lock wakeup
+					  task->state = task->saved_state
+
+   This ensures that the real wakeup cannot be lost.
+
 
 rwlock_t
 ========
@@ -228,17 +308,16 @@ preemption on PREEMPT_RT kernels::
 bit spinlocks
 -------------
 
-Bit spinlocks are problematic for PREEMPT_RT as they cannot be easily
-substituted by an RT-mutex based implementation for obvious reasons.
-
-The semantics of bit spinlocks are preserved on PREEMPT_RT kernels and the
-caveats vs. raw_spinlock_t apply.
+PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
+small to accommodate an RT-mutex.  Therefore, the semantics of bit
+spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
+caveats also apply to bit spinlocks.
 
-Some bit spinlocks are substituted by regular spinlock_t for PREEMPT_RT but
-this requires conditional (#ifdef'ed) code changes at the usage site while
-the spinlock_t substitution is simply done by the compiler and the
-conditionals are restricted to header files and core implementation of the
-locking primitives and the usage sites do not require any changes.
+Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
+using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
+usage-site changes are not needed for the spinlock_t substitution.
+Instead, conditionals in header files and the core locking implementation
+enable the compiler to do the substitution transparently.
 
 
 Lock type nesting rules
@@ -254,46 +333,15 @@ The most basic rules are:
 
   - Spinning lock types can nest inside sleeping lock types.
 
-These rules apply in general independent of CONFIG_PREEMPT_RT.
+These constraints apply both in PREEMPT_RT and otherwise.
 
-As PREEMPT_RT changes the lock category of spinlock_t and rwlock_t from
-spinning to sleeping this has obviously restrictions how they can nest with
-raw_spinlock_t.
-
-This results in the following nest ordering:
+The fact that PREEMPT_RT changes the lock category of spinlock_t and
+rwlock_t from spinning to sleeping means that they cannot be acquired while
+holding a raw spinlock.  This results in the following nesting ordering:
 
   1) Sleeping locks
   2) spinlock_t and rwlock_t
   3) raw_spinlock_t and bit spinlocks
 
-Lockdep is aware of these constraints to ensure that they are respected.
-
-
-Owner semantics
-===============
-
-Most lock types in the Linux kernel have strict owner semantics, i.e. the
-context (task) which acquires a lock has to release it.
-
-There are two exceptions:
-
-  - semaphores
-  - rwsems
-
-semaphores have no owner semantics for historical reason, and as such
-trylock and release operations can be called from any context. They are
-often used for both serialization and waiting purposes. That's generally
-discouraged and should be replaced by separate serialization and wait
-mechanisms, such as mutexes and completions.
-
-rwsems have grown interfaces which allow non owner release for special
-purposes. This usage is problematic on PREEMPT_RT because PREEMPT_RT
-substitutes all locking primitives except semaphores with RT-mutex based
-implementations to provide priority inheritance for all lock types except
-the truly spinning ones. Priority inheritance on ownerless locks is
-obviously impossible.
-
-For now the rwsem non-owner release excludes code which utilizes it from
-being used on PREEMPT_RT enabled kernels. In same cases this can be
-mitigated by disabling portions of the code, in other cases the complete
-functionality has to be disabled until a workable solution has been found.
+Lockdep will complain if these constraints are violated, both in
+PREEMPT_RT and otherwise.

^ permalink raw reply related	[flat|nested] 195+ messages in thread
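
The special-purpose rw_semaphore interfaces mentioned above are the
_non_owner variants.  A minimal sketch of the pattern (the rwsem calls are
the real API; the surrounding functions and the use case are hypothetical):

#include <linux/rwsem.h>

static DECLARE_RWSEM(cache_sem);

/* The submitting context acquires the read side... */
static void start_io(void)
{
        /* Tell lockdep that this task will not be the one releasing it. */
        down_read_non_owner(&cache_sem);
        /* ... kick off asynchronous work that holds the read side ... */
}

/* ...and a different context, e.g. an I/O completion, releases it. */
static void finish_io(void)
{
        /*
         * A plain up_read() must be issued by the acquiring task.  The
         * _non_owner variant permits this cross-context release and, as
         * the commit above clarifies, works on PREEMPT_RT as well.
         */
        up_read_non_owner(&cache_sem);
}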

end of thread, other threads:[~2020-03-28 11:52 UTC | newest]

Thread overview: 195+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-21 11:25 [patch V3 00/20] Lock ordering documentation and annotation for lockdep Thomas Gleixner
2020-03-21 11:25 ` [patch V3 01/20] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
2020-03-21 11:25 ` [patch V3 02/20] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 11:25 ` [patch V3 03/20] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-25  8:37   ` [patch V3 03/20] " Felipe Balbi
2020-03-27 12:14     ` Sebastian Siewior
2020-03-21 11:25 ` [patch V3 04/20] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-22 14:42   ` [patch V3 04/20] " Kalle Valo
2020-03-21 11:25 ` [patch V3 05/20] acpi: Remove header dependency Thomas Gleixner
2020-03-21 12:23   ` Andy Shevchenko
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Peter Zijlstra
2020-03-22  7:02   ` [patch V3 05/20] " Rafael J. Wysocki
2020-03-21 11:25 ` [patch V3 06/20] nds32: Remove mm.h from asm/uaccess.h Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 11:25 ` [patch V3 07/20] csky: " Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 11:25 ` [patch V3 08/20] hexagon: " Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-23 21:46   ` [patch V3 08/20] " Brian Cain
2020-03-21 11:25 ` [patch V3 09/20] ia64: " Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 11:25 ` [patch V3 10/20] microblaze: " Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 11:25 ` [patch V3 11/20] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Peter Zijlstra (Intel)
2020-03-21 11:25 ` [patch V3 12/20] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
2020-03-21 13:22   ` Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Peter Zijlstra (Intel)
2020-03-27 19:14   ` [patch V3 12/20] " Geoff Levand
2020-03-21 11:25 ` [patch V3 13/20] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-23  2:55   ` [patch V3 13/20] " Paul E. McKenney
2020-03-24 23:13     ` Thomas Gleixner
2020-03-25  0:28       ` Paul E. McKenney
2020-03-25 12:27         ` Documentation/locking/locktypes: Further clarifications and wordsmithing Thomas Gleixner
2020-03-25 16:02           ` Sebastian Siewior
2020-03-25 16:39             ` Paul E. McKenney
2020-03-25 16:54               ` Sebastian Siewior
2020-03-25 16:58           ` [PATCH v2] Documentation/locking/locktypes: minor copy editor fixes Randy Dunlap
2020-03-26  2:40             ` Paul E. McKenney
2020-03-28 11:52             ` [tip: locking/core] Documentation/locking/locktypes: Minor " tip-bot2 for Randy Dunlap
2020-03-28 11:52           ` [tip: locking/core] Documentation/locking/locktypes: Further clarifications and wordsmithing tip-bot2 for Thomas Gleixner
2020-03-21 11:25 ` [patch V3 14/20] timekeeping: Split jiffies seqlock Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-21 11:25 ` [patch V3 15/20] sched/swait: Prepare usage in completions Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-21 11:26 ` [patch V3 16/20] completion: Use simple wait queues Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2020-03-23 15:20   ` [PATCH] completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() Sebastian Siewior
2020-03-23 17:50     ` [tip: locking/core] " tip-bot2 for Sebastian Siewior
2020-03-21 11:26 ` [patch V3 17/20] lockdep: Introduce wait-type checks Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Peter Zijlstra
2020-03-21 11:26 ` [patch V3 18/20] lockdep: Add hrtimer context tracing bits Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 16:46     ` Frederic Weisbecker
2020-03-21 11:26 ` [patch V3 19/20] lockdep: Annotate irq_work Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 16:40     ` Frederic Weisbecker
2020-03-21 18:12       ` Sebastian Andrzej Siewior
2020-03-22  2:33         ` Frederic Weisbecker
2020-03-22  2:39           ` Frederic Weisbecker
2020-03-22 12:27           ` Sebastian Andrzej Siewior
2020-03-21 11:26 ` [patch V3 20/20] lockdep: Add posixtimer context tracing bits Thomas Gleixner
2020-03-21 15:53   ` [tip: locking/core] " tip-bot2 for Sebastian Andrzej Siewior
2020-03-21 17:19 ` [patch V3 00/20] Lock ordering documentation and annotation for lockdep Davidlohr Bueso
2020-03-21 17:45   ` Thomas Gleixner
