linux-kernel.vger.kernel.org archive mirror
* write_lock_irq(&tasklist_lock)
@ 2018-05-22 19:40 Sodagudi Prasad
  2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds
  2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon
  0 siblings, 2 replies; 14+ messages in thread
From: Sodagudi Prasad @ 2018-05-22 19:40 UTC (permalink / raw)
  To: keescook, luto, wad, akpm, riel, tglx, mingo, peterz, ebiggers,
	fweisbec, sherryy, vegard.nossum, cl, aarcange, alexander.levin
  Cc: linux-kernel, torvalds

Hi All,

When the following test is executed on the 4.14.41 stable kernel, I observed
that one of the cores waits on tasklist_lock for a long time with IRQs
disabled.
./stress-ng-64 --get 8 -t 3h --times --metrics-brief

Every time the device crashes, I observe that one task is stuck in the fork
system call, waiting for tasklist_lock as a writer with IRQs disabled.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Some other tasks are making getrlimit and prlimit system calls, so these
readers are continuously taking the tasklist_lock read lock.

The writer has local IRQs disabled for a long time while it waits for the
readers to finish, but the readers keep tasklist_lock busy for quite a long
time.

I think the --get N option creates N workers, and they make the following
system calls.
========================================================================
start N workers that call system calls that fetch data from the kernel, 
currently these are: getpid,
getppid, getcwd, getgid, getegid, getuid, getgroups, getpgrp, getpgid, 
getpriority, getresgid, getresuid,
getrlimit, prlimit, getrusage, getsid, gettid, getcpu, gettimeofday, 
uname, adjtimex, sysfs.
Some of these system calls are OS specific.
========================================================================
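
As a rough illustration of the reader/writer pressure described above (a
hypothetical sketch, not the stress-ng source), a few threads looping on
getrlimit() keep the tasklist_lock read side almost permanently held, while
a fork()/wait() loop supplies the writer that ends up spinning with IRQs
disabled:

   #include <pthread.h>
   #include <sys/resource.h>
   #include <sys/types.h>
   #include <sys/wait.h>
   #include <unistd.h>

   #define NR_GETTERS 8

   /* On 4.14-era kernels each call reaches do_prlimit(), which takes
      tasklist_lock for read. */
   static void *get_worker(void *arg)
   {
         struct rlimit rl;

         for (;;)
               getrlimit(RLIMIT_NOFILE, &rl);
         return NULL;
   }

   int main(void)
   {
         pthread_t tid[NR_GETTERS];
         int i;

         for (i = 0; i < NR_GETTERS; i++)
               pthread_create(&tid[i], NULL, get_worker, NULL);

         /* Writer pressure: each fork() takes tasklist_lock via
            write_lock_irq() in copy_process(). */
         for (;;) {
               pid_t pid = fork();

               if (pid == 0)
                     _exit(0);
               if (pid > 0)
                     waitpid(pid, NULL, 0);
         }
   }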

Have you observed this type of issue with tasklist_lock?
Do we need write_lock_irq(&tasklist_lock) in the portion of code below? Can
I use write_lock instead of write_lock_irq there?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

-Thanks, Prasad

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad
@ 2018-05-22 20:27 ` Linus Torvalds
  2018-05-22 21:17   ` write_lock_irq(&tasklist_lock) Peter Zijlstra
  2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon
  1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-22 20:27 UTC (permalink / raw)
  To: psodagud
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum,
	Christoph Lameter, Andrea Arcangeli, Sasha Levin,
	Linux Kernel Mailing List

On Tue, May 22, 2018 at 12:40 PM Sodagudi Prasad <psodagud@codeaurora.org>
wrote:

> Have you observed this type of issue with tasklist_lock?

tasklist_lock remains pretty painful. It covers too much, but trying to
split it up has never worked well.

It's usually not a huge problem because there are so few writers, but what
you're seeing is the writer starvation issue because readers can be
plentiful.

> Do we need write_lock_irq(&tasklist_lock) in the portion of code below? Can
> I use write_lock instead of write_lock_irq there?

You absolutely need write_lock_irq(), because taking the tasklist_lock
without disabling interrupts will deadlock very quickly due to an interrupt
taking the tasklist_lock for reading.

That said, the write_lock_irq(&tasklist_lock) could possibly be replaced
with something like

   static void tasklist_write_lock(void)
   {
         unsigned long flags;
         local_irq_save(flags);
         while (!write_trylock(&tasklist_lock)) {
                 local_irq_restore(flags);
                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
                 local_irq_disable();
         }
   }

but we don't have that "write_islocked()" function.
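
One way to approximate it on the 4.9/4.14-era kernels in question might be
the write_can_lock() helper (still present there, removed upstream later);
a purely hypothetical, untested sketch:

   static inline int write_islocked(rwlock_t *lock)
   {
         /* write_can_lock() is true only when nobody holds the lock at all */
         return !write_can_lock(lock);
   }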

So the above would need more work, and is entirely untested anyway,
obviously.

                 Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds
@ 2018-05-22 21:17   ` Peter Zijlstra
  2018-05-22 21:31     ` write_lock_irq(&tasklist_lock) Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-22 21:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers,
	Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter,
	Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 01:27:17PM -0700, Linus Torvalds wrote:

> It's usually not a huge problem because there are so few writers, but what
> you're seeing is the writer starvation issue because readers can be
> plentiful.

qrwlock is a fair lock and should not exhibit writer starvation. Write
acquire time is of course proportional to the number of CPUs in the
system because each can be holding a reader, but once there is a pending
writer there should not be new readers.

> > Do we need write_lock_irq(&tasklist_lock) in the portion of code below? Can
> > I use write_lock instead of write_lock_irq there?
> 
> You absolutely need write_lock_irq(), because taking the tasklist_lock
> without disabling interrupts will deadlock very quickly due to an interrupt
> taking the tasklist_lock for reading.
> 
> That said, the write_lock_irq(&tasklist_lock) could possibly be replaced
> with something like
> 
>    static void tasklist_write_lock(void)
>    {
>          unsigned long flags;
>          local_irq_save(flags);
>          while (!write_trylock(&tasklist_lock)) {
>                  local_irq_restore(flags);
>                  do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                  local_irq_disable();
>          }
>    }
> 
> but we don't have that "write_islocked()" function.

You basically want to spin-wait with interrupts enabled, right? I think
there were problems with that, but I can't remember, my brain is
entirely shot for the day. But given how qrwlock is constructed, you'd
have to look at the arch spinlock implementation for that (qspinlock for
x86).

The obvious problem of course is:


	spin_lock_irq(&L)
	// contended
	local_irq_enable();
	while(...)
	  cpu_relax();
	   <IRQ>
	     spin_lock(&L);

Which because the spinlock is fair, is an instant deadlock.


You can do the IRQ enabled spin-wait for unfair locks, but by now all
our locks are fair (because we kept hitting starvation).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-22 21:17   ` write_lock_irq(&tasklist_lock) Peter Zijlstra
@ 2018-05-22 21:31     ` Linus Torvalds
  2018-05-23  8:19       ` write_lock_irq(&tasklist_lock) Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-22 21:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers,
	Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter,
	Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <peterz@infradead.org> wrote:

> qrwlock is a fair lock and should not exhibit writer starvation.

We actually have a special rule to make it *not* be fair, in that
interrupts are allowed to take the read lock if there are readers - even if
there are waiting writers.

I'm not sure how much of a fairness effect this has, but it's required
because of our rule that you can take it for reading without disabling
interrupts.

See

      void queued_read_lock_slowpath(struct qrwlock *lock)

in kernel/locking/qrwlock.c.
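
The interrupt special case sits right at the top of that function; roughly
(paraphrased from current mainline — the 4.14 code spells it slightly
differently, but the logic is the same):

      void queued_read_lock_slowpath(struct qrwlock *lock)
      {
              if (unlikely(in_interrupt())) {
                      /*
                       * Readers in interrupt context bypass the queue: they
                       * only wait for an active writer to release the lock,
                       * ignoring writers that are merely queued and waiting.
                       */
                      atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
                      return;
              }
              /* ... everyone else joins the queue and waits fairly ... */
      }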

> You basically want to spin-wait with interrupts enabled, right?

That was the intent of my (untested) pseudo-code. It should work fine. Note
that I used write_trylock() only, so there is no queueing (which also
implies no fairness).

I'm not saying it's a _good_ idea.  I'm saying it might work if all you
worry about is the irq-disabled part.

                 Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-22 21:31     ` write_lock_irq(&tasklist_lock) Linus Torvalds
@ 2018-05-23  8:19       ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-23  8:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers,
	Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter,
	Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 02:31:42PM -0700, Linus Torvalds wrote:
> On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > qrwlock is a fair lock and should not exhibit writer starvation.
> 
> We actually have a special rule to make it *not* be fair, in that
> interrupts are allowed to take the read lock if there are readers - even if
> there are waiting writers.

Urgh, right.. would be interesting to know how much of that is happening
in that workload. I assumed the readers were mostly due to the syscalls
the reporter talked about, and those should not trigger that case.

> > You basically want to spin-wait with interrupts enabled, right?
> 
> That was the intent of my (untested) pseudo-code. It should work fine. Note
> that I used write_trylock() only, so there is no queueing (which also
> implies no fairness).
> 
> I'm not saying it's a _good_ idea.  I'm saying it might work if all you
> worry about is the irq-disabled part.

Right, if you make it unfair and utterly prone to starvation then yes,
you can make it 'work'.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad
  2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds
@ 2018-05-23 13:05 ` Will Deacon
  2018-05-23 15:25   ` write_lock_irq(&tasklist_lock) Linus Torvalds
  1 sibling, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-23 13:05 UTC (permalink / raw)
  To: Sodagudi Prasad
  Cc: keescook, luto, wad, akpm, riel, tglx, mingo, peterz, ebiggers,
	fweisbec, sherryy, vegard.nossum, cl, aarcange, alexander.levin,
	linux-kernel, torvalds

Hi Prasad,

On Tue, May 22, 2018 at 12:40:05PM -0700, Sodagudi Prasad wrote:
> When the following test is executed on the 4.14.41 stable kernel, I observed
> that one of the cores waits on tasklist_lock for a long time with IRQs
> disabled.
> ./stress-ng-64 --get 8 -t 3h --times --metrics-brief
> 
> Every time the device crashes, I observe that one task is stuck in the fork
> system call, waiting for tasklist_lock as a writer with IRQs disabled.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Please use a newer kernel. We've addressed this in mainline by moving
arm64 over to the qrwlock implementation which (after some other changes)
guarantees forward progress for well-behaved readers and writers.

To give an example from locktorture with 2 writers and 8 readers, after
a few seconds I see:

rwlock:
	Writes:	Total: 6725	Max/Min: 0/0	Fail: 0
	Reads:	Total: 5103727	Max/Min: 0/0	Fail: 0

qrwlock:
	Writes:	Total: 155284	Max/Min: 0/0	Fail: 0
	Reads:	Total: 767703	Max/Min: 0/0	Fail: 0

so the ratio is closer to ~6:1 rather than ~191:1 for this case. The
total locking throughput has dropped, but that's expected for this type
of lock where maximum throughput would be achieved by favouring readers.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon
@ 2018-05-23 15:25   ` Linus Torvalds
  2018-05-23 15:36     ` write_lock_irq(&tasklist_lock) Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-23 15:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum,
	Christoph Lameter, Andrea Arcangeli, Sasha Levin,
	Linux Kernel Mailing List

On Wed, May 23, 2018 at 6:05 AM Will Deacon <will.deacon@arm.com> wrote:

> Please use a newer kernel. We've addressed this in mainline by moving
> arm64 over to the qrwlock implementation which (after some other changes)
> guarantees forward progress for well-behaved readers and writers.

Oh, I didn't even realize that this wasn't x86, and that there was still
the very unfair rwlock issue on 4.14 on arm.

Yeah, the queuing rwlocks shouldn't show the really pathological problems
we used to have long ago.

               Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-23 15:25   ` write_lock_irq(&tasklist_lock) Linus Torvalds
@ 2018-05-23 15:36     ` Will Deacon
  2018-05-23 16:26       ` write_lock_irq(&tasklist_lock) Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-23 15:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum,
	Christoph Lameter, Andrea Arcangeli, Sasha Levin,
	Linux Kernel Mailing List, boqun.feng

[+Boqun]

On Wed, May 23, 2018 at 08:25:06AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 6:05 AM Will Deacon <will.deacon@arm.com> wrote:
> 
> > Please use a newer kernel. We've addressed this in mainline by moving
> > arm64 over to the qrwlock implementation which (after some other changes)
> > guarantees forward progress for well-behaved readers and writers.
> 
> Oh, I didn't even realize that this wasn't x86, and that there was still
> the very unfair rwlock issue on 4.14 on arm.
> 
> Yeah, the queuing rwlocks shouldn't show the really pathological problems
> we used to have long ago.

Yup, although they do reveal new issues that Boqun has been running into
recently with his lockdep improvements. The general pattern is if you
have:

P0:			P1:			P2:

spin_lock(&slock)	read_lock(&rwlock)	write_lock(&rwlock)
read_lock(&rwlock)	spin_lock(&slock)

then the CPUs can be queued on the rwlock such that P1 has the lock, then
P2 is queued and then P0. If P0 has taken the spinlock, we have a deadlock
which wouldn't arise with the non-queued version.

In other words, qrwlock requires consistent locking order wrt spinlocks.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-23 15:36     ` write_lock_irq(&tasklist_lock) Will Deacon
@ 2018-05-23 16:26       ` Linus Torvalds
  2018-05-24 12:49         ` write_lock_irq(&tasklist_lock) Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-23 16:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum,
	Christoph Lameter, Andrea Arcangeli, Sasha Levin,
	Linux Kernel Mailing List, Boqun Feng

On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:

> In other words, qrwlock requires consistent locking order wrt spinlocks.

I *thought* lockdep already tracked and detected this. Or is that only with
the sleeping versions?

But yes, that's equivalent to the irq-unfairness thing we have a special
case for.

              Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-23 16:26       ` write_lock_irq(&tasklist_lock) Linus Torvalds
@ 2018-05-24 12:49         ` Will Deacon
  2018-05-24 13:51           ` write_lock_irq(&tasklist_lock) Boqun Feng
  0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-24 12:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton,
	Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum,
	Christoph Lameter, Andrea Arcangeli, Sasha Levin,
	Linux Kernel Mailing List, Boqun Feng

On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:
> 
> > In other words, qrwlock requires consistent locking order wrt spinlocks.
> 
> I *thought* lockdep already tracked and detected this. Or is that only with
> the sleeping versions?

There are patches in-flight to detect this:

https://marc.info/?l=linux-kernel&m=152483640529740&w=2k

as part of Boqun's work into recursive read locking.

Will

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-24 12:49         ` write_lock_irq(&tasklist_lock) Will Deacon
@ 2018-05-24 13:51           ` Boqun Feng
  2018-05-24 17:37             ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
  2018-05-24 21:14             ` write_lock_irq(&tasklist_lock) Andrea Parri
  0 siblings, 2 replies; 14+ messages in thread
From: Boqun Feng @ 2018-05-24 13:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, psodagud, Kees Cook, Andy Lutomirski,
	Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker,
	sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli,
	Sasha Levin, Linux Kernel Mailing List

On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
> On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> > On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:
> > 
> > > In other words, qrwlock requires consistent locking order wrt spinlocks.
> > 
> > I *thought* lockdep already tracked and detected this. Or is that only with
> > the sleeping versions?
> 
> There are patches in-flight to detect this:
> 
> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
> 
> as part of Boqun's work into recursive read locking.
> 

Yeah, lemme put some details here:

So we have three cases:

Case #1 (from Will)

	P0:			P1:			P2:

	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

, which is a deadlock, and cannot be detected by lockdep yet. Lockdep
can detect it with the patch I attach at the end of this mail.

Case #2

	P0:			P1:			P2:

	<in irq handler>
	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can use the unfair
fastpath.

Case #3

	P0:			P1:			P2:

				<in irq handler>
	spin_lock_irq(&slock)	read_lock(&rwlock)
							write_lock_irq(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot use the fastpath.
To detect this without flagging case #2 as a false positive, the
recursive deadlock detection patchset is needed; the WIP version is at:

	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git arr-rfc-wip

The code is done; I'm just reworking the documentation, so if anyone is
interested, please try it out ;-)

Regards,
Boqun

------------------->8
Subject: [PATCH] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock; it is only recursive if in_interrupt() is true. So change the
annotation accordingly to catch more deadlocks.

Note that we used to treat read_lock() as a pure recursive read lock in
lib/locking-selftest.c, and this is useful, especially for the lockdep
development selftests, so we keep that behaviour via a variable that
forces the recursive annotation for read_lock().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/lockdep.h | 35 ++++++++++++++++++++++++++++++++++-
 lib/locking-selftest.c  | 11 +++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..07793986c063 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -27,6 +27,7 @@ extern int lock_stat;
 #include <linux/list.h>
 #include <linux/debug_locks.h>
 #include <linux/stacktrace.h>
+#include <linux/preempt.h>
 
 /*
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
@@ -540,6 +541,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 }
 #endif
 
+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep to think this way in selftests, or
+ * 2. The implementation is not a queued read/write lock, or
+ * 3. The locker is in an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+	return force_read_lock_recursive ||
+	       !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+	       in_interrupt();
+}
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
 /*
  * For trivial one-depth nesting of a lock-class, the following
  * global define can be used. (Subsystems with multiple levels
@@ -561,7 +587,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_release(l, n, i)			lock_release(l, n, i)
 
 #define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i)		lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i)					\
+do {									\
+	if (read_lock_is_recursive())					\
+		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
+	else								\
+		lock_acquire_shared(l, s, t, NULL, i);			\
+} while (0)
+
 #define rwlock_release(l, n, i)			lock_release(l, n, i)
 
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..dd92f8ad83d5 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -28,6 +28,7 @@
  * Change this to 1 if you want to see the failure printouts:
  */
 static unsigned int debug_locks_verbose;
+unsigned int force_read_lock_recursive;
 
 static DEFINE_WW_CLASS(ww_lockdep);
 
@@ -1978,6 +1979,11 @@ void locking_selftest(void)
 		return;
 	}
 
+	/*
+	 * treats read_lock() as recursive read locks for testing purpose
+	 */
+	force_read_lock_recursive = 1;
+
 	/*
 	 * Run the testsuite:
 	 */
@@ -2072,6 +2078,11 @@ void locking_selftest(void)
 
 	ww_tests();
 
+	force_read_lock_recursive = 0;
+	/*
+	 * queued_read_lock() specific test cases can be put here
+	 */
+
 	if (unexpected_testcase_failures) {
 		printk("-----------------------------------------------------------------\n");
 		debug_locks = 0;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-24 13:51           ` write_lock_irq(&tasklist_lock) Boqun Feng
@ 2018-05-24 17:37             ` Sodagudi Prasad
  2018-05-24 18:28               ` write_lock_irq(&tasklist_lock) Peter Zijlstra
  2018-05-24 21:14             ` write_lock_irq(&tasklist_lock) Andrea Parri
  1 sibling, 1 reply; 14+ messages in thread
From: Sodagudi Prasad @ 2018-05-24 17:37 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Linus Torvalds, Kees Cook, Andy Lutomirski,
	Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker,
	sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli,
	Sasha Levin, Linux Kernel Mailing List

On 2018-05-24 06:51, Boqun Feng wrote:
> On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
>> On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
>> > On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:
>> >
>> > > In other words, qrwlock requires consistent locking order wrt spinlocks.
>> >
>> > I *thought* lockdep already tracked and detected this. Or is that only with
>> > the sleeping versions?
>> 
>> There are patches in-flight to detect this:
>> 
>> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
>> 
>> as part of Boqun's work into recursive read locking.
>> 
> 
Hi Linus and Will,

Thank you for the quick responses and suggestions.

The kernel version is locked for a couple of products, and the same issue is
observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable
updates from upstream for these products. If QUEUED_RWLOCKS works on these
kernel versions without any issues, we can enable QUEUED_RWLOCKS.

Can we go ahead with Linus's suggestion for these kernel versions, so that
IRQs won't be disabled for such a long time?

   static void tasklist_write_lock(void)
   {
         unsigned long flags;
         local_irq_save(flags);
         while (!write_trylock(&tasklist_lock)) {
                 local_irq_restore(flags);
                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
                 local_irq_disable();
         }
   }


-Thanks, Prasad

[...]
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-24 17:37             ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
@ 2018-05-24 18:28               ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-24 18:28 UTC (permalink / raw)
  To: Sodagudi Prasad
  Cc: Boqun Feng, Will Deacon, Linus Torvalds, Kees Cook,
	Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel,
	Thomas Gleixner, Ingo Molnar, Eric Biggers, Frederic Weisbecker,
	sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli,
	Sasha Levin, Linux Kernel Mailing List

On Thu, May 24, 2018 at 10:37:25AM -0700, Sodagudi Prasad wrote:
> The kernel version is locked for a couple of products, and the same issue is
> observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable
> updates from upstream for these products.

> If QUEUED_RWLOCKS works on these kernel versions without any issues, we can
> enable QUEUED_RWLOCKS.

You want:

e0d02285f16e ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
4df714be4dcf ("locking/atomic: Add atomic_cond_read_acquire()")
b519b56e378e ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
087133ac9076 ("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")
d13316614633 ("locking/qrwlock: Prevent slowpath writers getting held up by fastpath")

IIRC, enabling QUEUED_RWLOCKS will 'work' but esp. that
atomic_cond_read_acquire() one is goodness for ARM64.

> Can we go ahead with Linus's suggestion for these kernel versions, so that
> IRQs won't be disabled for such a long time?
> 
>  static void tasklist_write_lock(void)
>    {
>          unsigned long flags;
>          local_irq_save(flags);
>          while (!write_trylock(&tasklist_lock)) {
>                  local_irq_restore(flags);
>                  do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                  local_irq_disable();
>          }
>    }

First you say you can only take stable updates, then you ask for a
gruesome hack that will seriously regress architectures that do use
qrwlock which will hence never make it into stable.

Just take the arm64 qrwlock patches and pretend they're from stable.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: write_lock_irq(&tasklist_lock)
  2018-05-24 13:51           ` write_lock_irq(&tasklist_lock) Boqun Feng
  2018-05-24 17:37             ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
@ 2018-05-24 21:14             ` Andrea Parri
  1 sibling, 0 replies; 14+ messages in thread
From: Andrea Parri @ 2018-05-24 21:14 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Linus Torvalds, psodagud, Kees Cook,
	Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers,
	Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter,
	Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List,
	paulmck, stern

> Yeah, lemme put some details here:
> 
> So we have three cases:
> 
> Case #1 (from Will)
> 
> 	P0:			P1:			P2:
> 
> 	spin_lock(&slock)	read_lock(&rwlock)
> 							write_lock(&rwlock)
> 	read_lock(&rwlock)	spin_lock(&slock)
> 
> , which is a deadlock, and cannot be detected by lockdep yet. Lockdep
> can detect it with the patch I attach at the end of this mail.
> 
> Case #2
> 
> 	P0:			P1:			P2:
> 
> 	<in irq handler>
> 	spin_lock(&slock)	read_lock(&rwlock)
> 							write_lock(&rwlock)
> 	read_lock(&rwlock)	spin_lock_irq(&slock)
> 
> , which is not a deadlock, as the read_lock() on P0 can use the unfair
> fastpath.
> 
> Case #3
> 
> 	P0:			P1:			P2:
> 
> 				<in irq handler>
> 	spin_lock_irq(&slock)	read_lock(&rwlock)
> 							write_lock_irq(&rwlock)
> 	read_lock(&rwlock)	spin_lock(&slock)
> 
> , which is a deadlock, as the read_lock() on P0 cannot use the fastpath.

Mmh, I'm starting to think that, maybe, we need a model (a tool) to
distinguish these and other(?) cases (sorry, I could not resist ;-)

[...]


> ------------------->8
> Subject: [PATCH] locking: More accurate annotations for read_lock()
> 
> On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
> read lock; it is only recursive if in_interrupt() is true. So

Mmh, taking the "common denominator" over archs/Kconfig options and
CPU states, this would suggest that read_lock() is non-recursive;

it looks like I can say "good-bye" to my idea of defining (formalizing)
consistent executions/the memory ordering of RW-LOCKS by following
the _emulation_ below:

void read_lock(rwlock_t *s)
{
	r0 = atomic_fetch_inc_acquire(&s->val);
}

void read_unlock(rwlock_t *s)
{
	r0 = atomic_fetch_sub_release(&s->val);
}

void write_lock(rwlock_t *s)
{
	r0 = atomic_cmpxchg_acquire(&s->val, 0, -1);
}

void write_unlock(rwlock_t *s)
{
	atomic_set_release(&s->val, 0);
}

filter (~read_lock:r0=-1 /\ write_lock:r0=0)

[...]


> The code is done; I'm just reworking the documentation, so if anyone is
> interested, please try it out ;-)

Any idea on how to "educate" the LKMM about this code/documentation?

  Andrea

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, newest message: 2018-05-24 21:14 UTC

Thread overview: 14 messages
2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad
2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-22 21:17   ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-22 21:31     ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-23  8:19       ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-23 15:25   ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-23 15:36     ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-23 16:26       ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-24 12:49         ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-24 13:51           ` write_lock_irq(&tasklist_lock) Boqun Feng
2018-05-24 17:37             ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
2018-05-24 18:28               ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-24 21:14             ` write_lock_irq(&tasklist_lock) Andrea Parri
