* write_lock_irq(&tasklist_lock) @ 2018-05-22 19:40 Sodagudi Prasad 2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds 2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon 0 siblings, 2 replies; 14+ messages in thread
From: Sodagudi Prasad @ 2018-05-22 19:40 UTC (permalink / raw)
To: keescook, luto, wad, akpm, riel, tglx, mingo, peterz, ebiggers, fweisbec, sherryy, vegard.nossum, cl, aarcange, alexander.levin
Cc: linux-kernel, torvalds

Hi All,

When the following test is executed on the 4.14.41 stable kernel, I observed that one of the cores waits on tasklist_lock for a long time with IRQs disabled:

./stress-ng-64 --get 8 -t 3h --times --metrics-brief

Every time the device crashed, I observed that one task was stuck in the fork system call, waiting on tasklist_lock as a writer with IRQs disabled:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Other tasks are making getrlimit and prlimit system calls, so these readers are continuously taking the tasklist_lock read lock. The writer has disabled local IRQs for a long time and is waiting for the readers to finish, but the readers keep tasklist_lock busy for quite a long time.

I think the --get N option creates N threads and they make the following system calls:
========================================================================
start N workers that call system calls that fetch data from the kernel, currently these are: getpid, getppid, getcwd, getgid, getegid, getuid, getgroups, getpgrp, getpgid, getpriority, getresgid, getresuid, getrlimit, prlimit, getrusage, getsid, gettid, getcpu, gettimeofday, uname, adjtimex, sysfs. Some of these system calls are OS specific.
========================================================================

Have you observed this type of issue with tasklist_lock? Do we need write_lock_irq(&tasklist_lock) in the below portion of code?
Can I use write_unlock instead of write_lock_irq in this portion of code?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad @ 2018-05-22 20:27 ` Linus Torvalds 2018-05-22 21:17 ` write_lock_irq(&tasklist_lock) Peter Zijlstra 2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon 1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-22 20:27 UTC (permalink / raw)
To: psodagud
Cc: Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 12:40 PM Sodagudi Prasad <psodagud@codeaurora.org> wrote:

> Have you observed this type of issues with tasklist_lock ?

tasklist_lock remains pretty painful. It covers too much, but trying to split it up has never worked well.

It's usually not a huge problem because there are so few writers, but what you're seeing is the writer starvation issue because readers can be plentiful.

> Do we need write_lock_irq(&tasklist_lock) in below portion of code ? Can
> I use write_unlock instead of write_lock_irq in portion of code?

You absolutely need write_lock_irq(), because taking the tasklist_lock without disabling interrupts will deadlock very quickly due to an interrupt taking the tasklist_lock for reading.

That said, the write_lock_irq(&tasklist_lock) could possibly be replaced with something like

	static void tasklist_write_lock(void)
	{
		unsigned long flags;
		local_irq_save(flags);
		while (!write_trylock(&tasklist_lock)) {
			local_irq_restore(flags);
			do { cpu_relax(); } while (write_islocked(&tasklist_lock));
			local_irq_disable();
		}
	}

but we don't have that "write_islocked()" function. So the above would need more work, and is entirely untested anyway, obviously.

Linus

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds @ 2018-05-22 21:17 ` Peter Zijlstra 2018-05-22 21:31 ` write_lock_irq(&tasklist_lock) Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-22 21:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 01:27:17PM -0700, Linus Torvalds wrote:
> It's usually not a huge problem because there are so few writers, but what
> you're seeing is the writer starvation issue because readers can be
> plentiful.

qrwlock is a fair lock and should not exhibit writer starvation. Write acquire time is of course proportional to the number of CPUs in the system because each can be holding a reader, but once there is a pending writer there should not be new readers.

> > Do we need write_lock_irq(&tasklist_lock) in below portion of code ? Can
> > I use write_unlock instead of write_lock_irq in portion of code?
>
> You absolutely need write_lock_irq(), because taking the tasklist_lock
> without disabling interrupts will deadlock very quickly due to an interrupt
> taking the tasklist_lock for reading.
>
> That said, the write_lock_irq(&tasklist_lock) could possibly be replaced
> with something like
>
> static void tasklist_write_lock(void)
> {
>         unsigned long flags;
>         local_irq_save(flags);
>         while (!write_trylock(&tasklist_lock)) {
>                 local_irq_restore(flags);
>                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                 local_irq_disable();
>         }
> }
>
> but we don't have that "write_islocked()" function.

You basically want to spin-wait with interrupts enabled, right? I think there were problems with that, but I can't remember, my brain is entirely shot for the day.

But given how qrwlock is constructed, you'd have to look at the arch spinlock implementation for that (qspinlock for x86). The obvious problem of course is:

	spin_lock_irq(&L)	// contended
	local_irq_enable();
	while (...)
		cpu_relax();

	<IRQ>
	  spin_lock(&L);

Which, because the spinlock is fair, is an instant deadlock.

You can do the IRQ-enabled spin-wait for unfair locks, but by now all our locks are fair (because we kept hitting starvation).

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-22 21:17 ` write_lock_irq(&tasklist_lock) Peter Zijlstra @ 2018-05-22 21:31 ` Linus Torvalds 2018-05-23 8:19 ` write_lock_irq(&tasklist_lock) Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-22 21:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <peterz@infradead.org> wrote:

> qrwlock is a fair lock and should not exhibit writer starvation.

We actually have a special rule to make it *not* be fair, in that interrupts are allowed to take the read lock if there are readers - even if there are waiting writers.

I'm not sure how much of a fairness effect this has, but it's required because of our rule that you can take it for reading without disabling interrupts.

See void queued_read_lock_slowpath(struct qrwlock *lock) in kernel/locking/qrwlock.c.

> You basically want to spin-wait with interrupts enabled, right?

That was the intent of my (untested) pseudo-code. It should work fine. Note that I used write_trylock() only, so there is no queueing (which also implies no fairness).

I'm not saying it's a _good_ idea. I'm saying it might work if all you worry about is the irq-disabled part.

Linus

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-22 21:31 ` write_lock_irq(&tasklist_lock) Linus Torvalds @ 2018-05-23 8:19 ` Peter Zijlstra 0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-23 8:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Tue, May 22, 2018 at 02:31:42PM -0700, Linus Torvalds wrote:
> On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > qrwlock is a fair lock and should not exhibit writer starvation.
>
> We actually have a special rule to make it *not* be fair, in that
> interrupts are allowed to take the read lock if there are readers - even if
> there are waiting writers.

Urgh, right.. would be interesting to know how much of that is happening in that workload. I assumed the readers were mostly due to the syscalls the reporter talked about, and those should not trigger that case.

> > You basically want to spin-wait with interrupts enabled, right?
>
> That was the intent of my (untested) pseudo-code. It should work fine. Note
> that I used write_trylock() only, so there is no queueing (which also
> implies no fairness).
>
> I'm not saying it's a _good_ idea. I'm saying it might work if all you
> worry about is the irq-disabled part.

Right, if you make it unfair and utterly prone to starvation then yes, you can make it 'work'.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad 2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds @ 2018-05-23 13:05 ` Will Deacon 2018-05-23 15:25 ` write_lock_irq(&tasklist_lock) Linus Torvalds 1 sibling, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-23 13:05 UTC (permalink / raw)
To: Sodagudi Prasad
Cc: keescook, luto, wad, akpm, riel, tglx, mingo, peterz, ebiggers, fweisbec, sherryy, vegard.nossum, cl, aarcange, alexander.levin, linux-kernel, torvalds

Hi Prasad,

On Tue, May 22, 2018 at 12:40:05PM -0700, Sodagudi Prasad wrote:
> When following test is executed on 4.14.41 stable kernel, observed that one
> of the core is waiting for tasklist_lock for long time with IRQs disabled.
> ./stress-ng-64 --get 8 -t 3h --times --metrics-brief
>
> Every time when device is crashed, I observed that one the task stuck at
> fork system call and waiting for tasklist_lock as writer with irq disabled.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Please use a newer kernel. We've addressed this in mainline by moving arm64 over to the qrwlock implementation which (after some other changes) guarantees forward progress for well-behaved readers and writers.

To give an example from locktorture with 2 writers and 8 readers, after a few seconds I see:

  rwlock:
    Writes: Total: 6725     Max/Min: 0/0  Fail: 0
    Reads:  Total: 5103727  Max/Min: 0/0  Fail: 0

  qrwlock:
    Writes: Total: 155284   Max/Min: 0/0  Fail: 0
    Reads:  Total: 767703   Max/Min: 0/0  Fail: 0

so the ratio is closer to ~6:1 rather than ~191:1 for this case. The total locking throughput has dropped, but that's expected for this type of lock where maximum throughput would be achieved by favouring readers.

Will

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon @ 2018-05-23 15:25 ` Linus Torvalds 2018-05-23 15:36 ` write_lock_irq(&tasklist_lock) Will Deacon 0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-23 15:25 UTC (permalink / raw)
To: Will Deacon
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Wed, May 23, 2018 at 6:05 AM Will Deacon <will.deacon@arm.com> wrote:

> Please use a newer kernel. We've addressed this in mainline by moving
> arm64 over to the qrwlock implementation which (after some other changes)
> guarantees forward progress for well-behaved readers and writers.

Oh, I didn't even realize that this wasn't x86, and that there was still the very unfair rwlock issue on 4.14 on arm.

Yeah, the queuing rwlocks shouldn't show the really pathological problems we used to have long ago.

Linus

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-23 15:25 ` write_lock_irq(&tasklist_lock) Linus Torvalds @ 2018-05-23 15:36 ` Will Deacon 2018-05-23 16:26 ` write_lock_irq(&tasklist_lock) Linus Torvalds 0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-23 15:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List, boqun.feng

[+Boqun]

On Wed, May 23, 2018 at 08:25:06AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 6:05 AM Will Deacon <will.deacon@arm.com> wrote:
>
> > Please use a newer kernel. We've addressed this in mainline by moving
> > arm64 over to the qrwlock implementation which (after some other changes)
> > guarantees forward progress for well-behaved readers and writers.
>
> Oh, I didn't even realize that this wasn't x86, and that there was still
> the very unfair rwlock issue on 4.14 on arm.
>
> Yeah, the queuing rwlocks shouldn't show the really pathological problems
> we used to have long ago.

Yup, although they do reveal new issues that Boqun has been running into recently with his lockdep improvements. The general pattern is if you have:

	P0:			P1:			P2:

	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

then the CPUs can be queued on the rwlock such that P1 has the lock, then P2 is queued and then P0. If P0 has taken the spinlock, we have a deadlock which wouldn't arise with the non-queued version.

In other words, qrwlock requires consistent locking order wrt spinlocks.

Will

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-23 15:36 ` write_lock_irq(&tasklist_lock) Will Deacon @ 2018-05-23 16:26 ` Linus Torvalds 2018-05-24 12:49 ` write_lock_irq(&tasklist_lock) Will Deacon 0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-05-23 16:26 UTC (permalink / raw)
To: Will Deacon
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List, Boqun Feng

On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:

> In other words, qrwlock requires consistent locking order wrt spinlocks.

I *thought* lockdep already tracked and detected this. Or is that only with the sleeping versions?

But yes, that's equivalent to the irq-unfairness thing we have a special case for.

Linus

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-23 16:26 ` write_lock_irq(&tasklist_lock) Linus Torvalds @ 2018-05-24 12:49 ` Will Deacon 2018-05-24 13:51 ` write_lock_irq(&tasklist_lock) Boqun Feng 0 siblings, 1 reply; 14+ messages in thread
From: Will Deacon @ 2018-05-24 12:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List, Boqun Feng

On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:
>
> > In other words, qrwlock requires consistent locking order wrt spinlocks.
>
> I *thought* lockdep already tracked and detected this. Or is that only with
> with the sleeping versions?

There are patches in-flight to detect this:

https://marc.info/?l=linux-kernel&m=152483640529740&w=2k

as part of Boqun's work into recursive read locking.

Will

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-24 12:49 ` write_lock_irq(&tasklist_lock) Will Deacon @ 2018-05-24 13:51 ` Boqun Feng 2018-05-24 17:37 ` write_lock_irq(&tasklist_lock) Sodagudi Prasad 2018-05-24 21:14 ` write_lock_irq(&tasklist_lock) Andrea Parri 0 siblings, 2 replies; 14+ messages in thread
From: Boqun Feng @ 2018-05-24 13:51 UTC (permalink / raw)
To: Will Deacon
Cc: Linus Torvalds, psodagud, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
> On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> > On Wed, May 23, 2018 at 8:35 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > > In other words, qrwlock requires consistent locking order wrt spinlocks.
> >
> > I *thought* lockdep already tracked and detected this. Or is that only with
> > with the sleeping versions?
>
> There are patches in-flight to detect this:
>
> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
>
> as part of Boqun's work into recursive read locking.

Yeah, lemme put some details here. We have three cases:

Case #1 (from Will)

	P0:			P1:			P2:

	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

, which is a deadlock, and cannot be detected by lockdep yet. Lockdep could detect this with the patch I attach at the end of the mail.

Case #2

	P0:			P1:			P2:

	<in irq handler>
	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can use the unfair fastpath.

Case #3

	P0:			P1:			P2:

				<in irq handler>
	spin_lock_irq(&slock)	read_lock(&rwlock)
							write_lock_irq(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot use the fastpath.

To detect this, and not to flag case #2 as a false positive, the recursive deadlock detection patchset is needed; the WIP version is at:

git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git arr-rfc-wip

The code is done, I'm just working on the rework for the documentation stuff, so if anyone is interested, you could try it out ;-)

Regards,
Boqun

------------------->8
Subject: [PATCH] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.

Note we used to treat read_lock() as pure recursive read locks in
lib/locking-selftest.c, and this is useful, especially for the lockdep
development selftest, so we keep this via a variable to force switching
lock annotation for read_lock().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/lockdep.h | 35 ++++++++++++++++++++++++++++++++++-
 lib/locking-selftest.c  | 11 +++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..07793986c063 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -27,6 +27,7 @@ extern int lock_stat;
 #include <linux/list.h>
 #include <linux/debug_locks.h>
 #include <linux/stacktrace.h>
+#include <linux/preempt.h>
 
 /*
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
@@ -540,6 +541,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 }
 #endif
 
+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+	return force_read_lock_recursive ||
+	       !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+	       in_interrupt();
+}
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
 /*
  * For trivial one-depth nesting of a lock-class, the following
  * global define can be used. (Subsystems with multiple levels
@@ -561,7 +587,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_release(l, n, i)			lock_release(l, n, i)
 
 #define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i)		lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i)				\
+do {								\
+	if (read_lock_is_recursive())				\
+		lock_acquire_shared_recursive(l, s, t, NULL, i);\
+	else							\
+		lock_acquire_shared(l, s, t, NULL, i);		\
+} while (0)
+
 #define rwlock_release(l, n, i)			lock_release(l, n, i)
 
 #define seqcount_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..dd92f8ad83d5 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -28,6 +28,7 @@
  * Change this to 1 if you want to see the failure printouts:
  */
 static unsigned int debug_locks_verbose;
+unsigned int force_read_lock_recursive;
 
 static DEFINE_WW_CLASS(ww_lockdep);
 
@@ -1978,6 +1979,11 @@ void locking_selftest(void)
 		return;
 	}
 
+	/*
+	 * treats read_lock() as recursive read locks for testing purpose
+	 */
+	force_read_lock_recursive = 1;
+
	/*
	 * Run the testsuite:
	 */
@@ -2072,6 +2078,11 @@ void locking_selftest(void)
 
 	ww_tests();
 
+	force_read_lock_recursive = 0;
+	/*
+	 * queued_read_lock() specific test cases can be put here
+	 */
+
 	if (unexpected_testcase_failures) {
 		printk("-----------------------------------------------------------------\n");
 		debug_locks = 0;
-- 
2.16.2

^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-24 13:51 ` write_lock_irq(&tasklist_lock) Boqun Feng @ 2018-05-24 17:37 ` Sodagudi Prasad 2018-05-24 18:28 ` write_lock_irq(&tasklist_lock) Peter Zijlstra 2018-05-24 21:14 ` write_lock_irq(&tasklist_lock) Andrea Parri 1 sibling, 1 reply; 14+ messages in thread
From: Sodagudi Prasad @ 2018-05-24 17:37 UTC (permalink / raw)
To: Boqun Feng
Cc: Will Deacon, Linus Torvalds, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On 2018-05-24 06:51, Boqun Feng wrote:
> On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
>> There are patches in-flight to detect this:
>>
>> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
>>
>> as part of Boqun's work into recursive read locking.

Hi Linus and Will,

Thank you for the quick responses and suggestions.

The kernel version is locked for a couple of products, and the same issue is observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable updates from upstream for these products. If QUEUED_RWLOCKS works on the above kernel versions without any issues, we can enable QUEUED_RWLOCKS.

Can we go ahead with Linus's suggestion for these kernel versions, so that IRQs won't be disabled for such a long time?

static void tasklist_write_lock(void)
{
	unsigned long flags;
	local_irq_save(flags);
	while (!write_trylock(&tasklist_lock)) {
		local_irq_restore(flags);
		do { cpu_relax(); } while (write_islocked(&tasklist_lock));
		local_irq_disable();
	}
}

-Thanks, Prasad

> [...]

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock) 2018-05-24 17:37 ` write_lock_irq(&tasklist_lock) Sodagudi Prasad @ 2018-05-24 18:28 ` Peter Zijlstra 0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2018-05-24 18:28 UTC (permalink / raw)
To: Sodagudi Prasad
Cc: Boqun Feng, Will Deacon, Linus Torvalds, Kees Cook, Andy Lutomirski, Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner, Ingo Molnar, Eric Biggers, Frederic Weisbecker, sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli, Sasha Levin, Linux Kernel Mailing List

On Thu, May 24, 2018 at 10:37:25AM -0700, Sodagudi Prasad wrote:
> Kernel version is locked for couple of products and same issue observed on
> both 4.14.41 and 4.9.96 kernels. We can only accept the stable updates from
> upstream for these products. If QUEUED_RWLOCKS works on above listed kernel
> versions without any issues, we can enabled QUEUED_RWLOCKS.

You want:

  e0d02285f16e ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
  4df714be4dcf ("locking/atomic: Add atomic_cond_read_acquire()")
  b519b56e378e ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
  087133ac9076 ("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")
  d13316614633 ("locking/qrwlock: Prevent slowpath writers getting held up by fastpath")

IIRC, enabling QUEUED_RWLOCKS will 'work', but esp. that atomic_cond_read_acquire() one is goodness for ARM64.

> Can we go ahead with Linus suggestion for these kernel version?
> So that IRQ wont be disabled for quite a long time.
>
> static void tasklist_write_lock(void)
> {
>         unsigned long flags;
>         local_irq_save(flags);
>         while (!write_trylock(&tasklist_lock)) {
>                 local_irq_restore(flags);
>                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                 local_irq_disable();
>         }
> }

First you say you can only take stable updates, then you ask for a gruesome hack that will seriously regress architectures that do use qrwlock, and which will hence never make it into stable.

Just take the arm64 qrwlock patches and pretend they're from stable.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: write_lock_irq(&tasklist_lock)
  2018-05-24 13:51 ` write_lock_irq(&tasklist_lock) Boqun Feng
  2018-05-24 17:37 ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
@ 2018-05-24 21:14 ` Andrea Parri
  1 sibling, 0 replies; 14+ messages in thread
From: Andrea Parri @ 2018-05-24 21:14 UTC (permalink / raw)
To: Boqun Feng
Cc: Will Deacon, Linus Torvalds, psodagud, Kees Cook, Andy Lutomirski,
	Will Drewry, Andrew Morton, Rik van Riel, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Eric Biggers, Frederic Weisbecker,
	sherryy, Vegard Nossum, Christoph Lameter, Andrea Arcangeli,
	Sasha Levin, Linux Kernel Mailing List, paulmck, stern

> Yeah, lemme put some details here:
>
> So we have three cases:
>
> Case #1 (from Will)
>
>	P0:			P1:			P2:
>
>	spin_lock(&slock)	read_lock(&rwlock)
>						write_lock(&rwlock)
>	read_lock(&rwlock)	spin_lock(&slock)
>
> , which is a deadlock, and cannot be detected by lockdep yet. And
> lockdep could detect this with the patch I attach at the end of the
> mail.
>
> Case #2
>
>	P0:			P1:			P2:
>
>	<in irq handler>
>	spin_lock(&slock)	read_lock(&rwlock)
>						write_lock(&rwlock)
>	read_lock(&rwlock)	spin_lock_irq(&slock)
>
> , which is not a deadlock, as the read_lock() on P0 can use the unfair
> fastpath.
>
> Case #3
>
>	P0:			P1:			P2:
>
>				<in irq handler>
>	spin_lock_irq(&slock)	read_lock(&rwlock)
>						write_lock_irq(&rwlock)
>	read_lock(&rwlock)	spin_lock(&slock)
>
> , which is a deadlock, as the read_lock() on P0 cannot use the fastpath.

Mmh, I'm starting to think that, maybe, we need a model (a tool) to
distinguish these and other(?) cases (sorry, I could not resist ;-)

[...]

> ------------------->8
> Subject: [PATCH] locking: More accurate annotations for read_lock()
>
> On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
> read lock, actually it's only recursive if in_interrupt() is true.
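[Editor's note: the deadlock in Case #1 comes from a cycle in the
"wait-for" relation once a queued (fair) rwlock makes P0's read_lock()
queue behind P2's pending writer. The toy program below hand-encodes those
three edges and detects the cycle; the waits_for[] array, indices, and
has_cycle() are made up for this sketch and bear no relation to lockdep's
real data structures.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wait-for edges for Case #1 under a fair (queued) rwlock:
 *   P0 (index 0): holds slock; its read_lock() queues behind P2's
 *                 pending write_lock(), so P0 waits on P2
 *   P1 (index 1): holds the read lock; spins on slock held by P0
 *   P2 (index 2): its write_lock() waits for P1 to drop the read lock
 */
enum { NTASKS = 3 };
static const int waits_for[NTASKS] = { 2, 0, 1 };

/* Follow the chain from P0; revisiting any task means a deadlock cycle. */
static bool has_cycle(void)
{
	bool seen[NTASKS] = { false, false, false };
	int cur = 0;

	for (int step = 0; step <= NTASKS; step++) {
		if (seen[cur])
			return true;
		seen[cur] = true;
		cur = waits_for[cur];
	}
	return false;
}
```

In Case #2 the P0 -> P2 edge never forms, because an in-interrupt reader
takes the unfair fastpath past the queued writer, which is exactly why that
case is not a deadlock.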
> So [...]

Mmh, taking the "common denominator" over archs/Kconfig options and CPU
states, this would suggest that read_lock() is non-recursive; it looks
like I can say "good-bye" to my idea to define (formalize) consistent
executions/the memory ordering of RW-LOCKS "by following" the following
_emulation_:

	void read_lock(rwlock_t *s)
	{
		r0 = atomic_fetch_inc_acquire(&s->val);
	}

	void read_unlock(rwlock_t *s)
	{
		r0 = atomic_fetch_sub_release(&s->val);
	}

	void write_lock(rwlock_t *s)
	{
		r0 = atomic_cmpxchg_acquire(&s->val, 0, -1);
	}

	void write_unlock(rwlock_t *s)
	{
		atomic_set_release(&s->val, 0);
	}

	filter (~read_lock:r0=-1 /\ write_lock:r0=0)

[...]

> The code is done, I'm just working on the rework for the documentation
> stuff, so if anyone is interested, could try it out ;-)

Any idea on how to "educate" the LKMM about this code/documentation?

  Andrea
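[Editor's note: Andrea's emulation can be rendered directly with C11
atomics. The sketch below is an illustration under stated assumptions: the
emu_* names are invented here, the kernel's atomic_fetch_inc_acquire() and
friends are mapped onto their <stdatomic.h> counterparts, and write_lock
is a single cmpxchg attempt that reports what it observed, exactly as in
the emulation, rather than a spinning lock.]

```c
#include <stdatomic.h>
#include <assert.h>

/* val > 0: that many readers; val == 0: free; val == -1: writer held. */
typedef struct {
	atomic_int val;
} emu_rwlock_t;

/* Returns the previous value, mirroring atomic_fetch_inc_acquire():
 * -1 means a writer was observed, i.e. the filter above discards it. */
static int emu_read_lock(emu_rwlock_t *s)
{
	return atomic_fetch_add_explicit(&s->val, 1, memory_order_acquire);
}

static int emu_read_unlock(emu_rwlock_t *s)
{
	return atomic_fetch_sub_explicit(&s->val, 1, memory_order_release);
}

/* Returns the value the cmpxchg observed: 0 means the lock was taken,
 * anything else means the attempt failed (readers or another writer). */
static int emu_write_lock(emu_rwlock_t *s)
{
	int expected = 0;

	atomic_compare_exchange_strong_explicit(&s->val, &expected, -1,
			memory_order_acquire, memory_order_relaxed);
	return expected;	/* updated to the observed value on failure */
}

static void emu_write_unlock(emu_rwlock_t *s)
{
	atomic_store_explicit(&s->val, 0, memory_order_release);
}
```

The filter clause then reads naturally: keep only executions in which no
read_lock() observed a writer (r0 = -1) and every write_lock() observed a
free lock (r0 = 0).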
end of thread, other threads: [~2018-05-24 21:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 19:40 write_lock_irq(&tasklist_lock) Sodagudi Prasad
2018-05-22 20:27 ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-22 21:17 ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-22 21:31 ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-23  8:19 ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-23 13:05 ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-23 15:25 ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-23 15:36 ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-23 16:26 ` write_lock_irq(&tasklist_lock) Linus Torvalds
2018-05-24 12:49 ` write_lock_irq(&tasklist_lock) Will Deacon
2018-05-24 13:51 ` write_lock_irq(&tasklist_lock) Boqun Feng
2018-05-24 17:37 ` write_lock_irq(&tasklist_lock) Sodagudi Prasad
2018-05-24 18:28 ` write_lock_irq(&tasklist_lock) Peter Zijlstra
2018-05-24 21:14 ` write_lock_irq(&tasklist_lock) Andrea Parri