On Tue, Apr 17, 2018 at 07:01:10AM -0700, Matthew Wilcox wrote:
> On Thu, Apr 05, 2018 at 02:58:06PM +0300, Kirill Tkhai wrote:
> > I observed the following deadlock between them:
> >
> > [task 1]                          [task 2]                         [task 3]
> > kill_fasync()                     mm_update_next_owner()           copy_process()
> >  spin_lock_irqsave(&fa->fa_lock)   read_lock(&tasklist_lock)        write_lock_irq(&tasklist_lock)
> >   send_sigio()                                                      ...
> >    read_lock(&fown->lock)          kill_fasync()                    ...
> >     read_lock(&tasklist_lock)       spin_lock_irqsave(&fa->fa_lock) ...
> >
> > Task 1 can't acquire the read-locked tasklist_lock, since task 3 has
> > already expressed its wish to take the lock exclusively. Task 2 holds
> > the read lock, but it can't take the spin lock.
>
> I think the important question is to Peter ... why didn't lockdep catch this?
>

I think the following will help lockdep to catch this:

@@ -570,7 +588,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_release(l, n, i)			lock_release(l, n, i)
 
 #define rwlock_acquire(l, s, t, i)		lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i)		lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i)					\
+do {									\
+	if (!IS_ENABLED(CONFIG_QUEUED_RWLOCKS) || in_interrupt())	\
+		lock_acquire_shared_recursive(l, s, t, NULL, i);	\
+	else								\
+		lock_acquire_shared(l, s, t, NULL, i);			\
+} while (0)
+

However, this will break several selftests in lib/locking-selftest.c,
because we used to treat read_lock() as a recursive read lock for all
call sites from lockdep's viewpoint.

Besides, the above change will bring an interesting challenge for the
recursive read lock deadlock detection work that I'm doing:

	https://marc.info/?l=linux-kernel&m=152345444100825

I will explain that in the thread of that patchset and add you and the
others Cced here, in case you're interested.

Regards,
Boqun

> > -	spin_lock_irq(&fa->fa_lock);
> > +	write_lock_irq(&fa->fa_lock);
> >  	fa->fa_file = NULL;
> > -	spin_unlock_irq(&fa->fa_lock);
> > +	write_unlock_irq(&fa->fa_lock);
> ...
> > -	spin_lock_irq(&fa->fa_lock);
> > +	write_lock_irq(&fa->fa_lock);
> >  	fa->fa_fd = fd;
> > -	spin_unlock_irq(&fa->fa_lock);
> > +	write_unlock_irq(&fa->fa_lock);
>
> Do we really need a lock here?  If we convert each of these into WRITE_ONCE,
> then
>
> ...
> > -		spin_lock_irqsave(&fa->fa_lock, flags);
> > +		read_lock(&fa->fa_lock);
> >  		if (fa->fa_file) {
>
> 		file = READ_ONCE(fa->fa_file)
>
> then we're not opening any new races, are we?
>
> >  			fown = &fa->fa_file->f_owner;
> >  			/* Don't send SIGURG to processes which have not set a
> > @@ -997,7 +996,7 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
> >  			if (!(sig == SIGURG && fown->signum == 0))
> >  				send_sigio(fown, fa->fa_fd, band);
> >  		}
> > -		spin_unlock_irqrestore(&fa->fa_lock, flags);
> > +		read_unlock(&fa->fa_lock);
> >  		fa = rcu_dereference(fa->fa_next);
> >  	}
> >  }
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index c6baf767619e..297e2dcd9dd2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1250,7 +1250,7 @@ static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
> >  }
> >
> >  struct fasync_struct {
> > -	spinlock_t		fa_lock;
> > +	rwlock_t		fa_lock;
> >  	int			magic;
> >  	int			fa_fd;
> >  	struct fasync_struct	*fa_next;	/* singly linked list */
> >
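
To see the fairness aspect of the deadlock above outside the kernel, here is a
minimal userspace sketch (illustration only, not taken from any of the patches;
the writer-preferring pthread attribute and the one-second timeout are just
choices to make the demo terminate). One thread holds a read lock like task 2,
a second thread queues for the write lock like task 3, and a further read-lock
attempt then has to wait behind the queued writer, which is what bites task 1:

/* Illustration only: a nested read lock vs. a queued writer, in userspace.
 * Build with: gcc -pthread nested-read.c
 */
#define _GNU_SOURCE		/* for pthread_rwlockattr_setkind_np() */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <time.h>

static pthread_rwlock_t lock;

static void *writer(void *arg)
{
	/* Queues behind the reader in main(), like task 3's write_lock_irq(). */
	pthread_rwlock_wrlock(&lock);
	pthread_rwlock_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t t;
	struct timespec ts;

	/* Writer-preferring behaviour, roughly what queued rwlocks give you. */
	pthread_rwlockattr_init(&attr);
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&lock, &attr);

	pthread_rwlock_rdlock(&lock);		/* task 2: read_lock(&tasklist_lock)  */
	pthread_create(&t, NULL, writer, NULL);	/* task 3: writer starts waiting      */
	sleep(1);				/* crude, but lets the writer queue up */

	/* task 1: another read lock; it now waits behind the queued writer,
	 * so a plain rdlock here would deadlock.  Use a timeout only so
	 * the demo terminates. */
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 1;
	if (pthread_rwlock_timedrdlock(&lock, &ts) != 0)
		printf("second read lock blocked behind the queued writer\n");
	else
		pthread_rwlock_unlock(&lock);

	pthread_rwlock_unlock(&lock);		/* release the first read lock */
	pthread_join(t, NULL);
	return 0;
}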
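
And to make the WRITE_ONCE()/READ_ONCE() suggestion a bit more concrete (again
illustration only, not kernel code and not part of either patch: the struct and
field names below are made up, and READ_ONCE()/WRITE_ONCE() are reduced to
plain volatile accesses), the point is that the reader snapshots the pointer
once and only dereferences the snapshot, so a concurrent writer clearing the
field can no longer change the outcome of the NULL check after the fact.
Whether that is enough for the real fa_file/fa_fd lifetime rules is exactly
the open question:

/* Userspace sketch of the "snapshot the pointer once" pattern.
 * Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#include <pthread.h>
#include <stdio.h>

#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct owner { int signum; };

struct entry {			/* made-up analogue of fasync_struct */
	struct owner *owner;	/* plays the role of fa_file          */
	int fd;
};

static struct owner fown = { .signum = 1 };
static struct entry e = { .owner = &fown, .fd = 42 };

static void *clearer(void *arg)
{
	/* Like fasync_remove_entry(): publish "no owner any more". */
	WRITE_ONCE(e.owner, NULL);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, clearer, NULL);

	/* Like kill_fasync_rcu(): read the pointer exactly once and only use
	 * the local snapshot afterwards; re-reading e.owner could observe
	 * NULL after the check already succeeded. */
	struct owner *owner = READ_ONCE(e.owner);
	if (owner)
		printf("deliver to fd %d, signum %d\n", e.fd, owner->signum);
	else
		printf("entry already torn down\n");

	pthread_join(t, NULL);
	return 0;
}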