All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: jlayton@kernel.org, bfields@fieldses.org,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, boqun.feng@gmail.com,
	longman@redhat.com, peterz@infradead.org, mingo@redhat.com
Subject: Re: [PATCH] fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
Date: Tue, 17 Apr 2018 12:04:13 +0300	[thread overview]
Message-ID: <a8ead09e-2a4d-b664-44f9-12f12262f093@virtuozzo.com> (raw)
In-Reply-To: <152292939368.19745.13784475656016424647.stgit@localhost.localdomain>

Hi,

almost two weeks passed, while there is no reaction.

Jeff, Bruce, what is your point of view?

Just to underline, the problem is because of rw_lock fairness, which does not
allow a reader to take a read locked lock in case of there is already a writer
called write_lock(). See queued_read_lock_slowpath() for the details. This makes
rw_lock fair, and it does not allow readers to occupy the lock forever without
a possibility for writers to take it.

Kirill

On 05.04.2018 14:58, Kirill Tkhai wrote:
> I observed the following deadlock between them:
> 
> [task 1]                          [task 2]                         [task 3]
> kill_fasync()                     mm_update_next_owner()           copy_process()
>  spin_lock_irqsave(&fa->fa_lock)   read_lock(&tasklist_lock)        write_lock_irq(&tasklist_lock)
>   send_sigio()                    <IRQ>                             ...
>    read_lock(&fown->lock)         kill_fasync()                     ...
>     read_lock(&tasklist_lock)      spin_lock_irqsave(&fa->fa_lock)  ...
> 
> Task 1 can't acquire read locked tasklist_lock, since there is
> already task 3 expressed its wish to take the lock exclusive.
> Task 2 holds the read locked lock, but it can't take the spin lock.
> 
> Also, there is possible another deadlock (which I haven't observed):
> 
> [task 1]                            [task 2]
> f_getown()                          kill_fasync()
>  read_lock(&f_own->lock)             spin_lock_irqsave(&fa->fa_lock,)
>  <IRQ>                               send_sigio()                     write_lock_irq(&f_own->lock)
>   kill_fasync()                       read_lock(&fown->lock)
>    spin_lock_irqsave(&fa->fa_lock,)
> 
> Actually, we do not need exclusive fa->fa_lock in kill_fasync_rcu(),
> as it guarantees fa->fa_file->f_owner integrity only. It may seem,
> that it used to give a task a small possibility to receive two sequential
> signals, if there are two parallel kill_fasync() callers, and task
> handles the first signal fastly, but the behaviour won't become
> different, since there is exclusive sighand lock in do_send_sig_info().
> 
> The patch converts fa_lock into rwlock_t, and this fixes two above
> deadlocks, as rwlock is allowed to be taken from interrupt handler
> by qrwlock design.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> I used the following program for testing:
> 
> #include <unistd.h>
> #include <stdlib.h>
> #include <signal.h>
> #include <fcntl.h>
> #include <errno.h>
> #include <stdio.h>
> 
> #ifndef F_SETSIG
> #define F_SETSIG 10
> #endif
> 
> void handler(int sig)
> {
> }
> 
> main()
> {
> 	unsigned int flags;
> 	int fd;
> 
> 	system("echo 8 > /proc/sys/kernel/random/read_wakeup_threshold");
> 	system("while :; do ls -R / > /dev/random 2>&1 ; echo 3 > /proc/sys/vm/drop_caches; done &");
> 
> 	if (signal(SIGINT, handler) < 0) {
> 		perror("Signal");
> 		exit(1);
> 	}
> 
> 	fd = open("/dev/random", O_RDWR);
> 	if (fd < 0) {
> 		perror("Can't open");
> 		exit(1);
> 	}
> 
> 	flags = FASYNC | fcntl(fd, F_GETFL);
> 	if (fcntl(fd, F_SETFL, flags) < 0) {
> 		perror("Setfl");
> 		exit(1);
> 	}
> 	if (fcntl(fd, F_SETOWN, getpid())) {
> 		perror("Setown");
> 		exit(1);
> 	}
> 	if (fcntl(fd, F_SETSIG, SIGINT)) {
> 		perror("Setsig");
> 		exit(1);
> 	}
> 
> 	while (1)
> 		sleep(100);
> }
> ---
>  fs/fcntl.c         |   15 +++++++--------
>  include/linux/fs.h |    2 +-
>  2 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 1e97f1fda90c..780161a11f9d 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -865,9 +865,9 @@ int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
>  		if (fa->fa_file != filp)
>  			continue;
>  
> -		spin_lock_irq(&fa->fa_lock);
> +		write_lock_irq(&fa->fa_lock);
>  		fa->fa_file = NULL;
> -		spin_unlock_irq(&fa->fa_lock);
> +		write_unlock_irq(&fa->fa_lock);
>  
>  		*fp = fa->fa_next;
>  		call_rcu(&fa->fa_rcu, fasync_free_rcu);
> @@ -912,13 +912,13 @@ struct fasync_struct *fasync_insert_entry(int fd, struct file *filp, struct fasy
>  		if (fa->fa_file != filp)
>  			continue;
>  
> -		spin_lock_irq(&fa->fa_lock);
> +		write_lock_irq(&fa->fa_lock);
>  		fa->fa_fd = fd;
> -		spin_unlock_irq(&fa->fa_lock);
> +		write_unlock_irq(&fa->fa_lock);
>  		goto out;
>  	}
>  
> -	spin_lock_init(&new->fa_lock);
> +	rwlock_init(&new->fa_lock);
>  	new->magic = FASYNC_MAGIC;
>  	new->fa_file = filp;
>  	new->fa_fd = fd;
> @@ -981,14 +981,13 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
>  {
>  	while (fa) {
>  		struct fown_struct *fown;
> -		unsigned long flags;
>  
>  		if (fa->magic != FASYNC_MAGIC) {
>  			printk(KERN_ERR "kill_fasync: bad magic number in "
>  			       "fasync_struct!\n");
>  			return;
>  		}
> -		spin_lock_irqsave(&fa->fa_lock, flags);
> +		read_lock(&fa->fa_lock);
>  		if (fa->fa_file) {
>  			fown = &fa->fa_file->f_owner;
>  			/* Don't send SIGURG to processes which have not set a
> @@ -997,7 +996,7 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
>  			if (!(sig == SIGURG && fown->signum == 0))
>  				send_sigio(fown, fa->fa_fd, band);
>  		}
> -		spin_unlock_irqrestore(&fa->fa_lock, flags);
> +		read_unlock(&fa->fa_lock);
>  		fa = rcu_dereference(fa->fa_next);
>  	}
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c6baf767619e..297e2dcd9dd2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1250,7 +1250,7 @@ static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
>  }
>  
>  struct fasync_struct {
> -	spinlock_t		fa_lock;
> +	rwlock_t		fa_lock;
>  	int			magic;
>  	int			fa_fd;
>  	struct fasync_struct	*fa_next; /* singly linked list */
> 

  parent reply	other threads:[~2018-04-17  9:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-05 11:58 [PATCH] fasync: Fix deadlock between task-context and interrupt-context kill_fasync() Kirill Tkhai
2018-04-09  3:37 ` Sasha Levin
2018-04-17  9:04 ` Kirill Tkhai [this message]
2018-04-17 11:42 ` Jeff Layton
2018-04-17 11:53   ` Kirill Tkhai
2018-04-17 13:31     ` Jeff Layton
2018-04-17 13:59       ` Kirill Tkhai
2018-04-17 14:01 ` Matthew Wilcox
2018-04-17 14:15   ` Kirill Tkhai
2018-04-18 20:00     ` Jeff Layton
2018-04-18 22:40       ` Kirill Tkhai
2018-04-27 13:44   ` Boqun Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a8ead09e-2a4d-b664-44f9-12f12262f093@virtuozzo.com \
    --to=ktkhai@virtuozzo.com \
    --cc=bfields@fieldses.org \
    --cc=boqun.feng@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.