linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] fcntl: fix potential deadlocks
@ 2021-07-07  2:35 Desmond Cheong Zhi Xi
  2021-07-07  2:35 ` [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock Desmond Cheong Zhi Xi
  2021-07-07  2:35 ` [PATCH v2 2/2] fcntl: fix potential deadlock for &fasync_struct.fa_lock Desmond Cheong Zhi Xi
  0 siblings, 2 replies; 18+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-07  2:35 UTC (permalink / raw)
  To: jlayton, bfields, viro
  Cc: Desmond Cheong Zhi Xi, linux-fsdevel, linux-kernel, skhan,
	gregkh, linux-kernel-mentees

Hi,

Sorry for the delay between v1 and v2, there was an unrelated issue with Syzbot testing.

Syzbot reports a possible irq lock inversion dependency:
https://syzkaller.appspot.com/bug?id=923cfc6c6348963f99886a0176ef11dcc429547b

While investigating this error, I discovered that multiple similar lock inversion scenarios can occur. Hence, this series addresses potential deadlocks for two classes of locks, one in each patch:

1. Fix potential deadlocks for &fown_struct.lock

2. Fix potential deadlock for &fasync_struct.fa_lock

v1 -> v2:
- Added WARN_ON_ONCE(irqs_disabled()) before calls to read_lock_irq, with elaboration in the commit message, as suggested by Jeff Layton.

Best wishes,
Desmond

Desmond Cheong Zhi Xi (2):
  fcntl: fix potential deadlocks for &fown_struct.lock
  fcntl: fix potential deadlock for &fasync_struct.fa_lock

 fs/fcntl.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07  2:35 [PATCH v2 0/2] fcntl: fix potential deadlocks Desmond Cheong Zhi Xi
@ 2021-07-07  2:35 ` Desmond Cheong Zhi Xi
  2021-07-07  6:05   ` Greg KH
  2021-07-07  2:35 ` [PATCH v2 2/2] fcntl: fix potential deadlock for &fasync_struct.fa_lock Desmond Cheong Zhi Xi
  1 sibling, 1 reply; 18+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-07  2:35 UTC (permalink / raw)
  To: jlayton, bfields, viro
  Cc: Desmond Cheong Zhi Xi, linux-fsdevel, linux-kernel, skhan,
	gregkh, linux-kernel-mentees, syzbot+e6d5398a02c516ce5e70

Syzbot reports a potential deadlock in do_fcntl:

========================================================
WARNING: possible irq lock inversion dependency detected
5.12.0-syzkaller #0 Not tainted
--------------------------------------------------------
syz-executor132/8391 just changed the state of lock:
ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
but this lock was taken by another, HARDIRQ-safe lock in the past:
 (&dev->event_lock){-...}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Chain exists of:
  &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&f->f_owner.lock);
                               local_irq_disable();
                               lock(&dev->event_lock);
                               lock(&new->fa_lock);
  <Interrupt>
    lock(&dev->event_lock);

 *** DEADLOCK ***

This happens because there is a lock hierarchy of
&dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
from the following call chain:

  input_inject_event():
    spin_lock_irqsave(&dev->event_lock,...);
    input_handle_event():
      input_pass_values():
        input_to_handler():
          evdev_events():
            evdev_pass_values():
              spin_lock(&client->buffer_lock);
              __pass_event():
                kill_fasync():
                  kill_fasync_rcu():
                    read_lock(&fa->fa_lock);
                    send_sigio():
                      read_lock_irqsave(&fown->lock,...);

However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
hierarchy.

Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
with read_lock_irq/read_unlock_irq.

Here read_lock_irq/read_unlock_irq should be safe to use because the
functions f_getown_ex and f_getowner_uids are only called from
do_fcntl, and f_getown is only called from do_fnctl and
sock_ioctl. do_fnctl itself is only called from syscalls.

For sock_ioctl, the chain is
  compat_sock_ioctl():
    compat_sock_ioctl_trans():
      sock_ioctl()

And interrupts are not disabled on either path. We assert this
assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
inserted into another use of write_lock_irq in f_modown.

Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
---
 fs/fcntl.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index dfc72f15be7f..262235e02c4b 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
                      int force)
 {
+	WARN_ON_ONCE(irqs_disabled());
 	write_lock_irq(&filp->f_owner.lock);
 	if (force || !filp->f_owner.pid) {
 		put_pid(filp->f_owner.pid);
@@ -150,7 +151,9 @@ void f_delown(struct file *filp)
 pid_t f_getown(struct file *filp)
 {
 	pid_t pid = 0;
-	read_lock(&filp->f_owner.lock);
+
+	WARN_ON_ONCE(irqs_disabled());
+	read_lock_irq(&filp->f_owner.lock);
 	rcu_read_lock();
 	if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) {
 		pid = pid_vnr(filp->f_owner.pid);
@@ -158,7 +161,7 @@ pid_t f_getown(struct file *filp)
 			pid = -pid;
 	}
 	rcu_read_unlock();
-	read_unlock(&filp->f_owner.lock);
+	read_unlock_irq(&filp->f_owner.lock);
 	return pid;
 }
 
@@ -208,7 +211,8 @@ static int f_getown_ex(struct file *filp, unsigned long arg)
 	struct f_owner_ex owner = {};
 	int ret = 0;
 
-	read_lock(&filp->f_owner.lock);
+	WARN_ON_ONCE(irqs_disabled());
+	read_lock_irq(&filp->f_owner.lock);
 	rcu_read_lock();
 	if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type))
 		owner.pid = pid_vnr(filp->f_owner.pid);
@@ -231,7 +235,7 @@ static int f_getown_ex(struct file *filp, unsigned long arg)
 		ret = -EINVAL;
 		break;
 	}
-	read_unlock(&filp->f_owner.lock);
+	read_unlock_irq(&filp->f_owner.lock);
 
 	if (!ret) {
 		ret = copy_to_user(owner_p, &owner, sizeof(owner));
@@ -249,10 +253,11 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 	uid_t src[2];
 	int err;
 
-	read_lock(&filp->f_owner.lock);
+	WARN_ON_ONCE(irqs_disabled());
+	read_lock_irq(&filp->f_owner.lock);
 	src[0] = from_kuid(user_ns, filp->f_owner.uid);
 	src[1] = from_kuid(user_ns, filp->f_owner.euid);
-	read_unlock(&filp->f_owner.lock);
+	read_unlock_irq(&filp->f_owner.lock);
 
 	err  = put_user(src[0], &dst[0]);
 	err |= put_user(src[1], &dst[1]);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/2] fcntl: fix potential deadlock for &fasync_struct.fa_lock
  2021-07-07  2:35 [PATCH v2 0/2] fcntl: fix potential deadlocks Desmond Cheong Zhi Xi
  2021-07-07  2:35 ` [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock Desmond Cheong Zhi Xi
@ 2021-07-07  2:35 ` Desmond Cheong Zhi Xi
  1 sibling, 0 replies; 18+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-07  2:35 UTC (permalink / raw)
  To: jlayton, bfields, viro
  Cc: Desmond Cheong Zhi Xi, linux-fsdevel, linux-kernel, skhan,
	gregkh, linux-kernel-mentees

There is an existing lock hierarchy of
&dev->event_lock --> &fasync_struct.fa_lock --> &f->f_owner.lock
from the following call chain:

  input_inject_event():
    spin_lock_irqsave(&dev->event_lock,...);
    input_handle_event():
      input_pass_values():
        input_to_handler():
          evdev_events():
            evdev_pass_values():
              spin_lock(&client->buffer_lock);
              __pass_event():
                kill_fasync():
                  kill_fasync_rcu():
                    read_lock(&fa->fa_lock);
                    send_sigio():
                      read_lock_irqsave(&fown->lock,...);

&dev->event_lock is HARDIRQ-safe, so interrupts have to be disabled
while grabbing &fasync_struct.fa_lock, otherwise we invert the lock
hierarchy. However, since kill_fasync which calls kill_fasync_rcu is
an exported symbol, it may not necessarily be called with interrupts
disabled.

As kill_fasync_rcu may be called with interrupts disabled (for
example, in the call chain above), we replace calls to
read_lock/read_unlock on &fasync_struct.fa_lock in kill_fasync_rcu
with read_lock_irqsave/read_unlock_irqrestore.

Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
---
 fs/fcntl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 262235e02c4b..fd9c895d704c 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1008,13 +1008,14 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
 {
 	while (fa) {
 		struct fown_struct *fown;
+		unsigned long flags;
 
 		if (fa->magic != FASYNC_MAGIC) {
 			printk(KERN_ERR "kill_fasync: bad magic number in "
 			       "fasync_struct!\n");
 			return;
 		}
-		read_lock(&fa->fa_lock);
+		read_lock_irqsave(&fa->fa_lock, flags);
 		if (fa->fa_file) {
 			fown = &fa->fa_file->f_owner;
 			/* Don't send SIGURG to processes which have not set a
@@ -1023,7 +1024,7 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
 			if (!(sig == SIGURG && fown->signum == 0))
 				send_sigio(fown, fa->fa_fd, band);
 		}
-		read_unlock(&fa->fa_lock);
+		read_unlock_irqrestore(&fa->fa_lock, flags);
 		fa = rcu_dereference(fa->fa_next);
 	}
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07  2:35 ` [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock Desmond Cheong Zhi Xi
@ 2021-07-07  6:05   ` Greg KH
  2021-07-07  6:54     ` Desmond Cheong Zhi Xi
  2021-07-07 10:44     ` Jeff Layton
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2021-07-07  6:05 UTC (permalink / raw)
  To: Desmond Cheong Zhi Xi
  Cc: jlayton, bfields, viro, linux-fsdevel, linux-kernel, skhan,
	linux-kernel-mentees, syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> Syzbot reports a potential deadlock in do_fcntl:
> 
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.12.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> syz-executor132/8391 just changed the state of lock:
> ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
> ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
> but this lock was taken by another, HARDIRQ-safe lock in the past:
>  (&dev->event_lock){-...}-{2:2}
> 
> and interrupts could create inverse lock ordering between them.
> 
> other info that might help us debug this:
> Chain exists of:
>   &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> 
>  Possible interrupt unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&f->f_owner.lock);
>                                local_irq_disable();
>                                lock(&dev->event_lock);
>                                lock(&new->fa_lock);
>   <Interrupt>
>     lock(&dev->event_lock);
> 
>  *** DEADLOCK ***
> 
> This happens because there is a lock hierarchy of
> &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> from the following call chain:
> 
>   input_inject_event():
>     spin_lock_irqsave(&dev->event_lock,...);
>     input_handle_event():
>       input_pass_values():
>         input_to_handler():
>           evdev_events():
>             evdev_pass_values():
>               spin_lock(&client->buffer_lock);
>               __pass_event():
>                 kill_fasync():
>                   kill_fasync_rcu():
>                     read_lock(&fa->fa_lock);
>                     send_sigio():
>                       read_lock_irqsave(&fown->lock,...);
> 
> However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
> disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
> hierarchy.
> 
> Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
> with read_lock_irq/read_unlock_irq.
> 
> Here read_lock_irq/read_unlock_irq should be safe to use because the
> functions f_getown_ex and f_getowner_uids are only called from
> do_fcntl, and f_getown is only called from do_fnctl and
> sock_ioctl. do_fnctl itself is only called from syscalls.
> 
> For sock_ioctl, the chain is
>   compat_sock_ioctl():
>     compat_sock_ioctl_trans():
>       sock_ioctl()
> 
> And interrupts are not disabled on either path. We assert this
> assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
> inserted into another use of write_lock_irq in f_modown.
> 
> Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> ---
>  fs/fcntl.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index dfc72f15be7f..262235e02c4b 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
>                       int force)
>  {
> +	WARN_ON_ONCE(irqs_disabled());

If this triggers, you just rebooted the box :(

Please never do this, either properly handle the problem and return an
error, or do not check for this.  It is not any type of "fix" at all,
and at most, a debugging aid while you work on the root problem.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07  6:05   ` Greg KH
@ 2021-07-07  6:54     ` Desmond Cheong Zhi Xi
  2021-07-07 10:44     ` Jeff Layton
  1 sibling, 0 replies; 18+ messages in thread
From: Desmond Cheong Zhi Xi @ 2021-07-07  6:54 UTC (permalink / raw)
  To: Greg KH
  Cc: jlayton, bfields, viro, linux-fsdevel, linux-kernel, skhan,
	linux-kernel-mentees, syzbot+e6d5398a02c516ce5e70

On 7/7/21 2:05 pm, Greg KH wrote:
> On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
>> Syzbot reports a potential deadlock in do_fcntl:
>>
>> ========================================================
>> WARNING: possible irq lock inversion dependency detected
>> 5.12.0-syzkaller #0 Not tainted
>> --------------------------------------------------------
>> syz-executor132/8391 just changed the state of lock:
>> ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
>> ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
>> but this lock was taken by another, HARDIRQ-safe lock in the past:
>>   (&dev->event_lock){-...}-{2:2}
>>
>> and interrupts could create inverse lock ordering between them.
>>
>> other info that might help us debug this:
>> Chain exists of:
>>    &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
>>
>>   Possible interrupt unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&f->f_owner.lock);
>>                                 local_irq_disable();
>>                                 lock(&dev->event_lock);
>>                                 lock(&new->fa_lock);
>>    <Interrupt>
>>      lock(&dev->event_lock);
>>
>>   *** DEADLOCK ***
>>
>> This happens because there is a lock hierarchy of
>> &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
>> from the following call chain:
>>
>>    input_inject_event():
>>      spin_lock_irqsave(&dev->event_lock,...);
>>      input_handle_event():
>>        input_pass_values():
>>          input_to_handler():
>>            evdev_events():
>>              evdev_pass_values():
>>                spin_lock(&client->buffer_lock);
>>                __pass_event():
>>                  kill_fasync():
>>                    kill_fasync_rcu():
>>                      read_lock(&fa->fa_lock);
>>                      send_sigio():
>>                        read_lock_irqsave(&fown->lock,...);
>>
>> However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
>> disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
>> hierarchy.
>>
>> Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
>> with read_lock_irq/read_unlock_irq.
>>
>> Here read_lock_irq/read_unlock_irq should be safe to use because the
>> functions f_getown_ex and f_getowner_uids are only called from
>> do_fcntl, and f_getown is only called from do_fnctl and
>> sock_ioctl. do_fnctl itself is only called from syscalls.
>>
>> For sock_ioctl, the chain is
>>    compat_sock_ioctl():
>>      compat_sock_ioctl_trans():
>>        sock_ioctl()
>>
>> And interrupts are not disabled on either path. We assert this
>> assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
>> inserted into another use of write_lock_irq in f_modown.
>>
>> Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
>> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
>> ---
>>   fs/fcntl.c | 17 +++++++++++------
>>   1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/fcntl.c b/fs/fcntl.c
>> index dfc72f15be7f..262235e02c4b 100644
>> --- a/fs/fcntl.c
>> +++ b/fs/fcntl.c
>> @@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>>   static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
>>                        int force)
>>   {
>> +	WARN_ON_ONCE(irqs_disabled());
> 
> If this triggers, you just rebooted the box :(
> 
> Please never do this, either properly handle the problem and return an
> error, or do not check for this.  It is not any type of "fix" at all,
> and at most, a debugging aid while you work on the root problem.
> 
> thanks,
> 
> greg k-h
> 

Hi Greg,

Thanks for the feedback. My bad, I was under the impression that 
WARN_ON_ONCE could be used to document assumptions for other developers, 
but I'll stick to using it for debugging in the future.

I think then in this case it would be best to keep the reasoning for why 
the *_irq() locks are safe to use in the commit message. I'll update the 
patch accordingly.

Best wishes,
Desmond

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07  6:05   ` Greg KH
  2021-07-07  6:54     ` Desmond Cheong Zhi Xi
@ 2021-07-07 10:44     ` Jeff Layton
  2021-07-07 10:51       ` Greg KH
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-07-07 10:44 UTC (permalink / raw)
  To: Greg KH, Desmond Cheong Zhi Xi
  Cc: bfields, viro, linux-fsdevel, linux-kernel, skhan,
	linux-kernel-mentees, syzbot+e6d5398a02c516ce5e70

On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > Syzbot reports a potential deadlock in do_fcntl:
> > 
> > ========================================================
> > WARNING: possible irq lock inversion dependency detected
> > 5.12.0-syzkaller #0 Not tainted
> > --------------------------------------------------------
> > syz-executor132/8391 just changed the state of lock:
> > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
> > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
> > but this lock was taken by another, HARDIRQ-safe lock in the past:
> >  (&dev->event_lock){-...}-{2:2}
> > 
> > and interrupts could create inverse lock ordering between them.
> > 
> > other info that might help us debug this:
> > Chain exists of:
> >   &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > 
> >  Possible interrupt unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&f->f_owner.lock);
> >                                local_irq_disable();
> >                                lock(&dev->event_lock);
> >                                lock(&new->fa_lock);
> >   <Interrupt>
> >     lock(&dev->event_lock);
> > 
> >  *** DEADLOCK ***
> > 
> > This happens because there is a lock hierarchy of
> > &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > from the following call chain:
> > 
> >   input_inject_event():
> >     spin_lock_irqsave(&dev->event_lock,...);
> >     input_handle_event():
> >       input_pass_values():
> >         input_to_handler():
> >           evdev_events():
> >             evdev_pass_values():
> >               spin_lock(&client->buffer_lock);
> >               __pass_event():
> >                 kill_fasync():
> >                   kill_fasync_rcu():
> >                     read_lock(&fa->fa_lock);
> >                     send_sigio():
> >                       read_lock_irqsave(&fown->lock,...);
> > 
> > However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
> > disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
> > hierarchy.
> > 
> > Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
> > with read_lock_irq/read_unlock_irq.
> > 
> > Here read_lock_irq/read_unlock_irq should be safe to use because the
> > functions f_getown_ex and f_getowner_uids are only called from
> > do_fcntl, and f_getown is only called from do_fnctl and
> > sock_ioctl. do_fnctl itself is only called from syscalls.
> > 
> > For sock_ioctl, the chain is
> >   compat_sock_ioctl():
> >     compat_sock_ioctl_trans():
> >       sock_ioctl()
> > 
> > And interrupts are not disabled on either path. We assert this
> > assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
> > inserted into another use of write_lock_irq in f_modown.
> > 
> > Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
> > Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> > ---
> >  fs/fcntl.c | 17 +++++++++++------
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/fcntl.c b/fs/fcntl.c
> > index dfc72f15be7f..262235e02c4b 100644
> > --- a/fs/fcntl.c
> > +++ b/fs/fcntl.c
> > @@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
> >  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> >                       int force)
> >  {
> > +	WARN_ON_ONCE(irqs_disabled());
> 
> If this triggers, you just rebooted the box :(
> 
> Please never do this, either properly handle the problem and return an
> error, or do not check for this.  It is not any type of "fix" at all,
> and at most, a debugging aid while you work on the root problem.
> 
> thanks,
> 
> greg k-h

Wait, what? Why would testing for irqs being disabled and throwing a
WARN_ON in that case crash the box?
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 10:44     ` Jeff Layton
@ 2021-07-07 10:51       ` Greg KH
  2021-07-07 11:40         ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2021-07-07 10:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Desmond Cheong Zhi Xi, bfields, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > Syzbot reports a potential deadlock in do_fcntl:
> > > 
> > > ========================================================
> > > WARNING: possible irq lock inversion dependency detected
> > > 5.12.0-syzkaller #0 Not tainted
> > > --------------------------------------------------------
> > > syz-executor132/8391 just changed the state of lock:
> > > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
> > > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
> > > but this lock was taken by another, HARDIRQ-safe lock in the past:
> > >  (&dev->event_lock){-...}-{2:2}
> > > 
> > > and interrupts could create inverse lock ordering between them.
> > > 
> > > other info that might help us debug this:
> > > Chain exists of:
> > >   &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > > 
> > >  Possible interrupt unsafe locking scenario:
> > > 
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&f->f_owner.lock);
> > >                                local_irq_disable();
> > >                                lock(&dev->event_lock);
> > >                                lock(&new->fa_lock);
> > >   <Interrupt>
> > >     lock(&dev->event_lock);
> > > 
> > >  *** DEADLOCK ***
> > > 
> > > This happens because there is a lock hierarchy of
> > > &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > > from the following call chain:
> > > 
> > >   input_inject_event():
> > >     spin_lock_irqsave(&dev->event_lock,...);
> > >     input_handle_event():
> > >       input_pass_values():
> > >         input_to_handler():
> > >           evdev_events():
> > >             evdev_pass_values():
> > >               spin_lock(&client->buffer_lock);
> > >               __pass_event():
> > >                 kill_fasync():
> > >                   kill_fasync_rcu():
> > >                     read_lock(&fa->fa_lock);
> > >                     send_sigio():
> > >                       read_lock_irqsave(&fown->lock,...);
> > > 
> > > However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
> > > disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
> > > hierarchy.
> > > 
> > > Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
> > > with read_lock_irq/read_unlock_irq.
> > > 
> > > Here read_lock_irq/read_unlock_irq should be safe to use because the
> > > functions f_getown_ex and f_getowner_uids are only called from
> > > do_fcntl, and f_getown is only called from do_fnctl and
> > > sock_ioctl. do_fnctl itself is only called from syscalls.
> > > 
> > > For sock_ioctl, the chain is
> > >   compat_sock_ioctl():
> > >     compat_sock_ioctl_trans():
> > >       sock_ioctl()
> > > 
> > > And interrupts are not disabled on either path. We assert this
> > > assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
> > > inserted into another use of write_lock_irq in f_modown.
> > > 
> > > Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
> > > Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> > > ---
> > >  fs/fcntl.c | 17 +++++++++++------
> > >  1 file changed, 11 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/fs/fcntl.c b/fs/fcntl.c
> > > index dfc72f15be7f..262235e02c4b 100644
> > > --- a/fs/fcntl.c
> > > +++ b/fs/fcntl.c
> > > @@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
> > >  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> > >                       int force)
> > >  {
> > > +	WARN_ON_ONCE(irqs_disabled());
> > 
> > If this triggers, you just rebooted the box :(
> > 
> > Please never do this, either properly handle the problem and return an
> > error, or do not check for this.  It is not any type of "fix" at all,
> > and at most, a debugging aid while you work on the root problem.
> > 
> > thanks,
> > 
> > greg k-h
> 
> Wait, what? Why would testing for irqs being disabled and throwing a
> WARN_ON in that case crash the box?

If panic-on-warn is enabled, which is a common setting for systems these
days.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 10:51       ` Greg KH
@ 2021-07-07 11:40         ` Jeff Layton
  2021-07-07 13:51           ` J. Bruce Fields
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-07-07 11:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Desmond Cheong Zhi Xi, bfields, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > Syzbot reports a potential deadlock in do_fcntl:
> > > > 
> > > > ========================================================
> > > > WARNING: possible irq lock inversion dependency detected
> > > > 5.12.0-syzkaller #0 Not tainted
> > > > --------------------------------------------------------
> > > > syz-executor132/8391 just changed the state of lock:
> > > > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: f_getown_ex fs/fcntl.c:211 [inline]
> > > > ffff888015967bf8 (&f->f_owner.lock){.+..}-{2:2}, at: do_fcntl+0x8b4/0x1200 fs/fcntl.c:395
> > > > but this lock was taken by another, HARDIRQ-safe lock in the past:
> > > >  (&dev->event_lock){-...}-{2:2}
> > > > 
> > > > and interrupts could create inverse lock ordering between them.
> > > > 
> > > > other info that might help us debug this:
> > > > Chain exists of:
> > > >   &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > > > 
> > > >  Possible interrupt unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&f->f_owner.lock);
> > > >                                local_irq_disable();
> > > >                                lock(&dev->event_lock);
> > > >                                lock(&new->fa_lock);
> > > >   <Interrupt>
> > > >     lock(&dev->event_lock);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > This happens because there is a lock hierarchy of
> > > > &dev->event_lock --> &new->fa_lock --> &f->f_owner.lock
> > > > from the following call chain:
> > > > 
> > > >   input_inject_event():
> > > >     spin_lock_irqsave(&dev->event_lock,...);
> > > >     input_handle_event():
> > > >       input_pass_values():
> > > >         input_to_handler():
> > > >           evdev_events():
> > > >             evdev_pass_values():
> > > >               spin_lock(&client->buffer_lock);
> > > >               __pass_event():
> > > >                 kill_fasync():
> > > >                   kill_fasync_rcu():
> > > >                     read_lock(&fa->fa_lock);
> > > >                     send_sigio():
> > > >                       read_lock_irqsave(&fown->lock,...);
> > > > 
> > > > However, since &dev->event_lock is HARDIRQ-safe, interrupts have to be
> > > > disabled while grabbing &f->f_owner.lock, otherwise we invert the lock
> > > > hierarchy.
> > > > 
> > > > Hence, we replace calls to read_lock/read_unlock on &f->f_owner.lock,
> > > > with read_lock_irq/read_unlock_irq.
> > > > 
> > > > Here read_lock_irq/read_unlock_irq should be safe to use because the
> > > > functions f_getown_ex and f_getowner_uids are only called from
> > > > do_fcntl, and f_getown is only called from do_fnctl and
> > > > sock_ioctl. do_fnctl itself is only called from syscalls.
> > > > 
> > > > For sock_ioctl, the chain is
> > > >   compat_sock_ioctl():
> > > >     compat_sock_ioctl_trans():
> > > >       sock_ioctl()
> > > > 
> > > > And interrupts are not disabled on either path. We assert this
> > > > assumption with WARN_ON_ONCE(irqs_disabled()). This check is also
> > > > inserted into another use of write_lock_irq in f_modown.
> > > > 
> > > > Reported-and-tested-by: syzbot+e6d5398a02c516ce5e70@syzkaller.appspotmail.com
> > > > Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
> > > > ---
> > > >  fs/fcntl.c | 17 +++++++++++------
> > > >  1 file changed, 11 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/fcntl.c b/fs/fcntl.c
> > > > index dfc72f15be7f..262235e02c4b 100644
> > > > --- a/fs/fcntl.c
> > > > +++ b/fs/fcntl.c
> > > > @@ -88,6 +88,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
> > > >  static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
> > > >                       int force)
> > > >  {
> > > > +	WARN_ON_ONCE(irqs_disabled());
> > > 
> > > If this triggers, you just rebooted the box :(
> > > 
> > > Please never do this, either properly handle the problem and return an
> > > error, or do not check for this.  It is not any type of "fix" at all,
> > > and at most, a debugging aid while you work on the root problem.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > Wait, what? Why would testing for irqs being disabled and throwing a
> > WARN_ON in that case crash the box?
> 
> If panic-on-warn is enabled, which is a common setting for systems these
> days.

Ok, that makes some sense.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 11:40         ` Jeff Layton
@ 2021-07-07 13:51           ` J. Bruce Fields
  2021-07-07 15:06             ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2021-07-07 13:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Greg KH, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > 
> > > > If this triggers, you just rebooted the box :(
> > > > 
> > > > Please never do this, either properly handle the problem and return an
> > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > and at most, a debugging aid while you work on the root problem.
> > > > 
> > > > thanks,
> > > > 
> > > > greg k-h
> > > 
> > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > WARN_ON in that case crash the box?
> > 
> > If panic-on-warn is enabled, which is a common setting for systems these
> > days.
> 
> Ok, that makes some sense.

Wait, I don't get it.

How are we supposed to decide when to use WARN, when to use BUG, and
when to panic?  Do we really want to treat them all as equivalent?  And
who exactly is turning on panic-on-warn?

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 13:51           ` J. Bruce Fields
@ 2021-07-07 15:06             ` Greg KH
  2021-07-07 15:19               ` J. Bruce Fields
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2021-07-07 15:06 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > 
> > > > > If this triggers, you just rebooted the box :(
> > > > > 
> > > > > Please never do this, either properly handle the problem and return an
> > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > and at most, a debugging aid while you work on the root problem.
> > > > > 
> > > > > thanks,
> > > > > 
> > > > > greg k-h
> > > > 
> > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > WARN_ON in that case crash the box?
> > > 
> > > If panic-on-warn is enabled, which is a common setting for systems these
> > > days.
> > 
> > Ok, that makes some sense.
> 
> Wait, I don't get it.
> 
> How are we supposed to decide when to use WARN, when to use BUG, and
> when to panic?  Do we really want to treat them all as equivalent?  And
> who exactly is turning on panic-on-warn?

You never use WARN or BUG, unless the system is so messed up that you
can not possibly recover from the issue.  Don't be lazy, handle the
error properly and report it upwards.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 15:06             ` Greg KH
@ 2021-07-07 15:19               ` J. Bruce Fields
  2021-07-07 15:31                 ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2021-07-07 15:19 UTC (permalink / raw)
  To: Greg KH
  Cc: Jeff Layton, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > 
> > > > > > If this triggers, you just rebooted the box :(
> > > > > > 
> > > > > > Please never do this, either properly handle the problem and return an
> > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > 
> > > > > > thanks,
> > > > > > 
> > > > > > greg k-h
> > > > > 
> > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > WARN_ON in that case crash the box?
> > > > 
> > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > days.
> > > 
> > > Ok, that makes some sense.
> > 
> > Wait, I don't get it.
> > 
> > How are we supposed to decide when to use WARN, when to use BUG, and
> > when to panic?  Do we really want to treat them all as equivalent?  And
> > who exactly is turning on panic-on-warn?
> 
> You never use WARN or BUG, unless the system is so messed up that you
> can not possibly recover from the issue.

I've heard similar advice for BUG before, but this is the first I've
heard it for WARN.  Do we have any guidelines for how to choose between
WARN and BUG?

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 15:19               ` J. Bruce Fields
@ 2021-07-07 15:31                 ` Greg KH
  2021-07-07 15:34                   ` J. Bruce Fields
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2021-07-07 15:31 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > 
> > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > 
> > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > 
> > > > > > > thanks,
> > > > > > > 
> > > > > > > greg k-h
> > > > > > 
> > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > WARN_ON in that case crash the box?
> > > > > 
> > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > days.
> > > > 
> > > > Ok, that makes some sense.
> > > 
> > > Wait, I don't get it.
> > > 
> > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > who exactly is turning on panic-on-warn?
> > 
> > You never use WARN or BUG, unless the system is so messed up that you
> > can not possibly recover from the issue.
> 
> I've heard similar advice for BUG before, but this is the first I've
> heard it for WARN.  Do we have any guidelines for how to choose between
> WARN and BUG?

Never use either :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 15:31                 ` Greg KH
@ 2021-07-07 15:34                   ` J. Bruce Fields
  2021-07-07 15:46                     ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2021-07-07 15:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Jeff Layton, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > 
> > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > 
> > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > 
> > > > > > > > thanks,
> > > > > > > > 
> > > > > > > > greg k-h
> > > > > > > 
> > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > WARN_ON in that case crash the box?
> > > > > > 
> > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > days.
> > > > > 
> > > > > Ok, that makes some sense.
> > > > 
> > > > Wait, I don't get it.
> > > > 
> > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > who exactly is turning on panic-on-warn?
> > > 
> > > You never use WARN or BUG, unless the system is so messed up that you
> > > can not possibly recover from the issue.
> > 
> > I've heard similar advice for BUG before, but this is the first I've
> > heard it for WARN.  Do we have any guidelines for how to choose between
> > WARN and BUG?
> 
> Never use either :)

I can't tell if you're kidding.  Is there some plan to remove them?

There are definitely cases where I've been able to resolve a problem
more quickly because I got a backtrace from a WARN.

--b.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 15:34                   ` J. Bruce Fields
@ 2021-07-07 15:46                     ` Greg KH
  2021-07-07 16:18                       ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2021-07-07 15:46 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Desmond Cheong Zhi Xi, viro, linux-fsdevel,
	linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > 
> > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > 
> > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > 
> > > > > > > > > thanks,
> > > > > > > > > 
> > > > > > > > > greg k-h
> > > > > > > > 
> > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > WARN_ON in that case crash the box?
> > > > > > > 
> > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > days.
> > > > > > 
> > > > > > Ok, that makes some sense.
> > > > > 
> > > > > Wait, I don't get it.
> > > > > 
> > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > who exactly is turning on panic-on-warn?
> > > > 
> > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > can not possibly recover from the issue.
> > > 
> > > I've heard similar advice for BUG before, but this is the first I've
> > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > WARN and BUG?
> > 
> > Never use either :)
> 
> I can't tell if you're kidding.

I am not.

> Is there some plan to remove them?

Over time, yes.  And any WARN that userspace can ever hit should be
removed today.

> There are definitely cases where I've been able to resolve a problem
> more quickly because I got a backtrace from a WARN.

If you want a backtrace, ask for that, recover from the error, and move
on.  Do not allow userspace to reboot a machine for no good reason as
again, panic-on-warn is a common setting that people use now.

This is what all of the syzbot work has been doing, it triggers things
that cause WARN() to be hit and so we have to fix them.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 15:46                     ` Greg KH
@ 2021-07-07 16:18                       ` Jeff Layton
  2021-07-07 16:25                         ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2021-07-07 16:18 UTC (permalink / raw)
  To: Greg KH, J. Bruce Fields
  Cc: Desmond Cheong Zhi Xi, viro, linux-fsdevel, linux-kernel, skhan,
	linux-kernel-mentees, syzbot+e6d5398a02c516ce5e70

On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > 
> > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > 
> > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > 
> > > > > > > > > > thanks,
> > > > > > > > > > 
> > > > > > > > > > greg k-h
> > > > > > > > > 
> > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > 
> > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > days.
> > > > > > > 
> > > > > > > Ok, that makes some sense.
> > > > > > 
> > > > > > Wait, I don't get it.
> > > > > > 
> > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > who exactly is turning on panic-on-warn?
> > > > > 
> > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > can not possibly recover from the issue.
> > > > 
> > > > I've heard similar advice for BUG before, but this is the first I've
> > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > WARN and BUG?
> > > 
> > > Never use either :)
> > 
> > I can't tell if you're kidding.
> 
> I am not.
> 
> > Is there some plan to remove them?
> 
> Over time, yes.  And any WARN that userspace can ever hit should be
> removed today.
> 
> > There are definitely cases where I've been able to resolve a problem
> > more quickly because I got a backtrace from a WARN.
> 
> If you want a backtrace, ask for that, recover from the error, and move
> on.  Do not allow userspace to reboot a machine for no good reason as
> again, panic-on-warn is a common setting that people use now.
> 
> This is what all of the syzbot work has been doing, it triggers things
> that cause WARN() to be hit and so we have to fix them.
> 

This seems really draconian. Clearly we do want to fix things that show
a WARN (otherwise we wouldn't bother warning about it), but I don't
think that's a reason to completely avoid them. My understanding has
always been:

BUG: for when you reach some condition where the kernel (probably) can't
carry on

WARN: for when you reach some condition that is problematic but where
the machine can probably soldier on. 

Over the last several years, I've changed a lot of BUGs into WARNs to
avoid crashing the box unnecessarily. If someone is setting
panic_on_warn, then aren't they just getting what they asked for?

While I don't feel that strongly about this particular WARN in this
patch, it seems like a reasonable thing to do. If someone calls these
functions with IRQs disabled, then they might end up with some subtle
problems that could be hard to detect otherwise.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 16:18                       ` Jeff Layton
@ 2021-07-07 16:25                         ` Matthew Wilcox
  2021-07-07 17:52                           ` Jeff Layton
  2021-07-07 17:58                           ` Eric Biggers
  0 siblings, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2021-07-07 16:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Greg KH, J. Bruce Fields, Desmond Cheong Zhi Xi, viro,
	linux-fsdevel, linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 12:18:45PM -0400, Jeff Layton wrote:
> On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> > On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > > 
> > > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > > 
> > > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > > 
> > > > > > > > > > > thanks,
> > > > > > > > > > > 
> > > > > > > > > > > greg k-h
> > > > > > > > > > 
> > > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > > 
> > > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > > days.
> > > > > > > > 
> > > > > > > > Ok, that makes some sense.
> > > > > > > 
> > > > > > > Wait, I don't get it.
> > > > > > > 
> > > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > > who exactly is turning on panic-on-warn?
> > > > > > 
> > > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > > can not possibly recover from the issue.
> > > > > 
> > > > > I've heard similar advice for BUG before, but this is the first I've
> > > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > > WARN and BUG?
> > > > 
> > > > Never use either :)
> > > 
> > > I can't tell if you're kidding.
> > 
> > I am not.
> > 
> > > Is there some plan to remove them?
> > 
> > Over time, yes.  And any WARN that userspace can ever hit should be
> > removed today.
> > 
> > > There are definitely cases where I've been able to resolve a problem
> > > more quickly because I got a backtrace from a WARN.
> > 
> > If you want a backtrace, ask for that, recover from the error, and move
> > on.  Do not allow userspace to reboot a machine for no good reason as
> > again, panic-on-warn is a common setting that people use now.
> > 
> > This is what all of the syzbot work has been doing, it triggers things
> > that cause WARN() to be hit and so we have to fix them.
> > 
> 
> This seems really draconian. Clearly we do want to fix things that show
> a WARN (otherwise we wouldn't bother warning about it), but I don't
> think that's a reason to completely avoid them. My understanding has
> always been:
> 
> BUG: for when you reach some condition where the kernel (probably) can't
> carry on
> 
> WARN: for when you reach some condition that is problematic but where
> the machine can probably soldier on. 
> 
> Over the last several years, I've changed a lot of BUGs into WARNs to
> avoid crashing the box unnecessarily. If someone is setting
> panic_on_warn, then aren't they just getting what they asked for?
> 
> While I don't feel that strongly about this particular WARN in this
> patch, it seems like a reasonable thing to do. If someone calls these
> functions with IRQs disabled, then they might end up with some subtle
> problems that could be hard to detect otherwise.

Don't we already have a debugging option that would catch this?

config DEBUG_IRQFLAGS
        bool "Debug IRQ flag manipulation"
        help
          Enables checks for potentially unsafe enabling or disabling of
          interrupts, such as calling raw_local_irq_restore() when interrupts
          are enabled.

so I think this particular warn is unnecessary.

But I also disagree with Greg.  Normal users aren't setting panic-on-warn.
Various build bots are setting panic-on-warn -- and they should -- because
we shouldn't be able to trigger these kinds of warnings from userspace.
Those are bugs that should be fixed.  But there's no reason to shy away
from using a WARN when it's the right thing to do.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 16:25                         ` Matthew Wilcox
@ 2021-07-07 17:52                           ` Jeff Layton
  2021-07-07 17:58                           ` Eric Biggers
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2021-07-07 17:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Greg KH, J. Bruce Fields, Desmond Cheong Zhi Xi, viro,
	linux-fsdevel, linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, 2021-07-07 at 17:25 +0100, Matthew Wilcox wrote:
> On Wed, Jul 07, 2021 at 12:18:45PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > > > 
> > > > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > > > 
> > > > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > > > 
> > > > > > > > > > > > thanks,
> > > > > > > > > > > > 
> > > > > > > > > > > > greg k-h
> > > > > > > > > > > 
> > > > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > > > 
> > > > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > > > days.
> > > > > > > > > 
> > > > > > > > > Ok, that makes some sense.
> > > > > > > > 
> > > > > > > > Wait, I don't get it.
> > > > > > > > 
> > > > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > > > who exactly is turning on panic-on-warn?
> > > > > > > 
> > > > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > > > can not possibly recover from the issue.
> > > > > > 
> > > > > > I've heard similar advice for BUG before, but this is the first I've
> > > > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > > > WARN and BUG?
> > > > > 
> > > > > Never use either :)
> > > > 
> > > > I can't tell if you're kidding.
> > > 
> > > I am not.
> > > 
> > > > Is there some plan to remove them?
> > > 
> > > Over time, yes.  And any WARN that userspace can ever hit should be
> > > removed today.
> > > 
> > > > There are definitely cases where I've been able to resolve a problem
> > > > more quickly because I got a backtrace from a WARN.
> > > 
> > > If you want a backtrace, ask for that, recover from the error, and move
> > > on.  Do not allow userspace to reboot a machine for no good reason as
> > > again, panic-on-warn is a common setting that people use now.
> > > 
> > > This is what all of the syzbot work has been doing, it triggers things
> > > that cause WARN() to be hit and so we have to fix them.
> > > 
> > 
> > This seems really draconian. Clearly we do want to fix things that show
> > a WARN (otherwise we wouldn't bother warning about it), but I don't
> > think that's a reason to completely avoid them. My understanding has
> > always been:
> > 
> > BUG: for when you reach some condition where the kernel (probably) can't
> > carry on
> > 
> > WARN: for when you reach some condition that is problematic but where
> > the machine can probably soldier on. 
> > 
> > Over the last several years, I've changed a lot of BUGs into WARNs to
> > avoid crashing the box unnecessarily. If someone is setting
> > panic_on_warn, then aren't they just getting what they asked for?
> > 
> > While I don't feel that strongly about this particular WARN in this
> > patch, it seems like a reasonable thing to do. If someone calls these
> > functions with IRQs disabled, then they might end up with some subtle
> > problems that could be hard to detect otherwise.
> 
> Don't we already have a debugging option that would catch this?
> 
> config DEBUG_IRQFLAGS
>         bool "Debug IRQ flag manipulation"
>         help
>           Enables checks for potentially unsafe enabling or disabling of
>           interrupts, such as calling raw_local_irq_restore() when interrupts
>           are enabled.
> 
> so I think this particular warn is unnecessary.
> 

Good to know. I'm just going to leave Desmond's v1 patch (which didn't
have this WARN_ON) in linux-next for now.

> But I also disagree with Greg.  Normal users aren't setting panic-on-warn.
> Various build bots are setting panic-on-warn -- and they should -- because
> we shouldn't be able to trigger these kinds of warnings from userspace.
> Those are bugs that should be fixed.  But there's no reason to shy away
> from using a WARN when it's the right thing to do.

Agreed.
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock
  2021-07-07 16:25                         ` Matthew Wilcox
  2021-07-07 17:52                           ` Jeff Layton
@ 2021-07-07 17:58                           ` Eric Biggers
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2021-07-07 17:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jeff Layton, Greg KH, J. Bruce Fields, Desmond Cheong Zhi Xi,
	viro, linux-fsdevel, linux-kernel, skhan, linux-kernel-mentees,
	syzbot+e6d5398a02c516ce5e70

On Wed, Jul 07, 2021 at 05:25:19PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 07, 2021 at 12:18:45PM -0400, Jeff Layton wrote:
> > On Wed, 2021-07-07 at 17:46 +0200, Greg KH wrote:
> > > On Wed, Jul 07, 2021 at 11:34:17AM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 07, 2021 at 05:31:06PM +0200, Greg KH wrote:
> > > > > On Wed, Jul 07, 2021 at 11:19:36AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Jul 07, 2021 at 05:06:45PM +0200, Greg KH wrote:
> > > > > > > On Wed, Jul 07, 2021 at 09:51:29AM -0400, J. Bruce Fields wrote:
> > > > > > > > On Wed, Jul 07, 2021 at 07:40:47AM -0400, Jeff Layton wrote:
> > > > > > > > > On Wed, 2021-07-07 at 12:51 +0200, Greg KH wrote:
> > > > > > > > > > On Wed, Jul 07, 2021 at 06:44:42AM -0400, Jeff Layton wrote:
> > > > > > > > > > > On Wed, 2021-07-07 at 08:05 +0200, Greg KH wrote:
> > > > > > > > > > > > On Wed, Jul 07, 2021 at 10:35:47AM +0800, Desmond Cheong Zhi Xi wrote:
> > > > > > > > > > > > > +	WARN_ON_ONCE(irqs_disabled());
> > > > > > > > > > > > 
> > > > > > > > > > > > If this triggers, you just rebooted the box :(
> > > > > > > > > > > > 
> > > > > > > > > > > > Please never do this, either properly handle the problem and return an
> > > > > > > > > > > > error, or do not check for this.  It is not any type of "fix" at all,
> > > > > > > > > > > > and at most, a debugging aid while you work on the root problem.
> > > > > > > > > > > > 
> > > > > > > > > > > > thanks,
> > > > > > > > > > > > 
> > > > > > > > > > > > greg k-h
> > > > > > > > > > > 
> > > > > > > > > > > Wait, what? Why would testing for irqs being disabled and throwing a
> > > > > > > > > > > WARN_ON in that case crash the box?
> > > > > > > > > > 
> > > > > > > > > > If panic-on-warn is enabled, which is a common setting for systems these
> > > > > > > > > > days.
> > > > > > > > > 
> > > > > > > > > Ok, that makes some sense.
> > > > > > > > 
> > > > > > > > Wait, I don't get it.
> > > > > > > > 
> > > > > > > > How are we supposed to decide when to use WARN, when to use BUG, and
> > > > > > > > when to panic?  Do we really want to treat them all as equivalent?  And
> > > > > > > > who exactly is turning on panic-on-warn?
> > > > > > > 
> > > > > > > You never use WARN or BUG, unless the system is so messed up that you
> > > > > > > can not possibly recover from the issue.
> > > > > > 
> > > > > > I've heard similar advice for BUG before, but this is the first I've
> > > > > > heard it for WARN.  Do we have any guidelines for how to choose between
> > > > > > WARN and BUG?
> > > > > 
> > > > > Never use either :)
> > > > 
> > > > I can't tell if you're kidding.
> > > 
> > > I am not.
> > > 
> > > > Is there some plan to remove them?
> > > 
> > > Over time, yes.  And any WARN that userspace can ever hit should be
> > > removed today.
> > > 
> > > > There are definitely cases where I've been able to resolve a problem
> > > > more quickly because I got a backtrace from a WARN.
> > > 
> > > If you want a backtrace, ask for that, recover from the error, and move
> > > on.  Do not allow userspace to reboot a machine for no good reason as
> > > again, panic-on-warn is a common setting that people use now.
> > > 
> > > This is what all of the syzbot work has been doing, it triggers things
> > > that cause WARN() to be hit and so we have to fix them.
> > > 
> > 
> > This seems really draconian. Clearly we do want to fix things that show
> > a WARN (otherwise we wouldn't bother warning about it), but I don't
> > think that's a reason to completely avoid them. My understanding has
> > always been:
> > 
> > BUG: for when you reach some condition where the kernel (probably) can't
> > carry on
> > 
> > WARN: for when you reach some condition that is problematic but where
> > the machine can probably soldier on. 
> > 
> > Over the last several years, I've changed a lot of BUGs into WARNs to
> > avoid crashing the box unnecessarily. If someone is setting
> > panic_on_warn, then aren't they just getting what they asked for?
> > 
> > While I don't feel that strongly about this particular WARN in this
> > patch, it seems like a reasonable thing to do. If someone calls these
> > functions with IRQs disabled, then they might end up with some subtle
> > problems that could be hard to detect otherwise.
> 
> Don't we already have a debugging option that would catch this?
> 
> config DEBUG_IRQFLAGS
>         bool "Debug IRQ flag manipulation"
>         help
>           Enables checks for potentially unsafe enabling or disabling of
>           interrupts, such as calling raw_local_irq_restore() when interrupts
>           are enabled.
> 
> so I think this particular warn is unnecessary.
> 
> But I also disagree with Greg.  Normal users aren't setting panic-on-warn.
> Various build bots are setting panic-on-warn -- and they should -- because
> we shouldn't be able to trigger these kinds of warnings from userspace.
> Those are bugs that should be fixed.  But there's no reason to shy away
> from using a WARN when it's the right thing to do.

Yes, WARN is the right choice for signaling a kernel bug that is "recoverable",
e.g. by returning an error or simply ignoring it.  WARN is the wrong choice only
when the condition is user-triggerable, e.g. via invalid inputs passed to system
calls.  I don't understand why Greg is advocating against all use of WARN; that
would make it harder for kernel bugs to be found and reported.  Users of
panic_on_warn (which are usually test systems) *want* the kernel to panic if an
assertion fails -- that's the whole point of it.  I'm not sure why we are still
having this discussion, as the differences between and valid uses for WARN and
BUG were documented in include/asm-generic/bug.h a long time ago...

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-07-07 17:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07  2:35 [PATCH v2 0/2] fcntl: fix potential deadlocks Desmond Cheong Zhi Xi
2021-07-07  2:35 ` [PATCH v2 1/2] fcntl: fix potential deadlocks for &fown_struct.lock Desmond Cheong Zhi Xi
2021-07-07  6:05   ` Greg KH
2021-07-07  6:54     ` Desmond Cheong Zhi Xi
2021-07-07 10:44     ` Jeff Layton
2021-07-07 10:51       ` Greg KH
2021-07-07 11:40         ` Jeff Layton
2021-07-07 13:51           ` J. Bruce Fields
2021-07-07 15:06             ` Greg KH
2021-07-07 15:19               ` J. Bruce Fields
2021-07-07 15:31                 ` Greg KH
2021-07-07 15:34                   ` J. Bruce Fields
2021-07-07 15:46                     ` Greg KH
2021-07-07 16:18                       ` Jeff Layton
2021-07-07 16:25                         ` Matthew Wilcox
2021-07-07 17:52                           ` Jeff Layton
2021-07-07 17:58                           ` Eric Biggers
2021-07-07  2:35 ` [PATCH v2 2/2] fcntl: fix potential deadlock for &fasync_struct.fa_lock Desmond Cheong Zhi Xi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).