All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
@ 2023-10-03 17:00 Lee Jones
  2023-10-03 18:14 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-03 17:00 UTC (permalink / raw)
  To: lee
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	Greg Kroah-Hartman, linux-serial, syzbot+5f47a8cea6a12b77a876

The fundamental issue here is that gsmld_write() uses spin_lock_irqsave()
to ensure mutual exclusion.  spin_lock_irqsave() disables IRQs,
essentially placing the thread of execution into atomic mode and
therefore must not sleep at any point.  Unfortunately, the call chain
eventually ends up in __might_resched() which complains loudly.

The BUG() splat looks like this:

    BUG: sleeping function called from invalid context at kernel/printk/printk.c:2627
    in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 13029, name: (agetty)
    preempt_count: 1, expected: 0
    RCU nest depth: 0, expected: 0
    3 locks held by (agetty)/13029:
     #0: ffff888112fc70a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70
     #1: ffff888112fc7130 (&tty->atomic_write_lock){+.+.}-{3:3}, at: file_tty_write+0x1e4/0x9d0
     #2: ffff8881053c73e0 (&gsm->tx_lock){....}-{2:2}, at: gsmld_write+0x5b/0x120
    irq event stamp: 2506
    hardirqs last  enabled at (2505): [<ffffffff8b007b0e>] syscall_enter_from_user_mode+0x2e/0x1a0
    hardirqs last disabled at (2506): [<ffffffff8b0ab5cc>] _raw_spin_lock_irqsave+0xac/0x120
    softirqs last  enabled at (514): [<ffffffff81561c9c>] __irq_exit_rcu+0xec/0x160
    softirqs last disabled at (509): [<ffffffff81561c9c>] __irq_exit_rcu+0xec/0x160
    Preemption disabled at:
    [<0000000000000000>] 0x0
    CPU: 1 PID: 13029 Comm: (agetty) Not tainted 6.6.0-rc3-next-20230929-00002-gcdf323998d5b #17
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x1e3/0x2d0
     ? nf_tcp_handle_invalid+0x650/0x650
     ? panic+0x8a0/0x8a0
     __might_resched+0x5ce/0x790
     ? __might_sleep+0xc0/0xc0
     ? reacquire_held_locks+0x680/0x680
     console_lock+0x1c/0x1b0
     do_con_write+0x118/0x7410
     ? stack_trace_snprint+0xf0/0xf0
     ? lockdep_unlock+0x163/0x300
     ? lockdep_unlock+0x163/0x300
     ? mark_lock+0x2a0/0x350
     ? __lock_acquire+0x1302/0x2050
     ? vt_resize+0xc0/0xc0
     ? do_raw_spin_lock+0x147/0x3a0
     ? __rwlock_init+0x140/0x140
     ? _raw_spin_lock_irqsave+0xdd/0x120
     ? _raw_spin_lock+0x40/0x40
     con_write+0x29/0x640
     ? check_heap_object+0x249/0x870
     gsmld_write+0xf9/0x120
     ? gsmld_read+0x10/0x10
     file_tty_write+0x57c/0x9d0
     vfs_write+0x77d/0xaf0
     ? file_end_write+0x250/0x250
     ? tty_ioctl+0x8fa/0xbc0
     ? __fdget_pos+0x1d2/0x330
     ksys_write+0x19b/0x2c0
      ? print_irqtrace_events+0x220/0x220
     ? __ia32_sys_read+0x80/0x80
     ? syscall_enter_from_user_mode+0x2e/0x1a0
     ? syscall_enter_from_user_mode+0x2e/0x1a0
     do_syscall_64+0x2b/0x70
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    RIP: 0033:0x7f76546101b0

The important part of the call stack being:

  gsmld_write()             # Takes a lock and disables IRQs
    con_write()
      console_lock()
        __might_sleep()
          __might_resched() # Complains that IRQs are disabled

To fix this, let's ensure mutual exclusion by using a protected shared
variable (busy) instead.  We'll use the current locking mechanism to
protect it, but ensure that the locks are released and IRQs re-enabled
by the time we transit further down the call chain which may sleep.

Cc: Daniel Starke <daniel.starke@siemens.com>
Cc: Fedor Pchelkin <pchelkin@ispras.ru>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-serial@vger.kernel.org
Reported-by: syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com
Signed-off-by: Lee Jones <lee@kernel.org>
---
 drivers/tty/n_gsm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 1f3aba607cd51..b83a97d58381f 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -270,6 +270,7 @@ struct gsm_mux {
 	struct tty_struct *tty;		/* The tty our ldisc is bound to */
 	spinlock_t lock;
 	struct mutex mutex;
+	bool busy;
 	unsigned int num;
 	struct kref ref;
 
@@ -3253,6 +3254,7 @@ static struct gsm_mux *gsm_alloc_mux(void)
 	gsm->dead = true;	/* Avoid early tty opens */
 	gsm->wait_config = false; /* Disabled */
 	gsm->keep_alive = 0;	/* Disabled */
+	gsm->busy = false;
 
 	/* Store the instance to the mux array or abort if no space is
 	 * available.
@@ -3718,11 +3720,21 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
 
 	ret = -ENOBUFS;
 	spin_lock_irqsave(&gsm->tx_lock, flags);
+	if (gsm->busy) {
+		spin_unlock_irqrestore(&gsm->tx_lock, flags);
+		return -EBUSY;
+	}
+	gsm->busy = true;
+	spin_unlock_irqrestore(&gsm->tx_lock, flags);
+
 	space = tty_write_room(tty);
 	if (space >= nr)
 		ret = tty->ops->write(tty, buf, nr);
 	else
 		set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+
+	spin_lock_irqsave(&gsm->tx_lock, flags);
+	gsm->busy = false;
 	spin_unlock_irqrestore(&gsm->tx_lock, flags);
 
 	return ret;
-- 
2.42.0.582.g8ccd20d70d-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-03 17:00 [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic Lee Jones
@ 2023-10-03 18:14 ` Greg Kroah-Hartman
  2023-10-03 18:55   ` Lee Jones
  2023-10-04  5:59   ` Starke, Daniel
  0 siblings, 2 replies; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-03 18:14 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> The important part of the call stack being:
> 
>   gsmld_write()             # Takes a lock and disables IRQs
>     con_write()
>       console_lock()

Wait, why is the n_gsm line discipline being used for a console?

What hardware/protocol wants this to happen?

gsm I thought was for a very specific type of device, not a console.

As per:
	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
this is a specific modem protocol, why is con_write() being called?


>         __might_sleep()
>           __might_resched() # Complains that IRQs are disabled
> 
> To fix this, let's ensure mutual exclusion by using a protected shared
> variable (busy) instead.  We'll use the current locking mechanism to
> protect it, but ensure that the locks are released and IRQs re-enabled
> by the time we transit further down the call chain which may sleep.
> 
> Cc: Daniel Starke <daniel.starke@siemens.com>
> Cc: Fedor Pchelkin <pchelkin@ispras.ru>
> Cc: Jiri Slaby <jirislaby@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linux-serial@vger.kernel.org
> Reported-by: syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones <lee@kernel.org>
> ---
>  drivers/tty/n_gsm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
> index 1f3aba607cd51..b83a97d58381f 100644
> --- a/drivers/tty/n_gsm.c
> +++ b/drivers/tty/n_gsm.c
> @@ -270,6 +270,7 @@ struct gsm_mux {
>  	struct tty_struct *tty;		/* The tty our ldisc is bound to */
>  	spinlock_t lock;
>  	struct mutex mutex;
> +	bool busy;
>  	unsigned int num;
>  	struct kref ref;
>  
> @@ -3253,6 +3254,7 @@ static struct gsm_mux *gsm_alloc_mux(void)
>  	gsm->dead = true;	/* Avoid early tty opens */
>  	gsm->wait_config = false; /* Disabled */
>  	gsm->keep_alive = 0;	/* Disabled */
> +	gsm->busy = false;
>  
>  	/* Store the instance to the mux array or abort if no space is
>  	 * available.
> @@ -3718,11 +3720,21 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
>  
>  	ret = -ENOBUFS;
>  	spin_lock_irqsave(&gsm->tx_lock, flags);
> +	if (gsm->busy) {
> +		spin_unlock_irqrestore(&gsm->tx_lock, flags);
> +		return -EBUSY;

So you just "busted" the re-entrant call chain here, are you sure this
is ok for this protocl?  Can it handle -EBUSY?

Daniel, any thoughts?

And Lee, you really don't have this hardware, right?  So why are you
dealing with bug reports for it?  :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-03 18:14 ` Greg Kroah-Hartman
@ 2023-10-03 18:55   ` Lee Jones
  2023-10-04  6:04     ` Greg Kroah-Hartman
  2023-10-04  5:59   ` Starke, Daniel
  1 sibling, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-03 18:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:

> On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > The important part of the call stack being:
> > 
> >   gsmld_write()             # Takes a lock and disables IRQs
> >     con_write()
> >       console_lock()
> 
> Wait, why is the n_gsm line discipline being used for a console?
> 
> What hardware/protocol wants this to happen?
> 
> gsm I thought was for a very specific type of device, not a console.
> 
> As per:
> 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> this is a specific modem protocol, why is con_write() being called?

What it's meant for and what random users can make it do are likely to
be quite separate questions.  This scenario is user driven and can be
replicated simply by issuing a few syscalls (open, ioctl, write).

> >         __might_sleep()
> >           __might_resched() # Complains that IRQs are disabled
> > 
> > To fix this, let's ensure mutual exclusion by using a protected shared
> > variable (busy) instead.  We'll use the current locking mechanism to
> > protect it, but ensure that the locks are released and IRQs re-enabled
> > by the time we transit further down the call chain which may sleep.
> > 
> > Cc: Daniel Starke <daniel.starke@siemens.com>
> > Cc: Fedor Pchelkin <pchelkin@ispras.ru>
> > Cc: Jiri Slaby <jirislaby@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: linux-serial@vger.kernel.org
> > Reported-by: syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com
> > Signed-off-by: Lee Jones <lee@kernel.org>
> > ---
> >  drivers/tty/n_gsm.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
> > index 1f3aba607cd51..b83a97d58381f 100644
> > --- a/drivers/tty/n_gsm.c
> > +++ b/drivers/tty/n_gsm.c
> > @@ -270,6 +270,7 @@ struct gsm_mux {
> >  	struct tty_struct *tty;		/* The tty our ldisc is bound to */
> >  	spinlock_t lock;
> >  	struct mutex mutex;
> > +	bool busy;
> >  	unsigned int num;
> >  	struct kref ref;
> >  
> > @@ -3253,6 +3254,7 @@ static struct gsm_mux *gsm_alloc_mux(void)
> >  	gsm->dead = true;	/* Avoid early tty opens */
> >  	gsm->wait_config = false; /* Disabled */
> >  	gsm->keep_alive = 0;	/* Disabled */
> > +	gsm->busy = false;
> >  
> >  	/* Store the instance to the mux array or abort if no space is
> >  	 * available.
> > @@ -3718,11 +3720,21 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> >  
> >  	ret = -ENOBUFS;
> >  	spin_lock_irqsave(&gsm->tx_lock, flags);
> > +	if (gsm->busy) {
> > +		spin_unlock_irqrestore(&gsm->tx_lock, flags);
> > +		return -EBUSY;
> 
> So you just "busted" the re-entrant call chain here, are you sure this
> is ok for this protocl?  Can it handle -EBUSY?

I should have marked this submission as RFC.  Mea culpa.  Please
consider it as such going forward.  Feedback like this is highly
valuable.

> Daniel, any thoughts?
> 
> And Lee, you really don't have this hardware, right?  So why are you
> dealing with bug reports for it?  :)

'cos Syzkaller.

And no, as per the splat above, this was reproduced in qemu.

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-03 18:14 ` Greg Kroah-Hartman
  2023-10-03 18:55   ` Lee Jones
@ 2023-10-04  5:59   ` Starke, Daniel
  2023-10-04  6:05     ` Greg Kroah-Hartman
  1 sibling, 1 reply; 29+ messages in thread
From: Starke, Daniel @ 2023-10-04  5:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Lee Jones
  Cc: linux-kernel, Fedor Pchelkin, Jiri Slaby, linux-serial,
	syzbot+5f47a8cea6a12b77a876

> Daniel, any thoughts?

Our application of this protocol is only with specific modems to enable
circuit switched operation (handling calls, selecting/querying networks,
etc.) while doing packet switched communication (i.e. IP traffic over PPP).
The protocol was developed for such use cases.

Regarding the issue itself:
There was already an attempt to fix all this by switching from spinlocks to
mutexes resulting in ~20% performance loss. However, the patch was reverted
as it did not handle the T1 timer leading into sleep during atomic within
gsm_dlci_t1() on every mutex lock there.
There was also a suggestion to fix this in do_con_write() as
tty_operations::write() appears to be documented as "not allowed to sleep".
The patch for this was rejected. It did not fix the issue within n_gsm.

Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/

Best regards,
Daniel Starke

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-03 18:55   ` Lee Jones
@ 2023-10-04  6:04     ` Greg Kroah-Hartman
  2023-10-04  9:09       ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04  6:04 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > The important part of the call stack being:
> > > 
> > >   gsmld_write()             # Takes a lock and disables IRQs
> > >     con_write()
> > >       console_lock()
> > 
> > Wait, why is the n_gsm line discipline being used for a console?
> > 
> > What hardware/protocol wants this to happen?
> > 
> > gsm I thought was for a very specific type of device, not a console.
> > 
> > As per:
> > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > this is a specific modem protocol, why is con_write() being called?
> 
> What it's meant for and what random users can make it do are likely to
> be quite separate questions.  This scenario is user driven and can be
> replicated simply by issuing a few syscalls (open, ioctl, write).

I would recommend that any distro/system that does not want to support
this specific hardware protocol, just disable it for now (it's marked as
experimental too), if they don't want to deal with the potential
sleep-while-atomic issue.

> > And Lee, you really don't have this hardware, right?  So why are you
> > dealing with bug reports for it?  :)
> 
> 'cos Syzkaller.

Ah, yeah, fake report, no real issue here then :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  5:59   ` Starke, Daniel
@ 2023-10-04  6:05     ` Greg Kroah-Hartman
  2023-10-04  8:40       ` Jiri Slaby
  2023-10-04  8:57       ` Lee Jones
  0 siblings, 2 replies; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04  6:05 UTC (permalink / raw)
  To: Starke, Daniel
  Cc: Lee Jones, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > Daniel, any thoughts?
> 
> Our application of this protocol is only with specific modems to enable
> circuit switched operation (handling calls, selecting/querying networks,
> etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> The protocol was developed for such use cases.
> 
> Regarding the issue itself:
> There was already an attempt to fix all this by switching from spinlocks to
> mutexes resulting in ~20% performance loss. However, the patch was reverted
> as it did not handle the T1 timer leading into sleep during atomic within
> gsm_dlci_t1() on every mutex lock there.
> There was also a suggestion to fix this in do_con_write() as
> tty_operations::write() appears to be documented as "not allowed to sleep".
> The patch for this was rejected. It did not fix the issue within n_gsm.
> 
> Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/

Ok, I thought I remembered this, I'll just drop this patch from my
review queue and wait for a better solution if it ever comes up as this
isn't a real issue that people are seeing on actual systems, but just a
syzbot report.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  6:05     ` Greg Kroah-Hartman
@ 2023-10-04  8:40       ` Jiri Slaby
  2023-10-04  8:58         ` Starke, Daniel
  2023-10-04  8:57       ` Lee Jones
  1 sibling, 1 reply; 29+ messages in thread
From: Jiri Slaby @ 2023-10-04  8:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Starke, Daniel
  Cc: Lee Jones, linux-kernel, Fedor Pchelkin, linux-serial,
	syzbot+5f47a8cea6a12b77a876

On 04. 10. 23, 8:05, Greg Kroah-Hartman wrote:
> On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
>>> Daniel, any thoughts?
>>
>> Our application of this protocol is only with specific modems to enable
>> circuit switched operation (handling calls, selecting/querying networks,
>> etc.) while doing packet switched communication (i.e. IP traffic over PPP).
>> The protocol was developed for such use cases.
>>
>> Regarding the issue itself:
>> There was already an attempt to fix all this by switching from spinlocks to
>> mutexes resulting in ~20% performance loss. However, the patch was reverted
>> as it did not handle the T1 timer leading into sleep during atomic within
>> gsm_dlci_t1() on every mutex lock there.
>> There was also a suggestion to fix this in do_con_write() as
>> tty_operations::write() appears to be documented as "not allowed to sleep".
>> The patch for this was rejected. It did not fix the issue within n_gsm.
>>
>> Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
>> Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
>> Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> 
> Ok, I thought I remembered this, I'll just drop this patch from my
> review queue and wait for a better solution if it ever comes up as this
> isn't a real issue that people are seeing on actual systems, but just a
> syzbot report.

I remember too and it is not easy.

So without actually looking into the code, can we just somehow disallow 
attaching this line discipline to a console (ban such a TIOCSETD) and be 
done with this issue? IOW disallow root to shoot their foot.

regards,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  6:05     ` Greg Kroah-Hartman
  2023-10-04  8:40       ` Jiri Slaby
@ 2023-10-04  8:57       ` Lee Jones
  2023-10-04 12:21         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-04  8:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:

> On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > Daniel, any thoughts?
> > 
> > Our application of this protocol is only with specific modems to enable
> > circuit switched operation (handling calls, selecting/querying networks,
> > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > The protocol was developed for such use cases.
> > 
> > Regarding the issue itself:
> > There was already an attempt to fix all this by switching from spinlocks to
> > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > as it did not handle the T1 timer leading into sleep during atomic within
> > gsm_dlci_t1() on every mutex lock there.

That's correct.  When I initially saw this report, my initial thought
was to replace the spinlocks with mutexts, but having read the previous
accepted attempt and it's subsequent reversion I started to think of
other ways to solve this issue.  This solution, unlike the last, does
not involve adding sleep inducing locks into atomic contexts, nor
should it negatively affect performance.

> > There was also a suggestion to fix this in do_con_write() as
> > tty_operations::write() appears to be documented as "not allowed to sleep".
> > The patch for this was rejected. It did not fix the issue within n_gsm.
> > 
> > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> 
> Ok, I thought I remembered this, I'll just drop this patch from my
> review queue and wait for a better solution if it ever comes up as this
> isn't a real issue that people are seeing on actual systems, but just a
> syzbot report.

What does the "better solution" look like?

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  8:40       ` Jiri Slaby
@ 2023-10-04  8:58         ` Starke, Daniel
  0 siblings, 0 replies; 29+ messages in thread
From: Starke, Daniel @ 2023-10-04  8:58 UTC (permalink / raw)
  To: Jiri Slaby, Greg Kroah-Hartman
  Cc: Lee Jones, linux-kernel, Fedor Pchelkin, linux-serial,
	syzbot+5f47a8cea6a12b77a876

> So without actually looking into the code, can we just somehow disallow
> attaching this line discipline to a console (ban such a TIOCSETD) and be
> done with this issue? IOW disallow root to shoot their foot.

Probably possible by patching gsmld_open() if there is a function which can
check this based on a struct tty_struct * handle. That may, however,
introduce some cross references that are hard to maintain.

Best regards,
Daniel Starke

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  6:04     ` Greg Kroah-Hartman
@ 2023-10-04  9:09       ` Lee Jones
  2023-10-04  9:55         ` Greg Kroah-Hartman
  2023-10-04 12:19         ` Greg Kroah-Hartman
  0 siblings, 2 replies; 29+ messages in thread
From: Lee Jones @ 2023-10-04  9:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:

> On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> > On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> > 
> > > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > > The important part of the call stack being:
> > > > 
> > > >   gsmld_write()             # Takes a lock and disables IRQs
> > > >     con_write()
> > > >       console_lock()
> > > 
> > > Wait, why is the n_gsm line discipline being used for a console?
> > > 
> > > What hardware/protocol wants this to happen?
> > > 
> > > gsm I thought was for a very specific type of device, not a console.
> > > 
> > > As per:
> > > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > > this is a specific modem protocol, why is con_write() being called?
> > 
> > What it's meant for and what random users can make it do are likely to
> > be quite separate questions.  This scenario is user driven and can be
> > replicated simply by issuing a few syscalls (open, ioctl, write).
> 
> I would recommend that any distro/system that does not want to support
> this specific hardware protocol, just disable it for now (it's marked as
> experimental too), if they don't want to deal with the potential
> sleep-while-atomic issue.

n_gsm is available on all the systems I have available.  The mention of
'EXPERIMENTAL' in the module description appears to have zero effect on
whether distros choose to make it available or not.  If you're saying
that we know this module is BROKEN however, then perhaps we should mark
it as such.

> > > And Lee, you really don't have this hardware, right?  So why are you
> > > dealing with bug reports for it?  :)
> > 
> > 'cos Syzkaller.
> 
> Ah, yeah, fake report, no real issue here then :)

Ouch!  The way I see it, the present issue with Syzkaller is that we do
not have the resources to remedy all of the issues it flags.  Passing
off all reports as 'not real issues' is going to make engineers who
decide to work on them feel undervalued and is likely have a detrimental
effect overall.  As an ambassador for young and new people trying to get
into Kernel Engineering in general, is that really the outcome you're
after?

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  9:09       ` Lee Jones
@ 2023-10-04  9:55         ` Greg Kroah-Hartman
  2023-10-04 12:19         ` Greg Kroah-Hartman
  1 sibling, 0 replies; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04  9:55 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 10:09:18AM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> > > On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > > > The important part of the call stack being:
> > > > > 
> > > > >   gsmld_write()             # Takes a lock and disables IRQs
> > > > >     con_write()
> > > > >       console_lock()
> > > > 
> > > > Wait, why is the n_gsm line discipline being used for a console?
> > > > 
> > > > What hardware/protocol wants this to happen?
> > > > 
> > > > gsm I thought was for a very specific type of device, not a console.
> > > > 
> > > > As per:
> > > > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > > > this is a specific modem protocol, why is con_write() being called?
> > > 
> > > What it's meant for and what random users can make it do are likely to
> > > be quite separate questions.  This scenario is user driven and can be
> > > replicated simply by issuing a few syscalls (open, ioctl, write).
> > 
> > I would recommend that any distro/system that does not want to support
> > this specific hardware protocol, just disable it for now (it's marked as
> > experimental too), if they don't want to deal with the potential
> > sleep-while-atomic issue.
> 
> n_gsm is available on all the systems I have available.

Then file a bug with your distro to disable it?  No real general purpose
distro should enable it from what I can tell.

> The mention of
> 'EXPERIMENTAL' in the module description appears to have zero effect on
> whether distros choose to make it available or not.  If you're saying
> that we know this module is BROKEN however, then perhaps we should mark
> it as such.

Or we just prevent it from being bound to a console as that's not
something that should be happening.

And again, the "worst" that can happen is the calling process locks up,
due to a lock inversion, right?

> > > > And Lee, you really don't have this hardware, right?  So why are you
> > > > dealing with bug reports for it?  :)
> > > 
> > > 'cos Syzkaller.
> > 
> > Ah, yeah, fake report, no real issue here then :)
> 
> Ouch!  The way I see it, the present issue with Syzkaller is that we do
> not have the resources to remedy all of the issues it flags.  Passing
> off all reports as 'not real issues' is going to make engineers who
> decide to work on them feel undervalued and is likely have a detrimental
> effect overall.  As an ambassador for young and new people trying to get
> into Kernel Engineering in general, is that really the outcome you're
> after?

That's not what I'm saying here at all, what I'm saying is "pick issues
that are real".  syzbot does not always make it obvious what is, and is
not, a real issue.  There have been long threads and discussions about
this and some developers are now just ignoring all syzbot reports (see
the filesystem thread on the ksummit discuss list for more details.)

For this specific issue, it's been much-reported, and is not trivial,
and I would argue, not a "real" problem in the grand scheme of things
for normal users to worry about.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  9:09       ` Lee Jones
  2023-10-04  9:55         ` Greg Kroah-Hartman
@ 2023-10-04 12:19         ` Greg Kroah-Hartman
  2023-10-04 12:57           ` Lee Jones
  1 sibling, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04 12:19 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 10:09:18AM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> > > On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > > > The important part of the call stack being:
> > > > > 
> > > > >   gsmld_write()             # Takes a lock and disables IRQs
> > > > >     con_write()
> > > > >       console_lock()
> > > > 
> > > > Wait, why is the n_gsm line discipline being used for a console?
> > > > 
> > > > What hardware/protocol wants this to happen?
> > > > 
> > > > gsm I thought was for a very specific type of device, not a console.
> > > > 
> > > > As per:
> > > > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > > > this is a specific modem protocol, why is con_write() being called?
> > > 
> > > What it's meant for and what random users can make it do are likely to
> > > be quite separate questions.  This scenario is user driven and can be
> > > replicated simply by issuing a few syscalls (open, ioctl, write).
> > 
> > I would recommend that any distro/system that does not want to support
> > this specific hardware protocol, just disable it for now (it's marked as
> > experimental too), if they don't want to deal with the potential
> > sleep-while-atomic issue.
> 
> n_gsm is available on all the systems I have available.  The mention of
> 'EXPERIMENTAL' in the module description appears to have zero effect on
> whether distros choose to make it available or not.  If you're saying
> that we know this module is BROKEN however, then perhaps we should mark
> it as such.

Also, I think this requires root to set this line discipline to the
console, right?  A normal user can't do that, or am I missing a code
path here?

Is there a reproducer somewhere for this issue that runs as a normal
user?  I couldn't find one in the syzbot listings but I might have been
not looking deep enough.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04  8:57       ` Lee Jones
@ 2023-10-04 12:21         ` Greg Kroah-Hartman
  2023-10-04 12:57           ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04 12:21 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > Daniel, any thoughts?
> > > 
> > > Our application of this protocol is only with specific modems to enable
> > > circuit switched operation (handling calls, selecting/querying networks,
> > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > The protocol was developed for such use cases.
> > > 
> > > Regarding the issue itself:
> > > There was already an attempt to fix all this by switching from spinlocks to
> > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > as it did not handle the T1 timer leading into sleep during atomic within
> > > gsm_dlci_t1() on every mutex lock there.
> 
> That's correct.  When I initially saw this report, my initial thought
> was to replace the spinlocks with mutexts, but having read the previous
> accepted attempt and it's subsequent reversion I started to think of
> other ways to solve this issue.  This solution, unlike the last, does
> not involve adding sleep inducing locks into atomic contexts, nor
> should it negatively affect performance.
> 
> > > There was also a suggestion to fix this in do_con_write() as
> > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > 
> > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > 
> > Ok, I thought I remembered this, I'll just drop this patch from my
> > review queue and wait for a better solution if it ever comes up as this
> > isn't a real issue that people are seeing on actual systems, but just a
> > syzbot report.
> 
> What does the "better solution" look like?

One that actually fixes the root problem here (i.e. does not break the
recursion loop, or cause a performance decrease for normal users, or
prevent this from being bound to the console).

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 12:21         ` Greg Kroah-Hartman
@ 2023-10-04 12:57           ` Lee Jones
  2023-10-04 13:14             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-04 12:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:

> On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > 
> > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > Daniel, any thoughts?
> > > > 
> > > > Our application of this protocol is only with specific modems to enable
> > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > The protocol was developed for such use cases.
> > > > 
> > > > Regarding the issue itself:
> > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > gsm_dlci_t1() on every mutex lock there.
> > 
> > That's correct.  When I initially saw this report, my initial thought
> > was to replace the spinlocks with mutexts, but having read the previous
> > accepted attempt and it's subsequent reversion I started to think of
> > other ways to solve this issue.  This solution, unlike the last, does
> > not involve adding sleep inducing locks into atomic contexts, nor
> > should it negatively affect performance.
> > 
> > > > There was also a suggestion to fix this in do_con_write() as
> > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > 
> > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > 
> > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > review queue and wait for a better solution if it ever comes up as this
> > > isn't a real issue that people are seeing on actual systems, but just a
> > > syzbot report.
> > 
> > What does the "better solution" look like?
> 
> One that actually fixes the root problem here (i.e. does not break the
> recursion loop, or cause a performance decrease for normal users, or
> prevent this from being bound to the console).

Does this solution break the recursion loop or affect performance?

The last suggestion was recently made (after mine was posted).

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 12:19         ` Greg Kroah-Hartman
@ 2023-10-04 12:57           ` Lee Jones
  2023-10-04 13:13             ` Greg Kroah-Hartman
  2023-10-05  4:59             ` Jiri Slaby
  0 siblings, 2 replies; 29+ messages in thread
From: Lee Jones @ 2023-10-04 12:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:

> On Wed, Oct 04, 2023 at 10:09:18AM +0100, Lee Jones wrote:
> > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > 
> > > On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> > > > On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> > > > 
> > > > > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > > > > The important part of the call stack being:
> > > > > > 
> > > > > >   gsmld_write()             # Takes a lock and disables IRQs
> > > > > >     con_write()
> > > > > >       console_lock()
> > > > > 
> > > > > Wait, why is the n_gsm line discipline being used for a console?
> > > > > 
> > > > > What hardware/protocol wants this to happen?
> > > > > 
> > > > > gsm I thought was for a very specific type of device, not a console.
> > > > > 
> > > > > As per:
> > > > > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > > > > this is a specific modem protocol, why is con_write() being called?
> > > > 
> > > > What it's meant for and what random users can make it do are likely to
> > > > be quite separate questions.  This scenario is user driven and can be
> > > > replicated simply by issuing a few syscalls (open, ioctl, write).
> > > 
> > > I would recommend that any distro/system that does not want to support
> > > this specific hardware protocol, just disable it for now (it's marked as
> > > experimental too), if they don't want to deal with the potential
> > > sleep-while-atomic issue.
> > 
> > n_gsm is available on all the systems I have available.  The mention of
> > 'EXPERIMENTAL' in the module description appears to have zero effect on
> > whether distros choose to make it available or not.  If you're saying
> > that we know this module is BROKEN however, then perhaps we should mark
> > it as such.
> 
> Also, I think this requires root to set this line discipline to the
> console, right?  A normal user can't do that, or am I missing a code
> path here?

I haven't been testing long, but yes, early indications show that root
is required.

> Is there a reproducer somewhere for this issue that runs as a normal
> user?  I couldn't find one in the syzbot listings but I might have been
> not looking deep enough.

https://syzkaller.appspot.com/text?tag=ReproC&x=15578d8fa80000

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 12:57           ` Lee Jones
@ 2023-10-04 13:13             ` Greg Kroah-Hartman
  2023-10-05  4:59             ` Jiri Slaby
  1 sibling, 0 replies; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04 13:13 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 01:57:58PM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Wed, Oct 04, 2023 at 10:09:18AM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Tue, Oct 03, 2023 at 07:55:00PM +0100, Lee Jones wrote:
> > > > > On Tue, 03 Oct 2023, Greg Kroah-Hartman wrote:
> > > > > 
> > > > > > On Tue, Oct 03, 2023 at 06:00:20PM +0100, Lee Jones wrote:
> > > > > > > The important part of the call stack being:
> > > > > > > 
> > > > > > >   gsmld_write()             # Takes a lock and disables IRQs
> > > > > > >     con_write()
> > > > > > >       console_lock()
> > > > > > 
> > > > > > Wait, why is the n_gsm line discipline being used for a console?
> > > > > > 
> > > > > > What hardware/protocol wants this to happen?
> > > > > > 
> > > > > > gsm I thought was for a very specific type of device, not a console.
> > > > > > 
> > > > > > As per:
> > > > > > 	https://www.kernel.org/doc/html/v5.9/driver-api/serial/n_gsm.html
> > > > > > this is a specific modem protocol, why is con_write() being called?
> > > > > 
> > > > > What it's meant for and what random users can make it do are likely to
> > > > > be quite separate questions.  This scenario is user driven and can be
> > > > > replicated simply by issuing a few syscalls (open, ioctl, write).
> > > > 
> > > > I would recommend that any distro/system that does not want to support
> > > > this specific hardware protocol, just disable it for now (it's marked as
> > > > experimental too), if they don't want to deal with the potential
> > > > sleep-while-atomic issue.
> > > 
> > > n_gsm is available on all the systems I have available.  The mention of
> > > 'EXPERIMENTAL' in the module description appears to have zero effect on
> > > whether distros choose to make it available or not.  If you're saying
> > > that we know this module is BROKEN however, then perhaps we should mark
> > > it as such.
> > 
> > Also, I think this requires root to set this line discipline to the
> > console, right?  A normal user can't do that, or am I missing a code
> > path here?
> 
> I haven't been testing long, but yes, early indications show that root
> is required.

Oh good, then this really isn't that high of a priority to get fixed as
root can do much worse things :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 12:57           ` Lee Jones
@ 2023-10-04 13:14             ` Greg Kroah-Hartman
  2023-10-05  9:03               ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-04 13:14 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > Daniel, any thoughts?
> > > > > 
> > > > > Our application of this protocol is only with specific modems to enable
> > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > The protocol was developed for such use cases.
> > > > > 
> > > > > Regarding the issue itself:
> > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > gsm_dlci_t1() on every mutex lock there.
> > > 
> > > That's correct.  When I initially saw this report, my initial thought
> > > was to replace the spinlocks with mutexts, but having read the previous
> > > accepted attempt and it's subsequent reversion I started to think of
> > > other ways to solve this issue.  This solution, unlike the last, does
> > > not involve adding sleep inducing locks into atomic contexts, nor
> > > should it negatively affect performance.
> > > 
> > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > 
> > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > > 
> > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > review queue and wait for a better solution if it ever comes up as this
> > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > syzbot report.
> > > 
> > > What does the "better solution" look like?
> > 
> > One that actually fixes the root problem here (i.e. does not break the
> > recursion loop, or cause a performance decrease for normal users, or
> > prevent this from being bound to the console).
> 
> Does this solution break the recursion loop or affect performance?

This solution broke the recursion by returning an error, right?

The performance one was by using mutexes as in previous attempts.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 12:57           ` Lee Jones
  2023-10-04 13:13             ` Greg Kroah-Hartman
@ 2023-10-05  4:59             ` Jiri Slaby
  2023-10-05  6:05               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 29+ messages in thread
From: Jiri Slaby @ 2023-10-05  4:59 UTC (permalink / raw)
  To: Lee Jones, Greg Kroah-Hartman
  Cc: linux-kernel, Daniel Starke, Fedor Pchelkin, linux-serial,
	syzbot+5f47a8cea6a12b77a876

On 04. 10. 23, 14:57, Lee Jones wrote:
>>> n_gsm is available on all the systems I have available.  The mention of
>>> 'EXPERIMENTAL' in the module description appears to have zero effect on
>>> whether distros choose to make it available or not.  If you're saying
>>> that we know this module is BROKEN however, then perhaps we should mark
>>> it as such.
>>
>> Also, I think this requires root to set this line discipline to the
>> console, right?  A normal user can't do that, or am I missing a code
>> path here?
> 
> I haven't been testing long, but yes, early indications show that root
> is required.

FWIW I concluded to the same yesterday, so I disputed the connected 
CVE-2023-31082. Waiting for mitre's ack/nack.

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05  4:59             ` Jiri Slaby
@ 2023-10-05  6:05               ` Greg Kroah-Hartman
  2023-10-05  7:26                 ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05  6:05 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Lee Jones, linux-kernel, Daniel Starke, Fedor Pchelkin,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, Oct 05, 2023 at 06:59:50AM +0200, Jiri Slaby wrote:
> On 04. 10. 23, 14:57, Lee Jones wrote:
> > > > n_gsm is available on all the systems I have available.  The mention of
> > > > 'EXPERIMENTAL' in the module description appears to have zero effect on
> > > > whether distros choose to make it available or not.  If you're saying
> > > > that we know this module is BROKEN however, then perhaps we should mark
> > > > it as such.
> > > 
> > > Also, I think this requires root to set this line discipline to the
> > > console, right?  A normal user can't do that, or am I missing a code
> > > path here?
> > 
> > I haven't been testing long, but yes, early indications show that root
> > is required.
> 
> FWIW I concluded to the same yesterday, so I disputed the connected
> CVE-2023-31082. Waiting for mitre's ack/nack.

Trying to dispute obviously-wrong CVEs is a never-ending task.

Personally, it's fun to see who keeps popping up to attempt to resolve
this issue showing what companies have their security policies dictated
by a random government authority that can be modified by anyone :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05  6:05               ` Greg Kroah-Hartman
@ 2023-10-05  7:26                 ` Lee Jones
  0 siblings, 0 replies; 29+ messages in thread
From: Lee Jones @ 2023-10-05  7:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, linux-kernel, Daniel Starke, Fedor Pchelkin,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, 05 Oct 2023, Greg Kroah-Hartman wrote:

> On Thu, Oct 05, 2023 at 06:59:50AM +0200, Jiri Slaby wrote:
> > On 04. 10. 23, 14:57, Lee Jones wrote:
> > > > > n_gsm is available on all the systems I have available.  The mention of
> > > > > 'EXPERIMENTAL' in the module description appears to have zero effect on
> > > > > whether distros choose to make it available or not.  If you're saying
> > > > > that we know this module is BROKEN however, then perhaps we should mark
> > > > > it as such.
> > > > 
> > > > Also, I think this requires root to set this line discipline to the
> > > > console, right?  A normal user can't do that, or am I missing a code
> > > > path here?
> > > 
> > > I haven't been testing long, but yes, early indications show that root
> > > is required.
> > 
> > FWIW I concluded to the same yesterday, so I disputed the connected
> > CVE-2023-31082. Waiting for mitre's ack/nack.
> 
> Trying to dispute obviously-wrong CVEs is a never-ending task.
> 
> Personally, it's fun to see who keeps popping up to attempt to resolve
> this issue showing what companies have their security policies dictated
> by a random government authority that can be modified by anyone :)

It's a struggle! :)

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-04 13:14             ` Greg Kroah-Hartman
@ 2023-10-05  9:03               ` Lee Jones
  2023-10-05  9:18                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-05  9:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:

> On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > 
> > > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > 
> > > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > > Daniel, any thoughts?
> > > > > > 
> > > > > > Our application of this protocol is only with specific modems to enable
> > > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > > The protocol was developed for such use cases.
> > > > > > 
> > > > > > Regarding the issue itself:
> > > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > > gsm_dlci_t1() on every mutex lock there.
> > > > 
> > > > That's correct.  When I initially saw this report, my initial thought
> > > > was to replace the spinlocks with mutexts, but having read the previous
> > > > accepted attempt and it's subsequent reversion I started to think of
> > > > other ways to solve this issue.  This solution, unlike the last, does
> > > > not involve adding sleep inducing locks into atomic contexts, nor
> > > > should it negatively affect performance.
> > > > 
> > > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > > 
> > > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > > > 
> > > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > > review queue and wait for a better solution if it ever comes up as this
> > > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > > syzbot report.
> > > > 
> > > > What does the "better solution" look like?
> > > 
> > > One that actually fixes the root problem here (i.e. does not break the
> > > recursion loop, or cause a performance decrease for normal users, or
> > > prevent this from being bound to the console).
> > 
> > Does this solution break the recursion loop or affect performance?
> 
> This solution broke the recursion by returning an error, right?

This is the part I was least sure about.

If this was considered valid and we were to go forward with a solution
like this, what would a quality improvement look like?  Should we have
stayed in this function and waited for the previous occupant to leave
before continuing through ->write()?

> The performance one was by using mutexes as in previous attempts.

Right.  This solution was designed to avoid that.

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05  9:03               ` Lee Jones
@ 2023-10-05  9:18                 ` Greg Kroah-Hartman
  2023-10-05 10:43                   ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05  9:18 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, Oct 05, 2023 at 10:03:11AM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > > 
> > > > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > > > Daniel, any thoughts?
> > > > > > > 
> > > > > > > Our application of this protocol is only with specific modems to enable
> > > > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > > > The protocol was developed for such use cases.
> > > > > > > 
> > > > > > > Regarding the issue itself:
> > > > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > > > gsm_dlci_t1() on every mutex lock there.
> > > > > 
> > > > > That's correct.  When I initially saw this report, my initial thought
> > > > > was to replace the spinlocks with mutexts, but having read the previous
> > > > > accepted attempt and it's subsequent reversion I started to think of
> > > > > other ways to solve this issue.  This solution, unlike the last, does
> > > > > not involve adding sleep inducing locks into atomic contexts, nor
> > > > > should it negatively affect performance.
> > > > > 
> > > > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > > > 
> > > > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > > > > 
> > > > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > > > review queue and wait for a better solution if it ever comes up as this
> > > > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > > > syzbot report.
> > > > > 
> > > > > What does the "better solution" look like?
> > > > 
> > > > One that actually fixes the root problem here (i.e. does not break the
> > > > recursion loop, or cause a performance decrease for normal users, or
> > > > prevent this from being bound to the console).
> > > 
> > > Does this solution break the recursion loop or affect performance?
> > 
> > This solution broke the recursion by returning an error, right?
> 
> This is the part I was least sure about.
> 
> If this was considered valid and we were to go forward with a solution
> like this, what would a quality improvement look like?  Should we have
> stayed in this function and waited for the previous occupant to leave
> before continuing through ->write()?

This isn't valid, as it obviously never shows up in real use.

The real solution should be to prevent binding a console to this line
discipline as it can not handle the recursion that consoles require for
the write path.

Then, if consoles are really needed, the code can be fixed up to handle
such recursion.  That's not a trivial thing to do, as can be seen by the
crazy gyrations that the n_tty line discipline does in its write path...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05  9:18                 ` Greg Kroah-Hartman
@ 2023-10-05 10:43                   ` Lee Jones
  2023-10-05 11:33                     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-05 10:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, 05 Oct 2023, Greg Kroah-Hartman wrote:

> On Thu, Oct 05, 2023 at 10:03:11AM +0100, Lee Jones wrote:
> > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > 
> > > On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > 
> > > > > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > > > 
> > > > > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > > > > Daniel, any thoughts?
> > > > > > > > 
> > > > > > > > Our application of this protocol is only with specific modems to enable
> > > > > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > > > > The protocol was developed for such use cases.
> > > > > > > > 
> > > > > > > > Regarding the issue itself:
> > > > > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > > > > gsm_dlci_t1() on every mutex lock there.
> > > > > > 
> > > > > > That's correct.  When I initially saw this report, my initial thought
> > > > > > was to replace the spinlocks with mutexts, but having read the previous
> > > > > > accepted attempt and it's subsequent reversion I started to think of
> > > > > > other ways to solve this issue.  This solution, unlike the last, does
> > > > > > not involve adding sleep inducing locks into atomic contexts, nor
> > > > > > should it negatively affect performance.
> > > > > > 
> > > > > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > > > > 
> > > > > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > > > > > 
> > > > > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > > > > review queue and wait for a better solution if it ever comes up as this
> > > > > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > > > > syzbot report.
> > > > > > 
> > > > > > What does the "better solution" look like?
> > > > > 
> > > > > One that actually fixes the root problem here (i.e. does not break the
> > > > > recursion loop, or cause a performance decrease for normal users, or
> > > > > prevent this from being bound to the console).
> > > > 
> > > > Does this solution break the recursion loop or affect performance?
> > > 
> > > This solution broke the recursion by returning an error, right?
> > 
> > This is the part I was least sure about.
> > 
> > If this was considered valid and we were to go forward with a solution
> > like this, what would a quality improvement look like?  Should we have
> > stayed in this function and waited for the previous occupant to leave
> > before continuing through ->write()?
> 
> This isn't valid, as it obviously never shows up in real use.
> 
> The real solution should be to prevent binding a console to this line
> discipline as it can not handle the recursion that consoles require for
> the write path.

Would something like this tick that box?

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 1f3aba607cd51..5c1d2fcd5d9e2 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
        if (!gsm)
                return -ENODEV;
 
+       /* The GSM line discipline does not support binding to console */
+       if (strncmp(tty->name, "tty", 3))
+               return -EINVAL;
+
        ret = -ENOBUFS;
        spin_lock_irqsave(&gsm->tx_lock, flags);
        space = tty_write_room(tty);

> Then, if consoles are really needed, the code can be fixed up to handle
> such recursion.  That's not a trivial thing to do, as can be seen by the
 > crazy gyrations that the n_tty line discipline does in its write path...

-- 
Lee Jones [李琼斯]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 10:43                   ` Lee Jones
@ 2023-10-05 11:33                     ` Greg Kroah-Hartman
  2023-10-05 11:38                       ` Starke, Daniel
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05 11:33 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, Oct 05, 2023 at 11:43:11AM +0100, Lee Jones wrote:
> On Thu, 05 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Thu, Oct 05, 2023 at 10:03:11AM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > 
> > > > On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> > > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > > 
> > > > > > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > > > > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > > > > > > 
> > > > > > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > > > > > Daniel, any thoughts?
> > > > > > > > > 
> > > > > > > > > Our application of this protocol is only with specific modems to enable
> > > > > > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > > > > > The protocol was developed for such use cases.
> > > > > > > > > 
> > > > > > > > > Regarding the issue itself:
> > > > > > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > > > > > gsm_dlci_t1() on every mutex lock there.
> > > > > > > 
> > > > > > > That's correct.  When I initially saw this report, my initial thought
> > > > > > > was to replace the spinlocks with mutexts, but having read the previous
> > > > > > > accepted attempt and it's subsequent reversion I started to think of
> > > > > > > other ways to solve this issue.  This solution, unlike the last, does
> > > > > > > not involve adding sleep inducing locks into atomic contexts, nor
> > > > > > > should it negatively affect performance.
> > > > > > > 
> > > > > > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > > > > > > 
> > > > > > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@ispras.ru/
> > > > > > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@huawei.com/
> > > > > > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@kernel.org/
> > > > > > > > 
> > > > > > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > > > > > review queue and wait for a better solution if it ever comes up as this
> > > > > > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > > > > > syzbot report.
> > > > > > > 
> > > > > > > What does the "better solution" look like?
> > > > > > 
> > > > > > One that actually fixes the root problem here (i.e. does not break the
> > > > > > recursion loop, or cause a performance decrease for normal users, or
> > > > > > prevent this from being bound to the console).
> > > > > 
> > > > > Does this solution break the recursion loop or affect performance?
> > > > 
> > > > This solution broke the recursion by returning an error, right?
> > > 
> > > This is the part I was least sure about.
> > > 
> > > If this was considered valid and we were to go forward with a solution
> > > like this, what would a quality improvement look like?  Should we have
> > > stayed in this function and waited for the previous occupant to leave
> > > before continuing through ->write()?
> > 
> > This isn't valid, as it obviously never shows up in real use.
> > 
> > The real solution should be to prevent binding a console to this line
> > discipline as it can not handle the recursion that consoles require for
> > the write path.
> 
> Would something like this tick that box?
> 
> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
> index 1f3aba607cd51..5c1d2fcd5d9e2 100644
> --- a/drivers/tty/n_gsm.c
> +++ b/drivers/tty/n_gsm.c
> @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
>         if (!gsm)
>                 return -ENODEV;
>  
> +       /* The GSM line discipline does not support binding to console */
> +       if (strncmp(tty->name, "tty", 3))
> +               return -EINVAL;

No, that's not going to work, some consoles do not start with "tty" :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 11:33                     ` Greg Kroah-Hartman
@ 2023-10-05 11:38                       ` Starke, Daniel
  2023-10-05 11:46                         ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Starke, Daniel @ 2023-10-05 11:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Lee Jones
  Cc: linux-kernel, Fedor Pchelkin, Jiri Slaby, linux-serial,
	syzbot+5f47a8cea6a12b77a876

> > Would something like this tick that box?
> > 
> > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 
> > 1f3aba607cd51..5c1d2fcd5d9e2 100644
> > --- a/drivers/tty/n_gsm.c
> > +++ b/drivers/tty/n_gsm.c
> > @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> >         if (!gsm)
> >                 return -ENODEV;
> >  
> > +       /* The GSM line discipline does not support binding to console */
> > +       if (strncmp(tty->name, "tty", 3))
> > +               return -EINVAL;
> 
> No, that's not going to work, some consoles do not start with "tty" :(

Also, I would recommend exiting as early as possible. E.g. in gsmld_open().
And please retain support for real serial devices (e.g. ttyS, ttyUSB,
ttyACM, ...).

Best regards,
Daniel Starke

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 11:38                       ` Starke, Daniel
@ 2023-10-05 11:46                         ` Lee Jones
  2023-10-05 11:55                           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-05 11:46 UTC (permalink / raw)
  To: Starke, Daniel
  Cc: Greg Kroah-Hartman, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, 05 Oct 2023, Starke, Daniel wrote:

> > > Would something like this tick that box?
> > > 
> > > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 
> > > 1f3aba607cd51..5c1d2fcd5d9e2 100644
> > > --- a/drivers/tty/n_gsm.c
> > > +++ b/drivers/tty/n_gsm.c
> > > @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> > >         if (!gsm)
> > >                 return -ENODEV;
> > >  
> > > +       /* The GSM line discipline does not support binding to console */
> > > +       if (strncmp(tty->name, "tty", 3))
> > > +               return -EINVAL;
> > 
> > No, that's not going to work, some consoles do not start with "tty" :(

Ah, you mean there are others that we need to consider?

I was just covering off con_write() from drivers/tty/vt/vt.c.

Does anyone have a counter proposal?

> Also, I would recommend exiting as early as possible. E.g. in gsmld_open().

Good suggestion.

> And please retain support for real serial devices (e.g. ttyS, ttyUSB,
> ttyACM, ...).

Okay, so it's "tty{0-9}*$" that should be excluded?

Plus others that Greg alluded to?

Is there a definitive accept / reject list?

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 11:46                         ` Lee Jones
@ 2023-10-05 11:55                           ` Greg Kroah-Hartman
  2023-10-05 12:17                             ` Lee Jones
  0 siblings, 1 reply; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05 11:55 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, Oct 05, 2023 at 12:46:32PM +0100, Lee Jones wrote:
> On Thu, 05 Oct 2023, Starke, Daniel wrote:
> 
> > > > Would something like this tick that box?
> > > > 
> > > > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 
> > > > 1f3aba607cd51..5c1d2fcd5d9e2 100644
> > > > --- a/drivers/tty/n_gsm.c
> > > > +++ b/drivers/tty/n_gsm.c
> > > > @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> > > >         if (!gsm)
> > > >                 return -ENODEV;
> > > >  
> > > > +       /* The GSM line discipline does not support binding to console */
> > > > +       if (strncmp(tty->name, "tty", 3))
> > > > +               return -EINVAL;
> > > 
> > > No, that's not going to work, some consoles do not start with "tty" :(
> 
> Ah, you mean there are others that we need to consider?
> 
> I was just covering off con_write() from drivers/tty/vt/vt.c.
> 
> Does anyone have a counter proposal?

consoles are dynamically assigned, the device knows if it is a console
or not, so there is a way to determine this at runtime.  It's not a
device naming thing at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 11:55                           ` Greg Kroah-Hartman
@ 2023-10-05 12:17                             ` Lee Jones
  2023-10-05 12:37                               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 29+ messages in thread
From: Lee Jones @ 2023-10-05 12:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, 05 Oct 2023, Greg Kroah-Hartman wrote:

> On Thu, Oct 05, 2023 at 12:46:32PM +0100, Lee Jones wrote:
> > On Thu, 05 Oct 2023, Starke, Daniel wrote:
> > 
> > > > > Would something like this tick that box?
> > > > > 
> > > > > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 
> > > > > 1f3aba607cd51..5c1d2fcd5d9e2 100644
> > > > > --- a/drivers/tty/n_gsm.c
> > > > > +++ b/drivers/tty/n_gsm.c
> > > > > @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> > > > >         if (!gsm)
> > > > >                 return -ENODEV;
> > > > >  
> > > > > +       /* The GSM line discipline does not support binding to console */
> > > > > +       if (strncmp(tty->name, "tty", 3))
> > > > > +               return -EINVAL;
> > > > 
> > > > No, that's not going to work, some consoles do not start with "tty" :(
> > 
> > Ah, you mean there are others that we need to consider?
> > 
> > I was just covering off con_write() from drivers/tty/vt/vt.c.
> > 
> > Does anyone have a counter proposal?
> 
> consoles are dynamically assigned, the device knows if it is a console
> or not, so there is a way to determine this at runtime.  It's not a
> device naming thing at all.

It's not a device naming thing, but it is a ... :)

Okay, here's another uninformed stab in the dark:

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index 1f3aba607cd51..ddf00f6a4141d 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -3629,6 +3629,10 @@ static int gsmld_open(struct tty_struct *tty)
        if (tty->ops->write == NULL)
                return -EINVAL;

+       /* The GSM line discipline does not support binding to console */
+       if (tty->driver->type == TTY_DRIVER_TYPE_CONSOLE)
+               return -EINVAL;
+
        /* Attach our ldisc data */
        gsm = gsm_alloc_mux();
        if (gsm == NULL)

This seems to allow for the real serial devices (TTY_DRIVER_TYPE_SERIAL)
suggested by Daniel.

-- 
Lee Jones [李琼斯]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic
  2023-10-05 12:17                             ` Lee Jones
@ 2023-10-05 12:37                               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 29+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05 12:37 UTC (permalink / raw)
  To: Lee Jones
  Cc: Starke, Daniel, linux-kernel, Fedor Pchelkin, Jiri Slaby,
	linux-serial, syzbot+5f47a8cea6a12b77a876

On Thu, Oct 05, 2023 at 01:17:56PM +0100, Lee Jones wrote:
> On Thu, 05 Oct 2023, Greg Kroah-Hartman wrote:
> 
> > On Thu, Oct 05, 2023 at 12:46:32PM +0100, Lee Jones wrote:
> > > On Thu, 05 Oct 2023, Starke, Daniel wrote:
> > > 
> > > > > > Would something like this tick that box?
> > > > > > 
> > > > > > diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index 
> > > > > > 1f3aba607cd51..5c1d2fcd5d9e2 100644
> > > > > > --- a/drivers/tty/n_gsm.c
> > > > > > +++ b/drivers/tty/n_gsm.c
> > > > > > @@ -3716,6 +3716,10 @@ static ssize_t gsmld_write(struct tty_struct *tty, struct file *file,
> > > > > >         if (!gsm)
> > > > > >                 return -ENODEV;
> > > > > >  
> > > > > > +       /* The GSM line discipline does not support binding to console */
> > > > > > +       if (strncmp(tty->name, "tty", 3))
> > > > > > +               return -EINVAL;
> > > > > 
> > > > > No, that's not going to work, some consoles do not start with "tty" :(
> > > 
> > > Ah, you mean there are others that we need to consider?
> > > 
> > > I was just covering off con_write() from drivers/tty/vt/vt.c.
> > > 
> > > Does anyone have a counter proposal?
> > 
> > consoles are dynamically assigned, the device knows if it is a console
> > or not, so there is a way to determine this at runtime.  It's not a
> > device naming thing at all.
> 
> It's not a device naming thing, but it is a ... :)
> 
> Okay, here's another uninformed stab in the dark:
> 
> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
> index 1f3aba607cd51..ddf00f6a4141d 100644
> --- a/drivers/tty/n_gsm.c
> +++ b/drivers/tty/n_gsm.c
> @@ -3629,6 +3629,10 @@ static int gsmld_open(struct tty_struct *tty)
>         if (tty->ops->write == NULL)
>                 return -EINVAL;
> 
> +       /* The GSM line discipline does not support binding to console */
> +       if (tty->driver->type == TTY_DRIVER_TYPE_CONSOLE)
> +               return -EINVAL;
> +
>         /* Attach our ldisc data */
>         gsm = gsm_alloc_mux();
>         if (gsm == NULL)
> 
> This seems to allow for the real serial devices (TTY_DRIVER_TYPE_SERIAL)
> suggested by Daniel.

Close, but not quite.

Driver "types" can say if they are a console or not, but that doesn't
mean you can't run the console over a serial port as well, right?

Sorry, I don't have a real solution right now, and wouldn't wish the
phrase "just dig through the tty console code!" on anyone, but that's
what it is going to take to figure it out somehow, I can't remember the
exact way consoles are hooked up at the moment (I conviently skipped
that portion of the tty layer in my Embedded Recipies talk last week,
saying "it's magic...")

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-10-05 16:20 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-03 17:00 [PATCH 1/1] tty: n_gsm: Avoid sleeping during .write() whilst atomic Lee Jones
2023-10-03 18:14 ` Greg Kroah-Hartman
2023-10-03 18:55   ` Lee Jones
2023-10-04  6:04     ` Greg Kroah-Hartman
2023-10-04  9:09       ` Lee Jones
2023-10-04  9:55         ` Greg Kroah-Hartman
2023-10-04 12:19         ` Greg Kroah-Hartman
2023-10-04 12:57           ` Lee Jones
2023-10-04 13:13             ` Greg Kroah-Hartman
2023-10-05  4:59             ` Jiri Slaby
2023-10-05  6:05               ` Greg Kroah-Hartman
2023-10-05  7:26                 ` Lee Jones
2023-10-04  5:59   ` Starke, Daniel
2023-10-04  6:05     ` Greg Kroah-Hartman
2023-10-04  8:40       ` Jiri Slaby
2023-10-04  8:58         ` Starke, Daniel
2023-10-04  8:57       ` Lee Jones
2023-10-04 12:21         ` Greg Kroah-Hartman
2023-10-04 12:57           ` Lee Jones
2023-10-04 13:14             ` Greg Kroah-Hartman
2023-10-05  9:03               ` Lee Jones
2023-10-05  9:18                 ` Greg Kroah-Hartman
2023-10-05 10:43                   ` Lee Jones
2023-10-05 11:33                     ` Greg Kroah-Hartman
2023-10-05 11:38                       ` Starke, Daniel
2023-10-05 11:46                         ` Lee Jones
2023-10-05 11:55                           ` Greg Kroah-Hartman
2023-10-05 12:17                             ` Lee Jones
2023-10-05 12:37                               ` Greg Kroah-Hartman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.