All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>,
	Max Filippov <jcmvbkbc@gmail.com>,
	David Sterba <dsterba@suse.com>,
	Bhaskar Chowdhury <unixbhaskar@gmail.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Igor Matheus Andrade Torrente <igormtorrente@gmail.com>,
	nick black <dankamongmen@gmail.com>,
	linux-kernel@vger.kernel.org,
	syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com,
	Marco Elver <elver@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: [PATCH] vt: Fix sleeping functions called from atomic context
Date: Wed, 17 Nov 2021 13:38:33 +0100	[thread overview]
Message-ID: <116969909.G7WiQ7h0bV@localhost.localdomain> (raw)
In-Reply-To: <YZTDY/h8HcEkq7mO@kroah.com>

On Wednesday, November 17, 2021 9:54:59 AM CET Greg Kroah-Hartman wrote:
> On Wed, Nov 17, 2021 at 09:23:45AM +0100, Fabio M. De Francesco wrote:
> > On Tuesday, November 16, 2021 5:59:24 PM CET Greg Kroah-Hartman wrote:
> > > On Tue, Nov 16, 2021 at 04:35:07PM +0100, Fabio M. De Francesco wrote:
> > > > On Tuesday, November 16, 2021 3:58:44 PM CET Greg Kroah-Hartman 
wrote:
> > > > > On Tue, Nov 16, 2021 at 03:49:37PM +0100, Fabio M. De Francesco 
wrote:
> > > > > > Fix two sleeping functions called from atomic context by doing 
> > immediate
> > > > > > return to the caller if !preemptible() evaluates 'true'. Remove 
two
> > > > > > in_interrupt() tests because they are not suited for being used 
here.
> > > > > > 
> > > > > > Since functions do_con_write() and con_flush_chars() might sleep 
in
> > > > > > console_lock(), it must be assured that they are never executed 
in
> > > > > > atomic contexts.
> > > > > > 
> > > > > > This issue is reported by Syzbot which notices that they are 
executed
> > > > > > while holding spinlocks and with interrupts disabled. Actually 
Syzbot
> > > > > > emits a first report and then, after fixing do_con_write(), a 
second
> > > > > > report for the same problem in con_flush_chars() because these 
> > functions
> > > > > > are called one after the other by con_write().
> > > > > > 
> > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > > Reported-by: 
syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com
> > > > > > Suggested-by: Marco Elver <elver@google.com>
> > > > > > Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> > > > > > ---
> > > > > >  drivers/tty/vt/vt.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> > > > > > index 7359c3e80d63..508f8a56d361 100644
> > > > > > --- a/drivers/tty/vt/vt.c
> > > > > > +++ b/drivers/tty/vt/vt.c
> > > > > > @@ -2902,7 +2902,7 @@ static int do_con_write(struct tty_struct 
*tty, 
> > > > const unsigned char *buf, int co
> > > > > >  	struct vt_notifier_param param;
> > > > > >  	bool rescan;
> > > > > >  
> > > > > > -	if (in_interrupt())
> > > > > > +	if (!preemptible())
> > > > > >  		return count;
> > > > > 
> > > > > Very odd, what code is calling these functions to trigger this 
check?
> > > > 
> > > > This is the call trace reported by Syzbot (https://
syzkaller.appspot.com/
> > bug?
> > > > id=fe5a4d5a2482bd73064db5de5d28e024f1e2a387):
> > > > 
> > > > Call Trace:
> > > >  <TASK>
> > > >  __dump_stack lib/dump_stack.c:88 [inline]
> > > >  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > > >  __might_resched.cold+0x222/0x26b kernel/sched/core.c:9539
> > > >  console_lock+0x17/0x80 kernel/printk/printk.c:2522
> > > >  do_con_write+0x10f/0x1e40 drivers/tty/vt/vt.c:2908
> > > >  con_write+0x21/0x40 drivers/tty/vt/vt.c:3295
> > > >  n_hdlc_send_frames+0x24b/0x490 drivers/tty/n_hdlc.c:290
> > > >  tty_wakeup+0xe1/0x120 drivers/tty/tty_io.c:534
> > > >  __start_tty drivers/tty/tty_io.c:806 [inline]
> > > >  __start_tty+0xfb/0x130 drivers/tty/tty_io.c:799
> > > >  n_tty_ioctl_helper+0x299/0x2d0 drivers/tty/tty_ioctl.c:880
> > > > 
> > > > 	^^^^^^^^^^
> > > > n_tty_ioctl_helper() disabled interrupts via spin_lock_irq(&tty-
> > >flow.lock).
> > > > 
> > > >  n_hdlc_tty_ioctl+0xd2/0x340 drivers/tty/n_hdlc.c:633
> > > >  tty_ioctl+0xc69/0x1670 drivers/tty/tty_io.c:2814
> > > >  vfs_ioctl fs/ioctl.c:51 [inline]
> > > >  __do_sys_ioctl fs/ioctl.c:874 [inline]
> > > >  __se_sys_ioctl fs/ioctl.c:860 [inline]
> > > >  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
> > > >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > 
> > > > > Shouldn't the caller be fixed instead?
> > > > 
> > > > Maybe that the caller has no need to disable IRQs, but I cannot yet 
> > answer to 
> > > > this particular question.
> > > > 
> > > > > What changed to suddenly cause this to show up?
> > > > 
> > > > Commit c545b66c6922 ("tty: Serialize tcflow() with other tty flow 
control 
> > > > changes") introduced a call to spin_lock_irq() for command "TCOON", 
just 
> > > > before calling __start_tty().
> > > 
> > > That commit happened in 2014.  Why is this suddenly an issue now that 
no
> > > one ever saw before?
> > 
> > Hi Greg,
> > 
> > I have just been informed by Pavel Skripkin (who at this moment cannot 
> > directly write to LKML) that this bug has been reported at least other 
two 
> > times. 
> > 
> > We cannot say why nobody has ever addressed this issue before.
> > 
> > Please read the following reports.
> > 
> > https://syzkaller.appspot.com/bug?
id=9ed44c22919e3cd2c71907b963fb048ac1115073
> > https://syzkaller.appspot.com/bug?
id=e806305a3a65a0bb2f1be7e672819a079d558016
> 
> Great, you have a reproducer, so you should be able to duplicate this
> locally to figure out what is really happening here.
> 
> Good luck!
> 
> greg k-h
> 
Hi Greg,

As you asked for, I've run the reproducer (as root) while "trace-cmd record -
g n_tty_ioctl_helper -p function_graph sleep 15" was in execution.

The call trace from n_tty_ioctl_helper() to console_lock() has been recorded 
and for sure console_lock() is called while holding a spinlock and with IRQs 
disabled via spin_lock_irq().

The following is the relevant part of "trace-cmd report"...

repro-27397 [004] 18814.535029: funcgraph_entry:                   |  
n_tty_ioctl_helper() {
           repro-27397 [004] 18814.535030: funcgraph_entry:        0.676 us   
|    tty_check_change();
           repro-27397 [004] 18814.535032: funcgraph_entry:        0.761 us   
|    _raw_spin_lock_irq();
           repro-27397 [004] 18814.535033: funcgraph_entry:                   
|    __start_tty() {
           repro-27397 [004] 18814.535034: funcgraph_entry:                   
|      con_start() {

---- cut not relevant code ----

repro-27397 [004] 18814.535088: funcgraph_entry:                   |            
con_write() {
           repro-27397 [004] 18814.535088: funcgraph_entry:                   
|              do_con_write() {
           repro-27397 [004] 18814.535088: funcgraph_entry:                   
|                console_lock() {
           repro-27397 [004] 18814.535089: funcgraph_entry:                   
|                  __cond_resched() {

---- cut not relevant code ----

repro-27397 [004] 18814.571839: funcgraph_entry:        0.641 us   |                  
__printk_safe_exit();
           repro-27397 [004] 18814.571841: funcgraph_exit:       + 18.902 us  
|                }
           repro-27397 [004] 18814.571842: funcgraph_exit:       + 31.651 us  
|              }
           repro-27397 [004] 18814.571843: funcgraph_exit:       # 36754.817 
us |            }

Please notice that "#36754.817 us" when finally console_lock() returned.

May very well be that I've overlooked something (as you know I'm a newbie and 
all this is quite new for me :)

As I wrote (please read it again above) for now I confirm that commit 
c545b66c6922 ("tty: Serialize tcflow() with other tty flow control changes") 
introduced a call to spin_lock_irq() for command "TCOON". 

However I have to double check it. 

If it is confirmed, I will reply soon to Tetsuo's message because he says 
that the relevant commit should be a different one. 

However, what is relevant here is that for sure console_lock() is called in 
atomic context and that the Syzbot report is not a false positive. Am I 
missing something?

About the possible fixes I only see two potential solutions:

1) If we need for real spin_lock_irq() in n_tty_ioctl_helper() we should test 
the context before calling console_lock() (it is annotated with 
might_sleep()).

2) If we don't need to disable IRQs we should fix n_tty_ioctl_helper().

I'm not able to go deeper and understand whether or not that spin_lock_irq() 
is an overkill and if it can be converted to a different kind of locking API.

I'd appreciate feedback from you and other experienced developers.

Thank you very much,

Fabio M. De  Francesco






  parent reply	other threads:[~2021-11-17 12:38 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16 14:49 [PATCH] vt: Fix sleeping functions called from atomic context Fabio M. De Francesco
2021-11-16 14:58 ` Greg Kroah-Hartman
2021-11-16 15:35   ` Fabio M. De Francesco
2021-11-16 16:59     ` Greg Kroah-Hartman
2021-11-16 17:28       ` Fabio M. De Francesco
2021-11-17  8:23       ` Fabio M. De Francesco
2021-11-17  8:54         ` Greg Kroah-Hartman
2021-11-17 10:51           ` Tetsuo Handa
2021-11-18  8:31             ` Fabio M. De Francesco
2021-11-18  9:38               ` Fabio M. De Francesco
2021-11-18 12:14                 ` Tetsuo Handa
2021-11-18 17:01                   ` Fabio M. De Francesco
2021-11-19 14:55                     ` [PATCH] tty: vt: make do_con_write() no-op if IRQ is disabled Tetsuo Handa
2021-12-01 13:40                       ` Tetsuo Handa
2021-12-01 14:20                         ` Greg Kroah-Hartman
2021-12-01 19:05                         ` Linus Torvalds
2021-12-02 15:40                           ` Tetsuo Handa
2021-12-02 18:35                             ` Linus Torvalds
2021-12-03  5:03                               ` Jiri Slaby
2021-12-03 11:00                               ` Fabio M. De Francesco
2021-12-03 12:32                                 ` Tetsuo Handa
2021-12-03 14:51                                   ` Fabio M. De Francesco
2021-11-17 12:38           ` Fabio M. De Francesco [this message]
2021-11-17  1:55 ` [PATCH] vt: Fix sleeping functions called from atomic context Tetsuo Handa
2021-11-17  7:02   ` Fabio M. De Francesco
2021-12-06 11:44 ` [PATCH] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous Tetsuo Handa
2021-12-06 18:07   ` Linus Torvalds
2021-12-09 13:18     ` Tetsuo Handa
2021-12-15 11:52       ` [PATCH (resend)] " Tetsuo Handa
2021-12-06 19:06   ` [PATCH] " Fabio M. De Francesco

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=116969909.G7WiQ7h0bV@localhost.localdomain \
    --to=fmdefrancesco@gmail.com \
    --cc=dankamongmen@gmail.com \
    --cc=dsterba@suse.com \
    --cc=elver@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=igormtorrente@gmail.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=jirislaby@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=syzbot+5f47a8cea6a12b77a876@syzkaller.appspotmail.com \
    --cc=unixbhaskar@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.