From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> To: sergey.senozhatsky@gmail.com Cc: pmladek@suse.com, jack@suse.com, tj@kernel.org, kyle@kernel.org, davej@codemonkey.org.uk, calvinowens@fb.com, gregkh@linuxfoundation.org, jslaby@suse.cz, linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org, sergey.senozhatsky.work@gmail.com Subject: printk: reset may_schedule if console_trylock() from console_unlock() Date: Thu, 5 Jan 2017 19:27:23 +0900 [thread overview] Message-ID: <201701051927.IJF65159.OFFtSLQOOFHJMV@I-love.SAKURA.ne.jp> (raw) (Excluding linux-mm and restarting as a new thread.) Sergey Senozhatsky wrote at http://lkml.kernel.org/r/20161226113407.GA515@tigerII.localdomain : > Cc Greg, Jiri, > > On (12/26/16 19:54), Tetsuo Handa wrote: > [..] > > > > (3) I got below warning. (Though not reproducible.) > > If fb_flashcursor() called console_trylock(), console_may_schedule is set to 1? > > hmmm... it takes an atomic/spin `printing_lock' lock in vt_console_print(), > then call console_conditional_schedule() from lf(), being under spin_lock. > `console_may_schedule' in console_conditional_schedule() still keeps the > value from console_trylock(), which was ok (console_may_schedule permits > rescheduling). but preemption got changed under console_trylock(), by > that spin_lock. > > console_trylock() used to always forbid rescheduling; but it got changed > like a year ago. Indeed, it seems to be an oversight in commit 6b97a20d3a7909da ("printk: set may_schedule for some of console_trylock() callers"). I understood what has happened. fb_flushcursor() was not the culprit. > > the other thing is... do we really need to console_conditional_schedule() > from fbcon_*()? console_unlock() does cond_resched() after every line it > prints. wouldn't that be enough? Every record, isn't it? How many bytes can a record write to consoles? CONSOLE_EXT_LOG_MAX bytes multiplied by MAX_CMDLINECONSOLES ? Might take longer than a second for each record? > > so may be we can drop some of console_conditional_schedule() > call sites in fbcon. or update console_conditional_schedule() > function to always return the current preemption value, not the > one we saw in console_trylock(). Replacing console_may_schedule with get_console_may_schedule() will avoid this bug. I noticed that we forgot to reset console_may_schedule to 0 because the "again:" label is located after the assignment line. > > ---------------------------------------- > > [ OK [ 255.862188] audit: type=1131 audit(1482733112.662:148): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > ] Stopped Create Static Device Nodes in /dev. > > > > [ 255.871468] BUG: sleeping function called from invalid context at kernel/printk/printk.c:2325 > > [ 255.871469] in_atomic(): 1, irqs_disabled(): 1, pid: 10079, name: plymouthd > > [ 255.871469] 6 locks held by plymouthd/10079: > > [ 255.871470] #0: (&tty->ldisc_sem){++++.+}, at: [<ffffffff817413e2>] ldsem_down_read+0x32/0x40 > > [ 255.871472] #1: (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff81424309>] tty_write_lock+0x19/0x50 > > [ 255.871474] #2: (&tty->termios_rwsem){++++..}, at: [<ffffffff81429d59>] n_tty_write+0x99/0x470 > > [ 255.871475] #3: (&ldata->output_lock){+.+...}, at: [<ffffffff81429df0>] n_tty_write+0x130/0x470 > > [ 255.871477] #4: (console_lock){+.+.+.}, at: [<ffffffff8110616e>] console_unlock+0x33e/0x6b0 > > [ 255.871479] #5: (printing_lock){......}, at: [<ffffffff8143baf5>] vt_console_print+0x75/0x3d0 > > [ 255.871481] irq event stamp: 15244 > > [ 255.871481] hardirqs last enabled at (15243): [<ffffffff81105011>] __down_trylock_console_sem+0x91/0xa0 > > [ 255.871482] hardirqs last disabled at (15244): [<ffffffff81105ea4>] console_unlock+0x74/0x6b0 > > [ 255.871482] softirqs last enabled at (14968): [<ffffffff81096394>] __do_softirq+0x344/0x580 > > [ 255.871482] softirqs last disabled at (14963): [<ffffffff810968d3>] irq_exit+0xe3/0x120 > > [ 255.871483] CPU: 0 PID: 10079 Comm: plymouthd Not tainted 4.9.0-next-20161224+ #12 > > [ 255.871483] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 > > [ 255.871484] Call Trace: > > [ 255.871484] dump_stack+0x85/0xc9 > > [ 255.871485] ___might_sleep+0x14a/0x250 > > [ 255.871485] console_conditional_schedule+0x22/0x30 > > [ 255.871485] fbcon_redraw.isra.24+0xa3/0x1d0 > > [ 255.871486] ? fbcon_cursor+0x151/0x1c0 > > [ 255.871486] fbcon_scroll+0x11d/0xcb0 > > [ 255.871487] con_scroll+0x160/0x170 > > [ 255.871487] lf+0x9c/0xb0 > > [ 255.871487] vt_console_print+0x2b7/0x3d0 > > [ 255.871488] console_unlock+0x457/0x6b0 > > [ 255.871488] do_con_write.part.19+0x737/0x9e0 > > [ 255.871489] ? mark_held_locks+0x71/0x90 > > [ 255.871489] con_write+0x57/0x60 > > [ 255.871489] n_tty_write+0x1bf/0x470 > > [ 255.871490] ? prepare_to_wait_event+0x110/0x110 > > [ 255.871490] tty_write+0x157/0x2d0 > > [ 255.871491] ? n_tty_open+0xd0/0xd0 > > [ 255.871491] __vfs_write+0x32/0x140 > > [ 255.871491] ? trace_hardirqs_on+0xd/0x10 > > [ 255.871492] ? __audit_syscall_entry+0xaa/0xf0 > > [ 255.871492] vfs_write+0xc2/0x1f0 > > [ 255.871493] ? syscall_trace_enter+0x1cb/0x3e0 > > [ 255.871493] SyS_write+0x53/0xc0 > > [ 255.871493] do_syscall_64+0x67/0x1f0 > > [ 255.871494] entry_SYSCALL64_slow_path+0x25/0x25 > > [ 255.871494] RIP: 0033:0x7fb74cf8fc60 > > [ 255.871495] RSP: 002b:00007ffcaab3fe88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 255.871495] RAX: ffffffffffffffda RBX: 000055d3acaf7160 RCX: 00007fb74cf8fc60 > > [ 255.871496] RDX: 000000000000003f RSI: 000055d3acafd090 RDI: 0000000000000009 > > [ 255.871496] RBP: 000055d3acafc440 R08: 0000000000000070 R09: 0000000000000000 > > [ 255.871497] R10: 000000000000003f R11: 0000000000000246 R12: 000055d3acafc330 > > [ 255.871497] R13: 000000000000003f R14: 00007ffcaab3ffb0 R15: 0000000000000000 > > Stopping Create Static Device Nodes in /dev... What happened here was: (1) /usr/sbin/plymouthd requested write() syscall against tty. (2) write() syscall invoked console_unlock() via tty_write(). (3) Since write() syscall is sleepable context, console_may_schedule was 1 as of entering into console_unlock(). (4) console_unlock() set do_cond_resched to 1 and set console_may_schedule to 0 so that console drivers won't call cond_resched() from console_conditional_schedule() because printk_safe_enter(flags) in console_unlock() disables hardirqs. (5) /usr/sbin/plymouthd left the "for (;;)" loop in console_unlock(). (6) Since an audit log from /usr/lib/systemd/systemd was added to printk buffer via kauditd_printk_skb() because /usr/sbin/auditd terminated due to shutdown sequence, /usr/sbin/plymouthd found that console_seq != log_next_seq is true. (7) /usr/sbin/plymouthd calls console_trylock(), and console_trylock() returned 1 because nobody was holding console_sem. console_may_schedule was 1 as of returning from console_trylock() because write() syscall is sleepable context. (8) However, "goto again;" jumped to a line which is after setting console_may_schedule to 0. This is the cause of this problem. As a result, console_may_schedule remained 1. (9) console_conditional_schedule() was called from vt_console_print() via call_console_drivers(). And cond_resched() was by error called due to (8). The correct behavior I guess is ---------- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 8b26964..ce53488 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2186,6 +2186,7 @@ void console_unlock(void) bool wake_klogd = false; bool do_cond_resched, retry; +again: if (console_suspended) { up_console_sem(); return; @@ -2204,7 +2205,6 @@ void console_unlock(void) do_cond_resched = console_may_schedule; console_may_schedule = 0; -again: /* * We released the console_sem lock, so we need to recheck if * cpu is online and (if not) is there at least one CON_ANYTIME ---------- optionally with ---------- @@ -2297,11 +2297,13 @@ void console_unlock(void) retry = console_seq != log_next_seq; raw_spin_unlock_irqrestore(&logbuf_lock, flags); - if (retry && console_trylock()) - goto again; - if (wake_klogd) wake_up_klogd(); + + if (retry && console_trylock()) { + wake_klogd = false; + goto again; + } } EXPORT_SYMBOL(console_unlock); ---------- so that current thread will act as if current thread again called console_unlock() immediately after returning from console_trylock().
WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> To: sergey.senozhatsky@gmail.com Cc: pmladek@suse.com, jack@suse.com, tj@kernel.org, kyle@kernel.org, davej@codemonkey.org.uk, calvinowens@fb.com, gregkh@linuxfoundation.org, jslaby@suse.cz, linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org, sergey.senozhatsky.work@gmail.com Subject: printk: reset may_schedule if console_trylock() from console_unlock() Date: Thu, 05 Jan 2017 10:27:23 +0000 [thread overview] Message-ID: <201701051927.IJF65159.OFFtSLQOOFHJMV@I-love.SAKURA.ne.jp> (raw) (Excluding linux-mm and restarting as a new thread.) Sergey Senozhatsky wrote at http://lkml.kernel.org/r/20161226113407.GA515@tigerII.localdomain : > Cc Greg, Jiri, > > On (12/26/16 19:54), Tetsuo Handa wrote: > [..] > > > > (3) I got below warning. (Though not reproducible.) > > If fb_flashcursor() called console_trylock(), console_may_schedule is set to 1? > > hmmm... it takes an atomic/spin `printing_lock' lock in vt_console_print(), > then call console_conditional_schedule() from lf(), being under spin_lock. > `console_may_schedule' in console_conditional_schedule() still keeps the > value from console_trylock(), which was ok (console_may_schedule permits > rescheduling). but preemption got changed under console_trylock(), by > that spin_lock. > > console_trylock() used to always forbid rescheduling; but it got changed > like a year ago. Indeed, it seems to be an oversight in commit 6b97a20d3a7909da ("printk: set may_schedule for some of console_trylock() callers"). I understood what has happened. fb_flushcursor() was not the culprit. > > the other thing is... do we really need to console_conditional_schedule() > from fbcon_*()? console_unlock() does cond_resched() after every line it > prints. wouldn't that be enough? Every record, isn't it? How many bytes can a record write to consoles? CONSOLE_EXT_LOG_MAX bytes multiplied by MAX_CMDLINECONSOLES ? Might take longer than a second for each record? > > so may be we can drop some of console_conditional_schedule() > call sites in fbcon. or update console_conditional_schedule() > function to always return the current preemption value, not the > one we saw in console_trylock(). Replacing console_may_schedule with get_console_may_schedule() will avoid this bug. I noticed that we forgot to reset console_may_schedule to 0 because the "again:" label is located after the assignment line. > > ---------------------------------------- > > [ OK [ 255.862188] audit: type\x1131 audit(1482733112.662:148): pid=1 uid=0 auidB94967295 sesB94967295 msg='unit=systemd-tmpfiles-setup-dev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > ] Stopped Create Static Device Nodes in /dev. > > > > [ 255.871468] BUG: sleeping function called from invalid context at kernel/printk/printk.c:2325 > > [ 255.871469] in_atomic(): 1, irqs_disabled(): 1, pid: 10079, name: plymouthd > > [ 255.871469] 6 locks held by plymouthd/10079: > > [ 255.871470] #0: (&tty->ldisc_sem){++++.+}, at: [<ffffffff817413e2>] ldsem_down_read+0x32/0x40 > > [ 255.871472] #1: (&tty->atomic_write_lock){+.+.+.}, at: [<ffffffff81424309>] tty_write_lock+0x19/0x50 > > [ 255.871474] #2: (&tty->termios_rwsem){++++..}, at: [<ffffffff81429d59>] n_tty_write+0x99/0x470 > > [ 255.871475] #3: (&ldata->output_lock){+.+...}, at: [<ffffffff81429df0>] n_tty_write+0x130/0x470 > > [ 255.871477] #4: (console_lock){+.+.+.}, at: [<ffffffff8110616e>] console_unlock+0x33e/0x6b0 > > [ 255.871479] #5: (printing_lock){......}, at: [<ffffffff8143baf5>] vt_console_print+0x75/0x3d0 > > [ 255.871481] irq event stamp: 15244 > > [ 255.871481] hardirqs last enabled at (15243): [<ffffffff81105011>] __down_trylock_console_sem+0x91/0xa0 > > [ 255.871482] hardirqs last disabled at (15244): [<ffffffff81105ea4>] console_unlock+0x74/0x6b0 > > [ 255.871482] softirqs last enabled at (14968): [<ffffffff81096394>] __do_softirq+0x344/0x580 > > [ 255.871482] softirqs last disabled at (14963): [<ffffffff810968d3>] irq_exit+0xe3/0x120 > > [ 255.871483] CPU: 0 PID: 10079 Comm: plymouthd Not tainted 4.9.0-next-20161224+ #12 > > [ 255.871483] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 > > [ 255.871484] Call Trace: > > [ 255.871484] dump_stack+0x85/0xc9 > > [ 255.871485] ___might_sleep+0x14a/0x250 > > [ 255.871485] console_conditional_schedule+0x22/0x30 > > [ 255.871485] fbcon_redraw.isra.24+0xa3/0x1d0 > > [ 255.871486] ? fbcon_cursor+0x151/0x1c0 > > [ 255.871486] fbcon_scroll+0x11d/0xcb0 > > [ 255.871487] con_scroll+0x160/0x170 > > [ 255.871487] lf+0x9c/0xb0 > > [ 255.871487] vt_console_print+0x2b7/0x3d0 > > [ 255.871488] console_unlock+0x457/0x6b0 > > [ 255.871488] do_con_write.part.19+0x737/0x9e0 > > [ 255.871489] ? mark_held_locks+0x71/0x90 > > [ 255.871489] con_write+0x57/0x60 > > [ 255.871489] n_tty_write+0x1bf/0x470 > > [ 255.871490] ? prepare_to_wait_event+0x110/0x110 > > [ 255.871490] tty_write+0x157/0x2d0 > > [ 255.871491] ? n_tty_open+0xd0/0xd0 > > [ 255.871491] __vfs_write+0x32/0x140 > > [ 255.871491] ? trace_hardirqs_on+0xd/0x10 > > [ 255.871492] ? __audit_syscall_entry+0xaa/0xf0 > > [ 255.871492] vfs_write+0xc2/0x1f0 > > [ 255.871493] ? syscall_trace_enter+0x1cb/0x3e0 > > [ 255.871493] SyS_write+0x53/0xc0 > > [ 255.871493] do_syscall_64+0x67/0x1f0 > > [ 255.871494] entry_SYSCALL64_slow_path+0x25/0x25 > > [ 255.871494] RIP: 0033:0x7fb74cf8fc60 > > [ 255.871495] RSP: 002b:00007ffcaab3fe88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 255.871495] RAX: ffffffffffffffda RBX: 000055d3acaf7160 RCX: 00007fb74cf8fc60 > > [ 255.871496] RDX: 000000000000003f RSI: 000055d3acafd090 RDI: 0000000000000009 > > [ 255.871496] RBP: 000055d3acafc440 R08: 0000000000000070 R09: 0000000000000000 > > [ 255.871497] R10: 000000000000003f R11: 0000000000000246 R12: 000055d3acafc330 > > [ 255.871497] R13: 000000000000003f R14: 00007ffcaab3ffb0 R15: 0000000000000000 > > Stopping Create Static Device Nodes in /dev... What happened here was: (1) /usr/sbin/plymouthd requested write() syscall against tty. (2) write() syscall invoked console_unlock() via tty_write(). (3) Since write() syscall is sleepable context, console_may_schedule was 1 as of entering into console_unlock(). (4) console_unlock() set do_cond_resched to 1 and set console_may_schedule to 0 so that console drivers won't call cond_resched() from console_conditional_schedule() because printk_safe_enter(flags) in console_unlock() disables hardirqs. (5) /usr/sbin/plymouthd left the "for (;;)" loop in console_unlock(). (6) Since an audit log from /usr/lib/systemd/systemd was added to printk buffer via kauditd_printk_skb() because /usr/sbin/auditd terminated due to shutdown sequence, /usr/sbin/plymouthd found that console_seq != log_next_seq is true. (7) /usr/sbin/plymouthd calls console_trylock(), and console_trylock() returned 1 because nobody was holding console_sem. console_may_schedule was 1 as of returning from console_trylock() because write() syscall is sleepable context. (8) However, "goto again;" jumped to a line which is after setting console_may_schedule to 0. This is the cause of this problem. As a result, console_may_schedule remained 1. (9) console_conditional_schedule() was called from vt_console_print() via call_console_drivers(). And cond_resched() was by error called due to (8). The correct behavior I guess is ---------- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 8b26964..ce53488 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2186,6 +2186,7 @@ void console_unlock(void) bool wake_klogd = false; bool do_cond_resched, retry; +again: if (console_suspended) { up_console_sem(); return; @@ -2204,7 +2205,6 @@ void console_unlock(void) do_cond_resched = console_may_schedule; console_may_schedule = 0; -again: /* * We released the console_sem lock, so we need to recheck if * cpu is online and (if not) is there at least one CON_ANYTIME ---------- optionally with ---------- @@ -2297,11 +2297,13 @@ void console_unlock(void) retry = console_seq != log_next_seq; raw_spin_unlock_irqrestore(&logbuf_lock, flags); - if (retry && console_trylock()) - goto again; - if (wake_klogd) wake_up_klogd(); + + if (retry && console_trylock()) { + wake_klogd = false; + goto again; + } } EXPORT_SYMBOL(console_unlock); ---------- so that current thread will act as if current thread again called console_unlock() immediately after returning from console_trylock().
next reply other threads:[~2017-01-05 10:30 UTC|newest] Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-01-05 10:27 Tetsuo Handa [this message] 2017-01-05 10:27 ` printk: reset may_schedule if console_trylock() from console_unlock() Tetsuo Handa 2017-01-05 10:56 ` Sergey Senozhatsky 2017-01-05 10:56 ` Sergey Senozhatsky
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=201701051927.IJF65159.OFFtSLQOOFHJMV@I-love.SAKURA.ne.jp \ --to=penguin-kernel@i-love.sakura.ne.jp \ --cc=calvinowens@fb.com \ --cc=davej@codemonkey.org.uk \ --cc=gregkh@linuxfoundation.org \ --cc=jack@suse.com \ --cc=jslaby@suse.cz \ --cc=kyle@kernel.org \ --cc=linux-fbdev@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=pmladek@suse.com \ --cc=sergey.senozhatsky.work@gmail.com \ --cc=sergey.senozhatsky@gmail.com \ --cc=tj@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.