All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
@ 2021-06-29 14:33 Petr Mladek
  2021-06-29 14:48 ` John Ogness
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Petr Mladek @ 2021-06-29 14:33 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek, stable

The standard printk() tries to flush the message to the console
immediately. It tries to take the console lock. If the lock is
already taken then the current owner is responsible for flushing
even the new message.

There is a small race window between checking whether a new message is
available and releasing the console lock. It is solved by re-checking
the state after releasing the console lock. If the check is positive
then console_unlock() tries to take the lock again and process the new
message as well.

The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that
console_seq is not longer read atomically. As a result, the re-check might
be done with an inconsistent 64-bit index.

Solve it by using the last sequence number that has been checked under
the console lock. In the worst case, it will take the lock again only
to realized that the new message has already been proceed. But it
was possible even before.

Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock")
Cc: stable@vger.kernel.org # 5.13
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..87411084075e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2545,6 +2545,7 @@ void console_unlock(void)
 	bool do_cond_resched, retry;
 	struct printk_info info;
 	struct printk_record r;
+	u64 next_seq;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2654,8 +2655,10 @@ void console_unlock(void)
 			cond_resched();
 	}
 
-	console_locked = 0;
+	/* Get consistent value of the next-to-be-used sequence number. */
+	next_seq = console_seq;
 
+	console_locked = 0;
 	up_console_sem();
 
 	/*
@@ -2664,7 +2667,7 @@ void console_unlock(void)
 	 * there's a new owner and the console_unlock() from them will do the
 	 * flush, no worries.
 	 */
-	retry = prb_read_valid(prb, console_seq, NULL);
+	retry = prb_read_valid(prb, next_seq, NULL);
 	printk_safe_exit_irqrestore(flags);
 
 	if (retry && console_trylock())
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 14:33 [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock() Petr Mladek
@ 2021-06-29 14:48 ` John Ogness
  2021-06-29 15:42 ` Sergey Senozhatsky
  2021-06-29 19:44   ` kernel test robot
  2 siblings, 0 replies; 11+ messages in thread
From: John Ogness @ 2021-06-29 14:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek, stable

On 2021-06-29, Petr Mladek <pmladek@suse.com> wrote:
> The standard printk() tries to flush the message to the console
> immediately. It tries to take the console lock. If the lock is
> already taken then the current owner is responsible for flushing
> even the new message.
>
> There is a small race window between checking whether a new message is
> available and releasing the console lock. It is solved by re-checking
> the state after releasing the console lock. If the check is positive
> then console_unlock() tries to take the lock again and process the new
> message as well.
>
> The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that
> console_seq is not longer read atomically. As a result, the re-check might
> be done with an inconsistent 64-bit index.
>
> Solve it by using the last sequence number that has been checked under
> the console lock. In the worst case, it will take the lock again only
> to realized that the new message has already been proceed. But it
> was possible even before.
>
> Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock")
> Cc: stable@vger.kernel.org # 5.13
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Reviewed-by: John Ogness <john.ogness@linutronix.de>

Good catch.

John

> ---
>  kernel/printk/printk.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 142a58d124d9..87411084075e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2545,6 +2545,7 @@ void console_unlock(void)
>  	bool do_cond_resched, retry;
>  	struct printk_info info;
>  	struct printk_record r;
> +	u64 next_seq;
>  
>  	if (console_suspended) {
>  		up_console_sem();
> @@ -2654,8 +2655,10 @@ void console_unlock(void)
>  			cond_resched();
>  	}
>  
> -	console_locked = 0;
> +	/* Get consistent value of the next-to-be-used sequence number. */
> +	next_seq = console_seq;
>  
> +	console_locked = 0;
>  	up_console_sem();
>  
>  	/*
> @@ -2664,7 +2667,7 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, console_seq, NULL);
> +	retry = prb_read_valid(prb, next_seq, NULL);
>  	printk_safe_exit_irqrestore(flags);
>  
>  	if (retry && console_trylock())
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 14:33 [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock() Petr Mladek
  2021-06-29 14:48 ` John Ogness
@ 2021-06-29 15:42 ` Sergey Senozhatsky
  2021-06-30  7:58   ` Petr Mladek
  2021-06-29 19:44   ` kernel test robot
  2 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2021-06-29 15:42 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, stable

On (21/06/29 16:33), Petr Mladek wrote:
> The standard printk() tries to flush the message to the console
> immediately. It tries to take the console lock. If the lock is
> already taken then the current owner is responsible for flushing
> even the new message.
> 
> There is a small race window between checking whether a new message is
> available and releasing the console lock. It is solved by re-checking
> the state after releasing the console lock. If the check is positive
> then console_unlock() tries to take the lock again and process the new
> message as well.
[..]
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 142a58d124d9..87411084075e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2545,6 +2545,7 @@ void console_unlock(void)
>  	bool do_cond_resched, retry;
>  	struct printk_info info;
>  	struct printk_record r;
> +	u64 next_seq;
>  
>  	if (console_suspended) {
>  		up_console_sem();
> @@ -2654,8 +2655,10 @@ void console_unlock(void)
>  			cond_resched();
>  	}
>  
> -	console_locked = 0;
> +	/* Get consistent value of the next-to-be-used sequence number. */
> +	next_seq = console_seq;
>  
> +	console_locked = 0;
>  	up_console_sem();
>  
>  	/*
> @@ -2664,7 +2667,7 @@ void console_unlock(void)
>  	 * there's a new owner and the console_unlock() from them will do the
>  	 * flush, no worries.
>  	 */
> -	retry = prb_read_valid(prb, console_seq, NULL);
> +	retry = prb_read_valid(prb, next_seq, NULL);
>  	printk_safe_exit_irqrestore(flags);
>  
>  	if (retry && console_trylock())

Maybe it's too late here in my time zone, but what are the consequences
of this race?

`retry` can be falsely set, console_trylock() does not spin on owner,
so the context that just released the lock can grab it again only if
it's unlocked. For the context that just has released the console_sem
and then acquired it again, because of the race, - console_seq will be
valid after it acquires the lock, then it'll jump to `retry` and
re-validated the console_seq - prb_read_valid(). If it's valid, it'll
print the message; and should another CPU printk that CPU will spin on
owner and then the current console_sem owner will yield to it via
console_lock_spinning branch.

One way or the other, good catch and nice to have it fixed.

Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 14:33 [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock() Petr Mladek
@ 2021-06-29 19:44   ` kernel test robot
  2021-06-29 15:42 ` Sergey Senozhatsky
  2021-06-29 19:44   ` kernel test robot
  2 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2021-06-29 19:44 UTC (permalink / raw)
  To: Petr Mladek, John Ogness
  Cc: kbuild-all, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek, stable

[-- Attachment #1: Type: text/plain, Size: 7244 bytes --]

Hi Petr,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.13 next-20210629]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Petr-Mladek/printk-console-Check-consistent-sequence-number-when-handling-race-in-console_unlock/20210629-223408
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c54b245d011855ea91c5beff07f1db74143ce614
config: x86_64-buildonly-randconfig-r006-20210629 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/24c9c4e757be2efe5bb225e443b3502338abf64c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Petr-Mladek/printk-console-Check-consistent-sequence-number-when-handling-race-in-console_unlock/20210629-223408
        git checkout 24c9c4e757be2efe5bb225e443b3502338abf64c
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/printk/printk.c:175:5: warning: no previous prototype for 'devkmsg_sysctl_set_loglvl' [-Wmissing-prototypes]
     175 | int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c: In function 'console_unlock':
>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]
    2548 |  u64 next_seq;
         |      ^~~~~~~~


vim +/next_seq +2548 kernel/printk/printk.c

  2525	
  2526	/**
  2527	 * console_unlock - unlock the console system
  2528	 *
  2529	 * Releases the console_lock which the caller holds on the console system
  2530	 * and the console driver list.
  2531	 *
  2532	 * While the console_lock was held, console output may have been buffered
  2533	 * by printk().  If this is the case, console_unlock(); emits
  2534	 * the output prior to releasing the lock.
  2535	 *
  2536	 * If there is output waiting, we wake /dev/kmsg and syslog() users.
  2537	 *
  2538	 * console_unlock(); may be called from any context.
  2539	 */
  2540	void console_unlock(void)
  2541	{
  2542		static char ext_text[CONSOLE_EXT_LOG_MAX];
  2543		static char text[CONSOLE_LOG_MAX];
  2544		unsigned long flags;
  2545		bool do_cond_resched, retry;
  2546		struct printk_info info;
  2547		struct printk_record r;
> 2548		u64 next_seq;
  2549	
  2550		if (console_suspended) {
  2551			up_console_sem();
  2552			return;
  2553		}
  2554	
  2555		prb_rec_init_rd(&r, &info, text, sizeof(text));
  2556	
  2557		/*
  2558		 * Console drivers are called with interrupts disabled, so
  2559		 * @console_may_schedule should be cleared before; however, we may
  2560		 * end up dumping a lot of lines, for example, if called from
  2561		 * console registration path, and should invoke cond_resched()
  2562		 * between lines if allowable.  Not doing so can cause a very long
  2563		 * scheduling stall on a slow console leading to RCU stall and
  2564		 * softlockup warnings which exacerbate the issue with more
  2565		 * messages practically incapacitating the system.
  2566		 *
  2567		 * console_trylock() is not able to detect the preemptive
  2568		 * context reliably. Therefore the value must be stored before
  2569		 * and cleared after the "again" goto label.
  2570		 */
  2571		do_cond_resched = console_may_schedule;
  2572	again:
  2573		console_may_schedule = 0;
  2574	
  2575		/*
  2576		 * We released the console_sem lock, so we need to recheck if
  2577		 * cpu is online and (if not) is there at least one CON_ANYTIME
  2578		 * console.
  2579		 */
  2580		if (!can_use_console()) {
  2581			console_locked = 0;
  2582			up_console_sem();
  2583			return;
  2584		}
  2585	
  2586		for (;;) {
  2587			size_t ext_len = 0;
  2588			size_t len;
  2589	
  2590			printk_safe_enter_irqsave(flags);
  2591	skip:
  2592			if (!prb_read_valid(prb, console_seq, &r))
  2593				break;
  2594	
  2595			if (console_seq != r.info->seq) {
  2596				console_dropped += r.info->seq - console_seq;
  2597				console_seq = r.info->seq;
  2598			}
  2599	
  2600			if (suppress_message_printing(r.info->level)) {
  2601				/*
  2602				 * Skip record we have buffered and already printed
  2603				 * directly to the console when we received it, and
  2604				 * record that has level above the console loglevel.
  2605				 */
  2606				console_seq++;
  2607				goto skip;
  2608			}
  2609	
  2610			/* Output to all consoles once old messages replayed. */
  2611			if (unlikely(exclusive_console &&
  2612				     console_seq >= exclusive_console_stop_seq)) {
  2613				exclusive_console = NULL;
  2614			}
  2615	
  2616			/*
  2617			 * Handle extended console text first because later
  2618			 * record_print_text() will modify the record buffer in-place.
  2619			 */
  2620			if (nr_ext_console_drivers) {
  2621				ext_len = info_print_ext_header(ext_text,
  2622							sizeof(ext_text),
  2623							r.info);
  2624				ext_len += msg_print_ext_body(ext_text + ext_len,
  2625							sizeof(ext_text) - ext_len,
  2626							&r.text_buf[0],
  2627							r.info->text_len,
  2628							&r.info->dev_info);
  2629			}
  2630			len = record_print_text(&r,
  2631					console_msg_format & MSG_FORMAT_SYSLOG,
  2632					printk_time);
  2633			console_seq++;
  2634	
  2635			/*
  2636			 * While actively printing out messages, if another printk()
  2637			 * were to occur on another CPU, it may wait for this one to
  2638			 * finish. This task can not be preempted if there is a
  2639			 * waiter waiting to take over.
  2640			 */
  2641			console_lock_spinning_enable();
  2642	
  2643			stop_critical_timings();	/* don't trace print latency */
  2644			call_console_drivers(ext_text, ext_len, text, len);
  2645			start_critical_timings();
  2646	
  2647			if (console_lock_spinning_disable_and_check()) {
  2648				printk_safe_exit_irqrestore(flags);
  2649				return;
  2650			}
  2651	
  2652			printk_safe_exit_irqrestore(flags);
  2653	
  2654			if (do_cond_resched)
  2655				cond_resched();
  2656		}
  2657	
  2658		/* Get consistent value of the next-to-be-used sequence number. */
  2659		next_seq = console_seq;
  2660	
  2661		console_locked = 0;
  2662		up_console_sem();
  2663	
  2664		/*
  2665		 * Someone could have filled up the buffer again, so re-check if there's
  2666		 * something to flush. In case we cannot trylock the console_sem again,
  2667		 * there's a new owner and the console_unlock() from them will do the
  2668		 * flush, no worries.
  2669		 */
  2670		retry = prb_read_valid(prb, next_seq, NULL);
  2671		printk_safe_exit_irqrestore(flags);
  2672	
  2673		if (retry && console_trylock())
  2674			goto again;
  2675	}
  2676	EXPORT_SYMBOL(console_unlock);
  2677	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35718 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
@ 2021-06-29 19:44   ` kernel test robot
  0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2021-06-29 19:44 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 7441 bytes --]

Hi Petr,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.13 next-20210629]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Petr-Mladek/printk-console-Check-consistent-sequence-number-when-handling-race-in-console_unlock/20210629-223408
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c54b245d011855ea91c5beff07f1db74143ce614
config: x86_64-buildonly-randconfig-r006-20210629 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/24c9c4e757be2efe5bb225e443b3502338abf64c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Petr-Mladek/printk-console-Check-consistent-sequence-number-when-handling-race-in-console_unlock/20210629-223408
        git checkout 24c9c4e757be2efe5bb225e443b3502338abf64c
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/printk/printk.c:175:5: warning: no previous prototype for 'devkmsg_sysctl_set_loglvl' [-Wmissing-prototypes]
     175 | int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c: In function 'console_unlock':
>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]
    2548 |  u64 next_seq;
         |      ^~~~~~~~


vim +/next_seq +2548 kernel/printk/printk.c

  2525	
  2526	/**
  2527	 * console_unlock - unlock the console system
  2528	 *
  2529	 * Releases the console_lock which the caller holds on the console system
  2530	 * and the console driver list.
  2531	 *
  2532	 * While the console_lock was held, console output may have been buffered
  2533	 * by printk().  If this is the case, console_unlock(); emits
  2534	 * the output prior to releasing the lock.
  2535	 *
  2536	 * If there is output waiting, we wake /dev/kmsg and syslog() users.
  2537	 *
  2538	 * console_unlock(); may be called from any context.
  2539	 */
  2540	void console_unlock(void)
  2541	{
  2542		static char ext_text[CONSOLE_EXT_LOG_MAX];
  2543		static char text[CONSOLE_LOG_MAX];
  2544		unsigned long flags;
  2545		bool do_cond_resched, retry;
  2546		struct printk_info info;
  2547		struct printk_record r;
> 2548		u64 next_seq;
  2549	
  2550		if (console_suspended) {
  2551			up_console_sem();
  2552			return;
  2553		}
  2554	
  2555		prb_rec_init_rd(&r, &info, text, sizeof(text));
  2556	
  2557		/*
  2558		 * Console drivers are called with interrupts disabled, so
  2559		 * @console_may_schedule should be cleared before; however, we may
  2560		 * end up dumping a lot of lines, for example, if called from
  2561		 * console registration path, and should invoke cond_resched()
  2562		 * between lines if allowable.  Not doing so can cause a very long
  2563		 * scheduling stall on a slow console leading to RCU stall and
  2564		 * softlockup warnings which exacerbate the issue with more
  2565		 * messages practically incapacitating the system.
  2566		 *
  2567		 * console_trylock() is not able to detect the preemptive
  2568		 * context reliably. Therefore the value must be stored before
  2569		 * and cleared after the "again" goto label.
  2570		 */
  2571		do_cond_resched = console_may_schedule;
  2572	again:
  2573		console_may_schedule = 0;
  2574	
  2575		/*
  2576		 * We released the console_sem lock, so we need to recheck if
  2577		 * cpu is online and (if not) is there at least one CON_ANYTIME
  2578		 * console.
  2579		 */
  2580		if (!can_use_console()) {
  2581			console_locked = 0;
  2582			up_console_sem();
  2583			return;
  2584		}
  2585	
  2586		for (;;) {
  2587			size_t ext_len = 0;
  2588			size_t len;
  2589	
  2590			printk_safe_enter_irqsave(flags);
  2591	skip:
  2592			if (!prb_read_valid(prb, console_seq, &r))
  2593				break;
  2594	
  2595			if (console_seq != r.info->seq) {
  2596				console_dropped += r.info->seq - console_seq;
  2597				console_seq = r.info->seq;
  2598			}
  2599	
  2600			if (suppress_message_printing(r.info->level)) {
  2601				/*
  2602				 * Skip record we have buffered and already printed
  2603				 * directly to the console when we received it, and
  2604				 * record that has level above the console loglevel.
  2605				 */
  2606				console_seq++;
  2607				goto skip;
  2608			}
  2609	
  2610			/* Output to all consoles once old messages replayed. */
  2611			if (unlikely(exclusive_console &&
  2612				     console_seq >= exclusive_console_stop_seq)) {
  2613				exclusive_console = NULL;
  2614			}
  2615	
  2616			/*
  2617			 * Handle extended console text first because later
  2618			 * record_print_text() will modify the record buffer in-place.
  2619			 */
  2620			if (nr_ext_console_drivers) {
  2621				ext_len = info_print_ext_header(ext_text,
  2622							sizeof(ext_text),
  2623							r.info);
  2624				ext_len += msg_print_ext_body(ext_text + ext_len,
  2625							sizeof(ext_text) - ext_len,
  2626							&r.text_buf[0],
  2627							r.info->text_len,
  2628							&r.info->dev_info);
  2629			}
  2630			len = record_print_text(&r,
  2631					console_msg_format & MSG_FORMAT_SYSLOG,
  2632					printk_time);
  2633			console_seq++;
  2634	
  2635			/*
  2636			 * While actively printing out messages, if another printk()
  2637			 * were to occur on another CPU, it may wait for this one to
  2638			 * finish. This task can not be preempted if there is a
  2639			 * waiter waiting to take over.
  2640			 */
  2641			console_lock_spinning_enable();
  2642	
  2643			stop_critical_timings();	/* don't trace print latency */
  2644			call_console_drivers(ext_text, ext_len, text, len);
  2645			start_critical_timings();
  2646	
  2647			if (console_lock_spinning_disable_and_check()) {
  2648				printk_safe_exit_irqrestore(flags);
  2649				return;
  2650			}
  2651	
  2652			printk_safe_exit_irqrestore(flags);
  2653	
  2654			if (do_cond_resched)
  2655				cond_resched();
  2656		}
  2657	
  2658		/* Get consistent value of the next-to-be-used sequence number. */
  2659		next_seq = console_seq;
  2660	
  2661		console_locked = 0;
  2662		up_console_sem();
  2663	
  2664		/*
  2665		 * Someone could have filled up the buffer again, so re-check if there's
  2666		 * something to flush. In case we cannot trylock the console_sem again,
  2667		 * there's a new owner and the console_unlock() from them will do the
  2668		 * flush, no worries.
  2669		 */
  2670		retry = prb_read_valid(prb, next_seq, NULL);
  2671		printk_safe_exit_irqrestore(flags);
  2672	
  2673		if (retry && console_trylock())
  2674			goto again;
  2675	}
  2676	EXPORT_SYMBOL(console_unlock);
  2677	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35718 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 19:44   ` kernel test robot
@ 2021-06-29 20:53     ` John Ogness
  -1 siblings, 0 replies; 11+ messages in thread
From: John Ogness @ 2021-06-29 20:53 UTC (permalink / raw)
  To: kernel test robot, Petr Mladek
  Cc: kbuild-all, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel, Petr Mladek, stable

On 2021-06-30, kernel test robot <lkp@intel.com> wrote:
>>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]

I suppose the correct fix for this warning would be to change the NOP
macros. Currently they are:

#define prb_read_valid(rb, seq, r)      false
#define prb_first_valid_seq(rb)         0

They should probably be something like (untested):

#define prb_read_valid(rb, seq, r)     \
({                                     \
        (void)(rb);                    \
        (void)(seq);                   \
        (void)(r);                     \
        false;                         \
})

#define prb_first_valid_seq(rb)        \
({                                     \
        (void)(rb);                    \
        0;                             \
})

John Ogness

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
@ 2021-06-29 20:53     ` John Ogness
  0 siblings, 0 replies; 11+ messages in thread
From: John Ogness @ 2021-06-29 20:53 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

On 2021-06-30, kernel test robot <lkp@intel.com> wrote:
>>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]

I suppose the correct fix for this warning would be to change the NOP
macros. Currently they are:

#define prb_read_valid(rb, seq, r)      false
#define prb_first_valid_seq(rb)         0

They should probably be something like (untested):

#define prb_read_valid(rb, seq, r)     \
({                                     \
        (void)(rb);                    \
        (void)(seq);                   \
        (void)(r);                     \
        false;                         \
})

#define prb_first_valid_seq(rb)        \
({                                     \
        (void)(rb);                    \
        0;                             \
})

John Ogness

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 15:42 ` Sergey Senozhatsky
@ 2021-06-30  7:58   ` Petr Mladek
  2021-07-05  3:53     ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Mladek @ 2021-06-30  7:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Steven Rostedt, Thomas Gleixner, linux-kernel, stable

On Wed 2021-06-30 00:42:13, Sergey Senozhatsky wrote:
> On (21/06/29 16:33), Petr Mladek wrote:
> > The standard printk() tries to flush the message to the console
> > immediately. It tries to take the console lock. If the lock is
> > already taken then the current owner is responsible for flushing
> > even the new message.
> > 
> > There is a small race window between checking whether a new message is
> > available and releasing the console lock. It is solved by re-checking
> > the state after releasing the console lock. If the check is positive
> > then console_unlock() tries to take the lock again and process the new
> > message as well.
> [..]
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 142a58d124d9..87411084075e 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2545,6 +2545,7 @@ void console_unlock(void)
> >  	bool do_cond_resched, retry;
> >  	struct printk_info info;
> >  	struct printk_record r;
> > +	u64 next_seq;
> >  
> >  	if (console_suspended) {
> >  		up_console_sem();
> > @@ -2654,8 +2655,10 @@ void console_unlock(void)
> >  			cond_resched();
> >  	}
> >  
> > -	console_locked = 0;
> > +	/* Get consistent value of the next-to-be-used sequence number. */
> > +	next_seq = console_seq;
> >  
> > +	console_locked = 0;
> >  	up_console_sem();
> >  
> >  	/*
> > @@ -2664,7 +2667,7 @@ void console_unlock(void)
> >  	 * there's a new owner and the console_unlock() from them will do the
> >  	 * flush, no worries.
> >  	 */
> > -	retry = prb_read_valid(prb, console_seq, NULL);
> > +	retry = prb_read_valid(prb, next_seq, NULL);
> >  	printk_safe_exit_irqrestore(flags);
> >  
> >  	if (retry && console_trylock())
> 
> Maybe it's too late here in my time zone, but what are the consequences
> of this race?
> 
> `retry` can be falsely set, console_trylock() does not spin on owner,
> so the context that just released the lock can grab it again only if
> it's unlocked. For the context that just has released the console_sem
> and then acquired it again, because of the race, - console_seq will be
> valid after it acquires the lock, then it'll jump to `retry` and
> re-validated the console_seq - prb_read_valid(). If it's valid, it'll
> print the message; and should another CPU printk that CPU will spin on
> owner and then the current console_sem owner will yield to it via
> console_lock_spinning branch.

I am not sure that I follow it correctly. IMHO, there are two possible
races. I believe that you are talking about the 2nd scenario:

1st scenario: console_unlock() retries but the message has been proceed
   in the meantime:

CPU0				CPU1

console_unlock()

  // process all pending messages
  next_seq = 100;
  up_console_sem();

				printk()
				  vprintk_store()
				    // storing message with seq = 100;

				  console_trylock_spinning()
				     //succees
				  console_unlock()
				    // show message with seq == 100
				    console_seq++; (101)

				  next_seq = 101;
				  up_consle_sem()

   retry = prb_read_valid(prb, next_seq, NULL);
     // true because next_seq == 100
   goto retry;

   if (!prb_read_valid(prb, console_seq, &r))
     break;
     // breaks because console_seq == 101

Result: CPU0 retired just to realize that the message
	has already been procceed.


2nd scenario: printk() caller spins when other process is already
	processing it's message


CPU0				CPU1

printk()
  vprintk_store()
     // storing message with seq == 100

  console_trylock_spinning()
     // succees
  console_unlock()
    // show message with seq == 100
      console_seq++; (=> 101)

      next_seq = 101;
      up_console_sem();

			       printk()
				  vprintk_store()
				  // storing message with seq == 101

     retry = prb_read_valid(prb, next_seq, NULL);
     // true because next_seq == 101
     goto retry

     if (!prb_read_valid(prb, console_seq, &r))
       // read messages with seq == 101

     console_seq++;  (=> 102)

     console_lock_spinning_enable();
     call_console_drivers();

				  console_trylock_spinning()
				    // spinning

     if (console_lock_spinning_disable_and_check())
	return;

     // returns because there is a waiter

				  // got the console lock
				  console_unlock()
				    if (!prb_read_valid(prb, console_seq, &r))
				      break;
				     // breaks because console_seq == 102


Result: CPU1 was spinning just to realize that the message has already
	been proceed.


It is not ideal. But the result is always correct.

The races have been there already before. Only the race window in 1st
scenario was a bit smaller.

Anyway, thanks for the review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-29 20:53     ` John Ogness
@ 2021-06-30  8:57       ` Petr Mladek
  -1 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2021-06-30  8:57 UTC (permalink / raw)
  To: John Ogness
  Cc: kernel test robot, kbuild-all, Sergey Senozhatsky,
	Steven Rostedt, Thomas Gleixner, linux-kernel, stable

On Tue 2021-06-29 22:59:57, John Ogness wrote:
> On 2021-06-30, kernel test robot <lkp@intel.com> wrote:
> >>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]
> 
> I suppose the correct fix for this warning would be to change the NOP
> macros. Currently they are:
> 
> #define prb_read_valid(rb, seq, r)      false
> #define prb_first_valid_seq(rb)         0
> 
> They should probably be something like (untested):
> 
> #define prb_read_valid(rb, seq, r)     \
> ({                                     \
>         (void)(rb);                    \
>         (void)(seq);                   \
>         (void)(r);                     \
>         false;                         \
> })

This did not work:

kernel/printk/printk.c: In function ‘console_unlock’:
kernel/printk/printk.c:2600:23: error: ‘prb’ undeclared (first use in this function)
   if (!prb_read_valid(prb, console_seq, &r))
                       ^
kernel/printk/printk.c:2230:16: note: in definition of macro ‘prb_read_valid’
         (void)(rb);                    \
                ^~
kernel/printk/printk.c:2600:23: note: each undeclared identifier is reported only once for each function it appears in
   if (!prb_read_valid(prb, console_seq, &r))
                       ^
kernel/printk/printk.c:2230:16: note: in definition of macro ‘prb_read_valid’
         (void)(rb);                    \
                ^~


Instead, it might be solved by declaring next_seq as:

	u64 __maybe_unused next_seq;

Any better idea is welcome. Well, it is not worth any big complexity.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
@ 2021-06-30  8:57       ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2021-06-30  8:57 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

On Tue 2021-06-29 22:59:57, John Ogness wrote:
> On 2021-06-30, kernel test robot <lkp@intel.com> wrote:
> >>> kernel/printk/printk.c:2548:6: warning: variable 'next_seq' set but not used [-Wunused-but-set-variable]
> 
> I suppose the correct fix for this warning would be to change the NOP
> macros. Currently they are:
> 
> #define prb_read_valid(rb, seq, r)      false
> #define prb_first_valid_seq(rb)         0
> 
> They should probably be something like (untested):
> 
> #define prb_read_valid(rb, seq, r)     \
> ({                                     \
>         (void)(rb);                    \
>         (void)(seq);                   \
>         (void)(r);                     \
>         false;                         \
> })

This did not work:

kernel/printk/printk.c: In function ‘console_unlock’:
kernel/printk/printk.c:2600:23: error: ‘prb’ undeclared (first use in this function)
   if (!prb_read_valid(prb, console_seq, &r))
                       ^
kernel/printk/printk.c:2230:16: note: in definition of macro ‘prb_read_valid’
         (void)(rb);                    \
                ^~
kernel/printk/printk.c:2600:23: note: each undeclared identifier is reported only once for each function it appears in
   if (!prb_read_valid(prb, console_seq, &r))
                       ^
kernel/printk/printk.c:2230:16: note: in definition of macro ‘prb_read_valid’
         (void)(rb);                    \
                ^~


Instead, it might be solved by declaring next_seq as:

	u64 __maybe_unused next_seq;

Any better idea is welcome. Well, it is not worth any big complexity.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock()
  2021-06-30  7:58   ` Petr Mladek
@ 2021-07-05  3:53     ` Sergey Senozhatsky
  0 siblings, 0 replies; 11+ messages in thread
From: Sergey Senozhatsky @ 2021-07-05  3:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, John Ogness, Steven Rostedt, Thomas Gleixner,
	linux-kernel, stable

On (21/06/30 09:58), Petr Mladek wrote:
[..]
> > `retry` can be falsely set, console_trylock() does not spin on owner,
> > so the context that just released the lock can grab it again only if
> > it's unlocked. For the context that just has released the console_sem
> > and then acquired it again, because of the race, - console_seq will be
> > valid after it acquires the lock, then it'll jump to `retry` and
> > re-validated the console_seq - prb_read_valid(). If it's valid, it'll
> > print the message; and should another CPU printk that CPU will spin on
> > owner and then the current console_sem owner will yield to it via
> > console_lock_spinning branch.
> 
> I am not sure that I follow it correctly. IMHO, there are two possible
> races. I believe that you are talking about the 2nd scenario:

I guess I was thinking about two scenarios simultaneously, but you
certainly did a much better job describing them.

Thanks a lot for spending time on this!

> 1st scenario: console_unlock() retries but the message has been proceed
>    in the meantime:
[..]
> Result: CPU0 retired just to realize that the message
> 	has already been procceed.

Ack.

> 2nd scenario: printk() caller spins when other process is already
> 	processing it's message
[..]
> Result: CPU1 was spinning just to realize that the message has already
> 	been proceed.

Ack.

> It is not ideal. But the result is always correct.
>
> The races have been there already before. Only the race window in 1st
> scenario was a bit smaller.

Yeah, this was my assertion as well, but I wanted to double check.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-07-05  3:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-29 14:33 [PATCH] printk/console: Check consistent sequence number when handling race in console_unlock() Petr Mladek
2021-06-29 14:48 ` John Ogness
2021-06-29 15:42 ` Sergey Senozhatsky
2021-06-30  7:58   ` Petr Mladek
2021-07-05  3:53     ` Sergey Senozhatsky
2021-06-29 19:44 ` kernel test robot
2021-06-29 19:44   ` kernel test robot
2021-06-29 20:53   ` John Ogness
2021-06-29 20:53     ` John Ogness
2021-06-30  8:57     ` Petr Mladek
2021-06-30  8:57       ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.