LKML Archive on lore.kernel.org
 help / Atom feed
* [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
@ 2018-02-08 13:04 Petr Mladek
  2018-02-08 14:53 ` Sergey Senozhatsky
  2018-02-19 15:58 ` Petr Mladek
  0 siblings, 2 replies; 13+ messages in thread
From: Petr Mladek @ 2018-02-08 13:04 UTC (permalink / raw)
  To: Sergey Senozhatsky, Steven Rostedt
  Cc: Sergey Senozhatsky, linux-kernel, Tejun Heo, Petr Mladek

From: Tejun Heo <tj@kernel.org>

We mark for waking up klogd whenever we see a new message sequence in
the main loop.  However, the actual wakeup is always at the end of the
function and we can easily test for the wakeup condition when we do
the final should-we-repeat check.

Move the wake_klogd condition check out of the main loop.  This avoids
doing the same thing repeatedly and groups similar checks into a
common place.

This fixes a race introduced by the commit dbdda842fe96f8932 ("printk: Add
console owner and waiter logic to load balance console writes").
The current console owner might process the newly added message before
the related printk() start waiting for the console lock. Then the lock
is passed without waking klogd. The new owner sees the already updated
seen_seq and does not know that the wakeup is needed.

Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
Signed-off-by: Tejun Heo <tj@kernel.org>
[pmladek@suse.com: Describe the new race with console waiter]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
The patch [1] was lost in the discussions about the solution
of printk softlockups. I make sense on its own.

Important: It actually fixes a race caused by the new console waiter
logic. This is why I resend it with an updated commit message.

[1] https://lkml.kernel.org/r/20171102135146.GN3252168@devbig577.frc2.facebook.com


 kernel/printk/printk.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..2682209b1c90 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2335,10 +2335,6 @@ void console_unlock(void)
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
-		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
-			seen_seq = log_next_seq;
-		}
 
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped **\n",
@@ -2417,12 +2413,17 @@ void console_unlock(void)
 	up_console_sem();
 
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
+	 * Check whether userland needs notification.  Also, someone could
+	 * have filled up the buffer again, so re-check if there's
+	 * something to flush. In case we cannot trylock the console_sem
+	 * again, there's a new owner and the console_unlock() from them
+	 * will do the flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
+	if (seen_seq != log_next_seq) {
+		wake_klogd = true;
+		seen_seq = log_next_seq;
+	}
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
-- 
2.13.6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-08 13:04 [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock() Petr Mladek
@ 2018-02-08 14:53 ` Sergey Senozhatsky
  2018-02-08 16:48   ` Petr Mladek
  2018-02-19 15:58 ` Petr Mladek
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-08 14:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Tejun Heo

On (02/08/18 14:04), Petr Mladek wrote:
> We mark for waking up klogd whenever we see a new message sequence in
> the main loop.  However, the actual wakeup is always at the end of the
> function and we can easily test for the wakeup condition when we do
> the final should-we-repeat check.
> 
> Move the wake_klogd condition check out of the main loop.  This avoids
> doing the same thing repeatedly and groups similar checks into a
> common place.
> 
> This fixes a race introduced by the commit dbdda842fe96f8932 ("printk: Add
> console owner and waiter logic to load balance console writes").
> The current console owner might process the newly added message before
> the related printk() start waiting for the console lock. Then the lock
> is passed without waking klogd. The new owner sees the already updated
> seen_seq and does not know that the wakeup is needed.

I need to do more "research" on this. I though about it some time ago,
and I think that waking up klogd _only_ when we don't have any pending
logbuf messages still can be pretty late. Can't it? We can spin in
console_unlock() printing loop for a long time, probably passing
console_sem ownership between CPUs, without waking up the log_wait waiter.
May be we can wake it up from the printing loop, outside of logbuf_lock,
and let klogd to compete for logbuf_lock with the printing CPU. Why do
we wake it up only when we are done pushing messages to a potentially
slow serial console?

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-08 14:53 ` Sergey Senozhatsky
@ 2018-02-08 16:48   ` Petr Mladek
  2018-02-09  3:28     ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2018-02-08 16:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, linux-kernel, Tejun Heo

On Thu 2018-02-08 23:53:07, Sergey Senozhatsky wrote:
> On (02/08/18 14:04), Petr Mladek wrote:
> > We mark for waking up klogd whenever we see a new message sequence in
> > the main loop.  However, the actual wakeup is always at the end of the
> > function and we can easily test for the wakeup condition when we do
> > the final should-we-repeat check.
> > 
> > Move the wake_klogd condition check out of the main loop.  This avoids
> > doing the same thing repeatedly and groups similar checks into a
> > common place.
> > 
> > This fixes a race introduced by the commit dbdda842fe96f8932 ("printk: Add
> > console owner and waiter logic to load balance console writes").
> > The current console owner might process the newly added message before
> > the related printk() start waiting for the console lock. Then the lock
> > is passed without waking klogd. The new owner sees the already updated
> > seen_seq and does not know that the wakeup is needed.
> 
> I need to do more "research" on this. I though about it some time ago,
> and I think that waking up klogd _only_ when we don't have any pending
> logbuf messages still can be pretty late. Can't it? We can spin in
> console_unlock() printing loop for a long time, probably passing
> console_sem ownership between CPUs, without waking up the log_wait waiter.
> May be we can wake it up from the printing loop, outside of logbuf_lock,
> and let klogd to compete for logbuf_lock with the printing CPU. Why do
> we wake it up only when we are done pushing messages to a potentially
> slow serial console?

I thought about this as well but I was lazy. You made me to do some
archaeology. It seems that it worked this way basically from the beginning.
I have a git tree with pre-git commits. The oldest printk changes are
there from 2.1.113.

In 2.1.113, logd was weaken directly from printk():


asmlinkage int printk(const char *fmt, ...)
{
	spin_lock_irqsave(&console_lock, flags);
[...]
		for (; p < buf_end; p++) {
			log_buf[(log_start+log_size) & (LOG_BUF_LEN-1)] = *p;
			if (log_size < LOG_BUF_LEN)
--->				log_size++;
			else {
--->				log_start++;
				log_start &= LOG_BUF_LEN-1;
			}
		if (msg_level < console_loglevel && console_drivers) {
			struct console *c = console_drivers;
			while(c) {
				if ((c->flags & CON_ENABLED) && c->write)
--->					c->write(c, msg, p - msg + line_feed);
				c = c->next;
			}
		}
	}
	spin_unlock_irqrestore(&console_lock, flags);
--->	wake_up_interruptible(&log_wait);


log_wait seems to be used only in sys_syslog():

asmlinkage int sys_syslog(int type, char * buf, int len)
{

	lock_kernel();

	switch (type) {

	case 2:		/* Read from log */
---->		while (!log_size) {
			if (signal_pending(current)) {
				sti();
				goto out;
			}
			interruptible_sleep_on(&log_wait);
		}
		i = 0;
		while (log_size && i < len) {
			c = *((char *) log_buf+log_start);
---->			log_start++;
---->			log_size--;
			log_start &= LOG_BUF_LEN-1;
			sti();
			__put_user(c,buf);
			buf++;
			i++;
			cli();
		}
		sti();
		error = i;
		break;
		spin_unlock_irq(&logbuf_lock);


There are few interesting things:

   + synchronization is done using console_lock and the big kernel
     lock
   + consoles are written directly from printk()
   + the big kernel lock is taken all the time in sys_syslog()
   + sys_syslog() basically removes the messages from the buffer


I am not sure how the console_lock and the big kernel lock worked
together. But it seems that it was not possible to call consoles
and call __put_user() in sys_syslog() in parallel.


My opinion:

IMHO, it would make perfect sense to wake klogd earlier and it should
be safe these days.

I am just slightly afraid of a potential contention on printk_lock.
Consoles and klogd might delay each other. Another question is
how to do so when console_unlock() is called with interrupts
disabled (irq_work is queued on the same CPU). This is why
I would suggest to do this change separately and not for 4.16.

Note that we need Tejun's patch for-4.16 because it fixes a potential
race introduced by the console waiter logic.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-08 16:48   ` Petr Mladek
@ 2018-02-09  3:28     ` Sergey Senozhatsky
  2018-02-09 10:39       ` Petr Mladek
  2018-02-09 10:47       ` Petr Mladek
  0 siblings, 2 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-09  3:28 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Tejun Heo

On (02/08/18 17:48), Petr Mladek wrote:
[..]
> > 
> > I need to do more "research" on this. I though about it some time ago,
> > and I think that waking up klogd _only_ when we don't have any pending
> > logbuf messages still can be pretty late. Can't it? We can spin in
> > console_unlock() printing loop for a long time, probably passing
> > console_sem ownership between CPUs, without waking up the log_wait waiter.
> > May be we can wake it up from the printing loop, outside of logbuf_lock,
> > and let klogd to compete for logbuf_lock with the printing CPU. Why do
> > we wake it up only when we are done pushing messages to a potentially
> > slow serial console?
> 
> I thought about this as well but I was lazy. You made me to do some
> archaeology. It seems that it worked this way basically from the beginning.
> I have a git tree with pre-git commits. The oldest printk changes are
> there from 2.1.113.
> 
> In 2.1.113, logd was weaken directly from printk():

Thanks!

Was going to do the same today. Will take a look.

[..]
> My opinion:
> 
> IMHO, it would make perfect sense to wake klogd earlier and it should
> be safe these days.
> 
> I am just slightly afraid of a potential contention on printk_lock.
> Consoles and klogd might delay each other. Another question is
> how to do so when console_unlock() is called with interrupts
> disabled (irq_work is queued on the same CPU). This is why
> I would suggest to do this change separately and not for 4.16.

By postponing klogd wakeup we don't really address logbuf_lock
contention. We have no guarantees that no new printk will come
while klogd is active. Besides, consoles don't really delay
klogd - I tend to ignore the impact of msg_print_text(), it should
be fast. We call console drivers outside of logbuf_lock scope, so
everything should fine (tm).


Another question - do we need to wake it up from console_unlock()?

Basically,
- if consoles are suspended, we also "suspend" user space klogd.
  Does it really make sense?

- if console_lock is acquired by a preemptible task and that task
  is getting scheduled out for a long time (OOM, etc) then we postpone
  user space logging for unknown period of time. First the console_lock
  will have to flush pending messages and only afterwards it will wakeup
  klogd. Does it really make sense?

- If current console_lock owner jumps to retry (new pending messages
  were appended to the logbuf) label, user space klogd wakeup is getting
  postponed even further.

So, the final question is - since there in only one legitimate way
(modulo user space writes to kmsg) to append new messages to the
logbuf, shall we put klogd wakeup there? IOW, to vprintk_emit().

Something like this:

---

 kernel/printk/printk.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..2c8992d54a59 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -423,6 +423,9 @@ static u32 log_next_idx;
 static u64 console_seq;
 static u32 console_idx;
 
+/* The last message seq klogd have seen */
+static u64 klogd_seen_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1888,6 +1891,10 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);
 
+	if (klogd_seen_seq != log_next_seq) {
+		klogd_seen_seq = log_next_seq;
+		wake_up_klogd();
+	}
 	logbuf_unlock_irqrestore(flags);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -2289,9 +2296,7 @@ void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
-	static u64 seen_seq;
 	unsigned long flags;
-	bool wake_klogd = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2335,11 +2340,6 @@ void console_unlock(void)
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
-		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
-			seen_seq = log_next_seq;
-		}
-
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped **\n",
 				      (unsigned)(log_first_seq - console_seq));
@@ -2429,9 +2429,6 @@ void console_unlock(void)
 
 	if (retry && console_trylock())
 		goto again;
-
-	if (wake_klogd)
-		wake_up_klogd();
 }
 EXPORT_SYMBOL(console_unlock);
 
---

So, essentially, instead of:

- OK, there is a new kernel message. Let's first print it to all of the
  consoles (if !suspended and if can use console) and only afterwards let
  user space to read it and to, probably, persist it in syslog journal file.

and now we have

- OK, there is a new kernel message. Let's try to wakeup klogd so it
  probably can persist it in syslog journal file while, in parallel, we
  will print it to the consoles (if !suspended and if can use console).

Opinions?

> Note that we need Tejun's patch for-4.16 because it fixes a potential
> race introduced by the console waiter logic.

I understand it. I think I put my Reviewed-by on v1, maybe not...
Here it is

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

But can we may be come up with a more general solution? Like the one
that we discussed above. I've seen some user space klogd lost messages
reports recently. I assume it was partly caused by that very late
wakeup that we have.

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-09  3:28     ` Sergey Senozhatsky
@ 2018-02-09 10:39       ` Petr Mladek
  2018-02-10  7:33         ` Sergey Senozhatsky
  2018-02-09 10:47       ` Petr Mladek
  1 sibling, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2018-02-09 10:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, linux-kernel, Tejun Heo

On Fri 2018-02-09 12:28:30, Sergey Senozhatsky wrote:
> On (02/08/18 17:48), Petr Mladek wrote:
> By postponing klogd wakeup we don't really address logbuf_lock
> contention. We have no guarantees that no new printk will come
> while klogd is active. Besides, consoles don't really delay
> klogd - I tend to ignore the impact of msg_print_text(), it should
> be fast. We call console drivers outside of logbuf_lock scope, so
> everything should fine (tm).

First, I am all for waking klogd earlier.

Second, sigh, I do not have much experience with kernel performace issues.
It is very likely that we really do not need to mind about it.
But the following questions come to my mind:

1. Yes, new printk() might come when klogd is active. But how often
   this happens? But if we wakeup klogd before handling consoles,
   we basically schedule two parallel readers. There might be quite
   big chance of a contention even if not serious. But maybe not.
   Wakeup != running. In practice, it might take some time until
   klogd processes are scheduled and running.


2. If we wake klogd processes then they might get scheduled instead
   of the process calling printk(). This might happen if klogd
   processes have higher priority than the process calling
   printk(). As a result, this might delay getting messages
   to the console.

   Of course, this might happen already now. But we probably
   should not make it much more likely ?


> Another question - do we need to wake it up from console_unlock()?
> 
> Basically,
> - if consoles are suspended, we also "suspend" user space klogd.
>   Does it really make sense?
> 
> - if console_lock is acquired by a preemptible task and that task
>   is getting scheduled out for a long time (OOM, etc) then we postpone
>   user space logging for unknown period of time. First the console_lock
>   will have to flush pending messages and only afterwards it will wakeup
>   klogd. Does it really make sense?
> 
> - If current console_lock owner jumps to retry (new pending messages
>   were appended to the logbuf) label, user space klogd wakeup is getting
>   postponed even further.
> 
> So, the final question is - since there in only one legitimate way
> (modulo user space writes to kmsg) to append new messages to the
> logbuf, shall we put klogd wakeup there? IOW, to vprintk_emit().

Good points! In fact, if we wakeup klogd before calling consoles,
we would need to do it for every new message. Otherwise, klogd
processes might miss new messages that are added in parallel
when console_lock is taken.

I mean that klogd might process the new messages faster
than console_unlock(). Then klogd might start sleeping while
new messages are still comming and are being handled by
the same console_unlock() call.


Now, your proposed solution looked fine. Just note that we do not need
seen_seq. vprintk_emit() knows when log_next_seq is increased.
It would always wake klogd in this case.

Anyway, I think about slightly different way that would prevent
scheduling klogd intead of the vprintk_emit() caller. The trick
is to do the wakeup with preemtion disabled. I mean something like:


diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..1bdfb6ed93f0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1899,9 +1899,13 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		preempt_disable();
 		/*
+		 * Wake klogd with disabled preemtion so that they can run
+		 * in parallel but they could not delay console_handling.
+		 */
+		wake_up_klogd();
+		/*
 		 * Try to acquire and then immediately release the console
-		 * semaphore.  The release will print out buffers and wake up
-		 * /dev/kmsg and syslog() users.
+		 * semaphore.  The release will print out buffers.
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
@@ -2289,9 +2293,7 @@ void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
-	static u64 seen_seq;
 	unsigned long flags;
-	bool wake_klogd = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2335,10 +2337,6 @@ void console_unlock(void)
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
-		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
-			seen_seq = log_next_seq;
-		}
 
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped **\n",
@@ -2429,9 +2427,6 @@ void console_unlock(void)
 
 	if (retry && console_trylock())
 		goto again;
-
-	if (wake_klogd)
-		wake_up_klogd();
 }
 EXPORT_SYMBOL(console_unlock);
 
@@ -2858,14 +2853,14 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
+	/* Wake up loggers even for pending output. */
+	if (pending)
+		wake_up_interruptible(&log_wait);
+
 	if (pending & PRINTK_PENDING_OUTPUT) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
 }
 
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-09  3:28     ` Sergey Senozhatsky
  2018-02-09 10:39       ` Petr Mladek
@ 2018-02-09 10:47       ` Petr Mladek
  1 sibling, 0 replies; 13+ messages in thread
From: Petr Mladek @ 2018-02-09 10:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, linux-kernel, Tejun Heo

On Fri 2018-02-09 12:28:30, Sergey Senozhatsky wrote:
> On (02/08/18 17:48), Petr Mladek wrote:
> > Note that we need Tejun's patch for-4.16 because it fixes a potential
> > race introduced by the console waiter logic.
> 
> I understand it. I think I put my Reviewed-by on v1, maybe not...
> Here it is
> 
> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

You actually provided Reviewed-by for v1. But I was not sure if
you would agree with the updated commit log and pushing this for-4.16.

OK, I am going to push v2 into for-4.16-rc3 branch. I will push it
to Linus once I am back from vacation. If nobody is against it.


> But can we may be come up with a more general solution? Like the one
> that we discussed above. I've seen some user space klogd lost messages
> reports recently. I assume it was partly caused by that very late
> wakeup that we have.

Sure. A better solution makes sense. I am just not 100% sure about
the exact variant. But it is more or less about details that we
need to settle on.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-09 10:39       ` Petr Mladek
@ 2018-02-10  7:33         ` Sergey Senozhatsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-10  7:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	linux-kernel, Tejun Heo

Hello,

On (02/09/18 11:39), Petr Mladek wrote:
> On Fri 2018-02-09 12:28:30, Sergey Senozhatsky wrote:
> > On (02/08/18 17:48), Petr Mladek wrote:
> > By postponing klogd wakeup we don't really address logbuf_lock
> > contention. We have no guarantees that no new printk will come
> > while klogd is active. Besides, consoles don't really delay
> > klogd - I tend to ignore the impact of msg_print_text(), it should
> > be fast. We call console drivers outside of logbuf_lock scope, so
> > everything should fine (tm).
> 
> First, I am all for waking klogd earlier.
> 
> Second, sigh, I do not have much experience with kernel performace issues.
> It is very likely that we really do not need to mind about it.

I really don't think we need to bother.
klogd loggers are as important as consoles, we need to run them anyway.

> > Basically,
> > - if consoles are suspended, we also "suspend" user space klogd.
> >   Does it really make sense?
> > 
> > - if console_lock is acquired by a preemptible task and that task
> >   is getting scheduled out for a long time (OOM, etc) then we postpone
> >   user space logging for unknown period of time. First the console_lock
> >   will have to flush pending messages and only afterwards it will wakeup
> >   klogd. Does it really make sense?
> > 
> > - If current console_lock owner jumps to retry (new pending messages
> >   were appended to the logbuf) label, user space klogd wakeup is getting
> >   postponed even further.
> > 
> > So, the final question is - since there in only one legitimate way
> > (modulo user space writes to kmsg) to append new messages to the
> > logbuf, shall we put klogd wakeup there? IOW, to vprintk_emit().
> 
> Good points! In fact, if we wakeup klogd before calling consoles,
> we would need to do it for every new message. Otherwise, klogd
> processes might miss new messages that are added in parallel
> when console_lock is taken.
[..]
> Then klogd might start sleeping while new messages are still comming

Loggers sleep when `user->seq == log_next_seq' or when
`syslog_seq == log_next_seq'. We need to just wakeup them.
Once woken up they will check the condition again, if there
are no new messages, they will schedule.

> Now, your proposed solution looked fine. Just note that we do not need
> seen_seq. vprintk_emit() knows when log_next_seq is increased.
> It would always wake klogd in this case.

Probably can wake_up_klogd() unconditionally.

> Anyway, I think about slightly different way that would prevent
> scheduling klogd intead of the vprintk_emit() caller. The trick
> is to do the wakeup with preemtion disabled. I mean something like:
[..]
> @@ -1899,9 +1899,13 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 */
>  		preempt_disable();
>  		/*
> +		 * Wake klogd with disabled preemtion so that they can run
> +		 * in parallel but they could not delay console_handling.
> +		 */
> +		wake_up_klogd();
> +		/*
>  		 * Try to acquire and then immediately release the console
> -		 * semaphore.  The release will print out buffers and wake up
> -		 * /dev/kmsg and syslog() users.
> +		 * semaphore.  The release will print out buffers.
>  		 */
>  		if (console_trylock_spinning())
>  			console_unlock();
[..]

It doesn't wakeup loggers for printk_deferred() output this way.
I want to wakeup them for every new logbuf message.

If logger preempts current printk() that it's for 1 message only.
printk() will return back and finish printing. If there will be another
printk()-s from other CPUs, then one of those CPUs will print messages
to the consoles - CPU that woke up logger had not acquired console_sem
before it was preempted by logger.

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-08 13:04 [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock() Petr Mladek
  2018-02-08 14:53 ` Sergey Senozhatsky
@ 2018-02-19 15:58 ` Petr Mladek
  2018-02-19 16:01   ` [PATCH v3] " Petr Mladek
  2018-02-26  6:27   ` [PATCH v2] " Sergey Senozhatsky
  1 sibling, 2 replies; 13+ messages in thread
From: Petr Mladek @ 2018-02-19 15:58 UTC (permalink / raw)
  To: Sergey Senozhatsky, Steven Rostedt
  Cc: Sergey Senozhatsky, linux-kernel, Tejun Heo

On Thu 2018-02-08 14:04:02, Petr Mladek wrote:
> From: Tejun Heo <tj@kernel.org>
> 
> We mark for waking up klogd whenever we see a new message sequence in
> the main loop.  However, the actual wakeup is always at the end of the
> function and we can easily test for the wakeup condition when we do
> the final should-we-repeat check.
> 
> Move the wake_klogd condition check out of the main loop.  This avoids
> doing the same thing repeatedly and groups similar checks into a
> common place.
> 
> This fixes a race introduced by the commit dbdda842fe96f8932 ("printk: Add
> console owner and waiter logic to load balance console writes").
> The current console owner might process the newly added message before
> the related printk() start waiting for the console lock. Then the lock
> is passed without waking klogd. The new owner sees the already updated
> seen_seq and does not know that the wakeup is needed.
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index db4b9b8929eb..2682209b1c90 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2417,12 +2413,17 @@ void console_unlock(void)
>  	up_console_sem();
>  
>  	/*
> -	 * Someone could have filled up the buffer again, so re-check if there's
> -	 * something to flush. In case we cannot trylock the console_sem again,
> -	 * there's a new owner and the console_unlock() from them will do the
> -	 * flush, no worries.
> +	 * Check whether userland needs notification.  Also, someone could
> +	 * have filled up the buffer again, so re-check if there's
> +	 * something to flush. In case we cannot trylock the console_sem
> +	 * again, there's a new owner and the console_unlock() from them
> +	 * will do the flush, no worries.
>  	 */
>  	raw_spin_lock(&logbuf_lock);
> +	if (seen_seq != log_next_seq) {
> +		wake_klogd = true;
> +		seen_seq = log_next_seq;

Sigh, there is actually still a race with console_trylock_spinning().
We might see the updated log_next_seq here while the related
printk() might steal the lock in retry path.

The simplest solution seems to be to do this only when !retry.
I am going to send v3.

> +	}
>  	retry = console_seq != log_next_seq;
>  	raw_spin_unlock(&logbuf_lock);
>  	printk_safe_exit_irqrestore(flags);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-19 15:58 ` Petr Mladek
@ 2018-02-19 16:01   ` " Petr Mladek
  2018-02-26  6:37     ` Sergey Senozhatsky
  2018-02-26  6:27   ` [PATCH v2] " Sergey Senozhatsky
  1 sibling, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2018-02-19 16:01 UTC (permalink / raw)
  To: Sergey Senozhatsky, Steven Rostedt
  Cc: Sergey Senozhatsky, linux-kernel, Tejun Heo

From: Tejun Heo <tj@kernel.org>

We mark for waking up klogd whenever we see a new message sequence in
the main loop.  However, the actual wakeup is always at the end of the
function and we can easily test for the wakeup condition when we do
the final should-we-repeat check.

Move the wake_klogd condition check out of the main loop.  This avoids
doing the same thing repeatedly and groups similar checks into a
common place.

This fixes a race introduced by the commit dbdda842fe96f8932 ("printk: Add
console owner and waiter logic to load balance console writes").
The current console owner might process the newly added message before
the related printk() start waiting for the console lock. Then the lock
is passed without waking klogd. The new owner sees the already updated
seen_seq and does not know that the wakeup is needed.

Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
Signed-off-by: Tejun Heo <tj@kernel.org>
[pmladek@suse.com: Fix and describe a new race with console waiter]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
Changes against v2:

+ check wake_klogd only when really leaving

 kernel/printk/printk.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index db4b9b8929eb..aaa28c1409ab 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2335,10 +2335,6 @@ void console_unlock(void)
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
-		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
-			seen_seq = log_next_seq;
-		}
 
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped **\n",
@@ -2416,14 +2412,22 @@ void console_unlock(void)
 
 	up_console_sem();
 
+	raw_spin_lock(&logbuf_lock);
 	/*
 	 * Someone could have filled up the buffer again, so re-check if there's
 	 * something to flush. In case we cannot trylock the console_sem again,
 	 * there's a new owner and the console_unlock() from them will do the
 	 * flush, no worries.
 	 */
-	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
+	/*
+	 * Check whether userland needs notification. Do this only when really
+	 * leaving to avoid race with console_trylock_spinning().
+	 */
+	if (seen_seq != log_next_seq && !retry) {
+		wake_klogd = true;
+		seen_seq = log_next_seq;
+	}
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-- 
2.13.6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-19 15:58 ` Petr Mladek
  2018-02-19 16:01   ` [PATCH v3] " Petr Mladek
@ 2018-02-26  6:27   ` " Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-26  6:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Tejun Heo

Hello,

Sorry for the delay. Could not reply sooner.


On (02/19/18 16:58), Petr Mladek wrote:
[..]
> > @@ -2417,12 +2413,17 @@ void console_unlock(void)
> >  	up_console_sem();
> >  
> >  	/*
> > -	 * Someone could have filled up the buffer again, so re-check if there's
> > -	 * something to flush. In case we cannot trylock the console_sem again,
> > -	 * there's a new owner and the console_unlock() from them will do the
> > -	 * flush, no worries.
> > +	 * Check whether userland needs notification.  Also, someone could
> > +	 * have filled up the buffer again, so re-check if there's
> > +	 * something to flush. In case we cannot trylock the console_sem
> > +	 * again, there's a new owner and the console_unlock() from them
> > +	 * will do the flush, no worries.
> >  	 */
> >  	raw_spin_lock(&logbuf_lock);
> > +	if (seen_seq != log_next_seq) {
> > +		wake_klogd = true;
> > +		seen_seq = log_next_seq;
> 
> Sigh, there is actually still a race with console_trylock_spinning().

I see what you are talking about. Good catch.

> The simplest solution seems to be to do this only when !retry.

Yeah, offloading makes any "internal state" flags almost impossible to
use. We either need to upgrade those "internal state" flags to "global
state" flags, or to add more complex checks/race conditions workarounds
and so on. So I think that !retry should work. But, honestly, the whole
wakeup_klogd thing is getting too complex to like it.

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-19 16:01   ` [PATCH v3] " Petr Mladek
@ 2018-02-26  6:37     ` Sergey Senozhatsky
  2018-02-26 15:57       ` Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-26  6:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	linux-kernel, Tejun Heo

On (02/19/18 17:01), Petr Mladek wrote:
[..]
> -	raw_spin_lock(&logbuf_lock);
>  	retry = console_seq != log_next_seq;
> +	/*
> +	 * Check whether userland needs notification. Do this only when really
> +	 * leaving to avoid race with console_trylock_spinning().
> +	 */
> +	if (seen_seq != log_next_seq && !retry) {
> +		wake_klogd = true;
> +		seen_seq = log_next_seq;
> +	}

Let's add the "why" part. This "!retry" might be hard to understand. We
are looking at

- CPUa is about to leave console_unlock()
- printk on CPUb appends a new message
- CPUa detects that `console_seq != log_next_seq', updates `seen_seq'
- printk on CPUb is getting preempted
- CPUa re-takes the console_sem via retry path
- printk CPUb is becoming TASK_RUNNING again - it now spins for console_sem,
  since we have an active console_sem owner
- CPUa detects that there is a console_sem waiter, so it offloads the
  printing task, without ever waking up klogd


Either we can have that complex "seen_seq != log_next_seq && !retry"
check - or we simply can add

	if (console_lock_spinning_disable_and_check()) {
		printk_safe_exit_irqrestore(flags);
		if (wake_klogd)
			wake_up_klogd();
	}

to the offloading return path.

The later is *may be* simpler to follow. The rule is: every
!console_suspend and !cant-use-consoles return path from console_unlock()
must wake_up_klogd() [if needed].

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-26  6:37     ` Sergey Senozhatsky
@ 2018-02-26 15:57       ` Petr Mladek
  2018-02-26 16:01         ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2018-02-26 15:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, linux-kernel, Tejun Heo

On Mon 2018-02-26 15:37:35, Sergey Senozhatsky wrote:
> On (02/19/18 17:01), Petr Mladek wrote:
> [..]
> > -	raw_spin_lock(&logbuf_lock);
> >  	retry = console_seq != log_next_seq;
> > +	/*
> > +	 * Check whether userland needs notification. Do this only when really
> > +	 * leaving to avoid race with console_trylock_spinning().
> > +	 */
> > +	if (seen_seq != log_next_seq && !retry) {
> > +		wake_klogd = true;
> > +		seen_seq = log_next_seq;
> > +	}
> 
> Let's add the "why" part. This "!retry" might be hard to understand. We
> are looking at
> 
> - CPUa is about to leave console_unlock()
> - printk on CPUb appends a new message
> - CPUa detects that `console_seq != log_next_seq', updates `seen_seq'
> - printk on CPUb is getting preempted
> - CPUa re-takes the console_sem via retry path
> - printk CPUb is becoming TASK_RUNNING again - it now spins for console_sem,
>   since we have an active console_sem owner
> - CPUa detects that there is a console_sem waiter, so it offloads the
>   printing task, without ever waking up klogd

Yes, this is the scenario. I agree that it is very tricky.

> Either we can have that complex "seen_seq != log_next_seq && !retry"
> check - or we simply can add
> 
> 	if (console_lock_spinning_disable_and_check()) {
> 		printk_safe_exit_irqrestore(flags);
> 		if (wake_klogd)
> 			wake_up_klogd();
> 	}
> 
> to the offloading return path.
> 
> The later is *may be* simpler to follow. The rule is: every
> !console_suspend and !cant-use-consoles return path from console_unlock()
> must wake_up_klogd() [if needed].

Yes, this looks more straighforward. It might cause earlier and more
often wakeup but we want to go this way anyway. I do not see any
real problem with it.

Okay, what about the following patch for 4.16-rc4?

>From 7450da0e7e0d7ee5926f09c8bbbaa44c67ae0b48 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Mon, 26 Feb 2018 15:44:20 +0100
Subject: [PATCH] printk: Wake klogd when passing console_lock owner

wake_klogd is a local variable in console_unlock(). The information
is lost when the console_lock owner using the busy wait added by
the commit dbdda842fe96f8932 ("printk: Add console owner and waiter
logic to load balance console writes"). The following race is
possible:

CPU0				CPU1
console_unlock()

  for (;;)
     /* calling console for last message */

				printk()
				  log_store()
				    log_next_seq++;

     /* see new message */
     if (seen_seq != log_next_seq) {
	wake_klogd = true;
	seen_seq = log_next_seq;
     }

     console_lock_spinning_enable();

				  if (console_trylock_spinning())
				     /* spinning */

     if (console_lock_spinning_disable_and_check()) {
	printk_safe_exit_irqrestore(flags);
	return;

				  console_unlock()
				    if (seen_seq != log_next_seq) {
				    /* already seen */
				    /* nothing to do */

Result: Nobody would wakeup klogd.

One solution would be to make a global variable from wake_klogd.
But then we would need to manipulate it under a lock or so.

This patch wakes klogd also when console_lock is passed to the
spinning waiter. It looks like the right way to go. Also userspace
should have a chance to see and store any "flood" of messages.

Note that the very late klogd wake up was a historic solution.
It made sense on single CPU systems or when sys_syslog() operations
were synchronized using the big kernel lock like in v2.1.113.
But it is questionable these days.

Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fc1123583fa6..f274fbef821d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2397,7 +2397,7 @@ void console_unlock(void)
 
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
-			return;
+			goto out;
 		}
 
 		printk_safe_exit_irqrestore(flags);
@@ -2430,6 +2430,7 @@ void console_unlock(void)
 	if (retry && console_trylock())
 		goto again;
 
+out:
 	if (wake_klogd)
 		wake_up_klogd();
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] printk: Relocate wake_klogd check close to the end of console_unlock()
  2018-02-26 15:57       ` Petr Mladek
@ 2018-02-26 16:01         ` Sergey Senozhatsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-26 16:01 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	linux-kernel, Tejun Heo

On (02/26/18 16:57), Petr Mladek wrote:
[..]
> 
> Okay, what about the following patch for 4.16-rc4?

Looks good!

[..]
> Fixes: dbdda842fe96f8932 ("printk: Add console owner and waiter logic to load balance console writes")
> Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-08 13:04 [PATCH v2] printk: Relocate wake_klogd check close to the end of console_unlock() Petr Mladek
2018-02-08 14:53 ` Sergey Senozhatsky
2018-02-08 16:48   ` Petr Mladek
2018-02-09  3:28     ` Sergey Senozhatsky
2018-02-09 10:39       ` Petr Mladek
2018-02-10  7:33         ` Sergey Senozhatsky
2018-02-09 10:47       ` Petr Mladek
2018-02-19 15:58 ` Petr Mladek
2018-02-19 16:01   ` [PATCH v3] " Petr Mladek
2018-02-26  6:37     ` Sergey Senozhatsky
2018-02-26 15:57       ` Petr Mladek
2018-02-26 16:01         ` Sergey Senozhatsky
2018-02-26  6:27   ` [PATCH v2] " Sergey Senozhatsky

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox