linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
@ 2018-01-10 13:24 Petr Mladek
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-10 13:24 UTC (permalink / raw)
  To: Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo,
	Pavel Machek, linux-kernel, Petr Mladek

This is the last version of Steven's console owner/waiter logic.
Plus my proposal to hide it into 3 helper functions. It is supposed
to keep the code maintenable.

The handshake really works. It happens about 10-times even during
boot of a simple system in qemu with a fast console here. It is
definitely able to avoid some softlockups. Let's see if it is
enough in practice.

>From my point of view, it is ready to go into linux-next so that
it can get some more test coverage.

Steven's patch is the v4, see
https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home

Petr Mladek (1):
  printk: Hide console waiter logic into helpers

Steven Rostedt (1):
  printk: Add console owner and waiter logic to load balance console
    writes

 kernel/printk/printk.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 1 deletion(-)

-- 
2.13.6

^ permalink raw reply	[flat|nested] 140+ messages in thread

* [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek
@ 2018-01-10 13:24 ` Petr Mladek
  2018-01-10 16:50   ` Steven Rostedt
                     ` (2 more replies)
  2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek
  2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo
  2 siblings, 3 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-10 13:24 UTC (permalink / raw)
  To: Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo,
	Pavel Machek, linux-kernel

From: Steven Rostedt <rostedt@goodmis.org>

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!

By Petr Mladek about possible new deadlocks:

The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

				spin_lockA()
				  printk()
				    spin = true;
				    while (...)

    call_console_drivers()
      spin_lockA()

This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.

But if the above is true than the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
	spin_lockA()

By other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
---
 kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..7e6459abba43 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
 #endif
 
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore.  The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock())
+		if (console_trylock()) {
 			console_unlock();
+		} else {
+			struct task_struct *owner = NULL;
+			bool waiter;
+			bool spin = false;
+
+			printk_safe_enter_irqsave(flags);
+
+			raw_spin_lock(&console_owner_lock);
+			owner = READ_ONCE(console_owner);
+			waiter = READ_ONCE(console_waiter);
+			if (!waiter && owner && owner != current) {
+				WRITE_ONCE(console_waiter, true);
+				spin = true;
+			}
+			raw_spin_unlock(&console_owner_lock);
+
+			/*
+			 * If there is an active printk() writing to the
+			 * consoles, instead of having it write our data too,
+			 * see if we can offload that load from the active
+			 * printer, and do some printing ourselves.
+			 * Go into a spin only if there isn't already a waiter
+			 * spinning, and there is an active printer, and
+			 * that active printer isn't us (recursive printk?).
+			 */
+			if (spin) {
+				/* We spin waiting for the owner to release us */
+				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+				/* Owner will clear console_waiter on hand off */
+				while (READ_ONCE(console_waiter))
+					cpu_relax();
+
+				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+				printk_safe_exit_irqrestore(flags);
+
+				/*
+				 * The owner passed the console lock to us.
+				 * Since we did not spin on console lock, annotate
+				 * this as a trylock. Otherwise lockdep will
+				 * complain.
+				 */
+				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+				console_unlock();
+				printk_safe_enter_irqsave(flags);
+			}
+			printk_safe_exit_irqrestore(flags);
+
+		}
 	}
 
 	return printed_len;
@@ -2141,6 +2196,7 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
+	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2229,14 +2285,64 @@ void console_unlock(void)
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
 
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 */
+		raw_spin_lock(&console_owner_lock);
+		console_owner = current;
+		raw_spin_unlock(&console_owner_lock);
+
+		/* The waiter may spin on us after setting console_owner */
+		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		raw_spin_lock(&console_owner_lock);
+		waiter = READ_ONCE(console_waiter);
+		console_owner = NULL;
+		raw_spin_unlock(&console_owner_lock);
+
+		/*
+		 * If there is a waiter waiting for us, then pass the
+		 * rest of the work load over to that waiter.
+		 */
+		if (waiter)
+			break;
+
+		/* There was no waiter, and nothing will spin on us here */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
 		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
 	}
+
+	/*
+	 * If there is an active waiter waiting on the console_lock.
+	 * Pass off the printing to the waiter, and the waiter
+	 * will continue printing on its CPU, and when all writing
+	 * has finished, the last printer will wake up klogd.
+	 */
+	if (waiter) {
+		WRITE_ONCE(console_waiter, false);
+		/* The waiter is now free to continue */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		/*
+		 * Hand off console_lock to waiter. The waiter will perform
+		 * the up(). After this, the waiter is the console_lock owner.
+		 */
+		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+		printk_safe_exit_irqrestore(flags);
+		/* Note, if waiter is set, logbuf_lock is not held */
+		return;
+	}
+
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
@ 2018-01-10 13:24 ` Petr Mladek
  2018-01-10 17:52   ` Steven Rostedt
  2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo
  2 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-10 13:24 UTC (permalink / raw)
  To: Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo,
	Pavel Machek, linux-kernel, Petr Mladek

The commit ("printk: Add console owner and waiter logic to load balance
console writes") made vprintk_emit() and console_unlock() even more
complicated.

This patch extracts the new code into 3 helper functions. They should
help to keep it rather self-contained. It will be easier to use and
maintain.

This patch just shuffles the existing code. It does not change
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++--------------------
 1 file changed, 145 insertions(+), 97 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7e6459abba43..6217c280e6c1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
-static struct lockdep_map console_owner_dep_map = {
-	.name = "console_owner"
-};
 #endif
 
-static DEFINE_RAW_SPINLOCK(console_owner_lock);
-static struct task_struct *console_owner;
-static bool console_waiter;
-
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
 }
 
 /*
+ * Special console_lock variants that help to reduce the risk of soft-lockups.
+ * They allow to pass console_lock to another printk() call using a busy wait.
+ */
+
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
+#endif
+
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
+/**
+ * console_lock_spinning_enable - mark beginning of code where another
+ *	thread might safely busy wait
+ *
+ * This might be called in sections where the current console_lock owner
+ * cannot sleep. It is a signal that another thread might start busy
+ * waiting for console_lock.
+ */
+static void console_lock_spinning_enable(void)
+{
+	raw_spin_lock(&console_owner_lock);
+	console_owner = current;
+	raw_spin_unlock(&console_owner_lock);
+
+	/* The waiter may spin on us after setting console_owner */
+	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+}
+
+/**
+ * console_lock_spinning_disable_and_check - mark end of code where another
+ *	thread was able to busy wait and check if there is a waiter
+ *
+ * This is called at the end of section when spinning was enabled by
+ * console_lock_spinning_enable(). It has two functions. First, it
+ * is a signal that it is not longer safe to start busy waiting
+ * for the lock. Second, it checks if there is a busy waiter and
+ * passes the lock rights to her.
+ *
+ * Important: Callers lose the lock if there was the busy waiter.
+ *	They must not longer touch items synchornized by console_lock
+ *	in this case.
+ *
+ * Return: 1 if the lock rights were passed, 0 othrewise.
+ */
+static int console_lock_spinning_disable_and_check(void)
+{
+	int waiter;
+
+	raw_spin_lock(&console_owner_lock);
+	waiter = READ_ONCE(console_waiter);
+	console_owner = NULL;
+	raw_spin_unlock(&console_owner_lock);
+
+	if (!waiter) {
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		return 0;
+	}
+
+	/* The waiter is now free to continue */
+	WRITE_ONCE(console_waiter, false);
+
+	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+	/*
+	 * Hand off console_lock to waiter. The waiter will perform
+	 * the up(). After this, the waiter is the console_lock owner.
+	 */
+	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+	return 1;
+}
+
+/**
+ * console_trylock_spinning - try to get console_lock by busy waiting
+ *
+ * This allows to busy wait for the console_lock when the current
+ * owner is running in a special marked sections. It means that
+ * the current owner is running and cannot reschedule until it
+ * is ready to loose the lock.
+ *
+ * Return: 1 if we got the lock, 0 othrewise
+ */
+static int console_trylock_spinning(void)
+{
+	struct task_struct *owner = NULL;
+	bool waiter;
+	bool spin = false;
+	unsigned long flags;
+
+	printk_safe_enter_irqsave(flags);
+
+	raw_spin_lock(&console_owner_lock);
+	owner = READ_ONCE(console_owner);
+	waiter = READ_ONCE(console_waiter);
+	if (!waiter && owner && owner != current) {
+		WRITE_ONCE(console_waiter, true);
+		spin = true;
+	}
+	raw_spin_unlock(&console_owner_lock);
+
+	/*
+	 * If there is an active printk() writing to the
+	 * consoles, instead of having it write our data too,
+	 * see if we can offload that load from the active
+	 * printer, and do some printing ourselves.
+	 * Go into a spin only if there isn't already a waiter
+	 * spinning, and there is an active printer, and
+	 * that active printer isn't us (recursive printk?).
+	 */
+	if (!spin) {
+		printk_safe_exit_irqrestore(flags);
+		return 0;
+	}
+
+	/* We spin waiting for the owner to release us */
+	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	/* Owner will clear console_waiter on hand off */
+	while (READ_ONCE(console_waiter))
+		cpu_relax();
+	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+	printk_safe_exit_irqrestore(flags);
+	/*
+	 * The owner passed the console lock to us.
+	 * Since we did not spin on console lock, annotate
+	 * this as a trylock. Otherwise lockdep will
+	 * complain.
+	 */
+	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+
+	return 1;
+}
+
+/*
  * Call the console drivers, asking them to write out
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
@@ -1760,56 +1890,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore.  The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock()) {
+		if (console_trylock() || console_trylock_spinning())
 			console_unlock();
-		} else {
-			struct task_struct *owner = NULL;
-			bool waiter;
-			bool spin = false;
-
-			printk_safe_enter_irqsave(flags);
-
-			raw_spin_lock(&console_owner_lock);
-			owner = READ_ONCE(console_owner);
-			waiter = READ_ONCE(console_waiter);
-			if (!waiter && owner && owner != current) {
-				WRITE_ONCE(console_waiter, true);
-				spin = true;
-			}
-			raw_spin_unlock(&console_owner_lock);
-
-			/*
-			 * If there is an active printk() writing to the
-			 * consoles, instead of having it write our data too,
-			 * see if we can offload that load from the active
-			 * printer, and do some printing ourselves.
-			 * Go into a spin only if there isn't already a waiter
-			 * spinning, and there is an active printer, and
-			 * that active printer isn't us (recursive printk?).
-			 */
-			if (spin) {
-				/* We spin waiting for the owner to release us */
-				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
-				/* Owner will clear console_waiter on hand off */
-				while (READ_ONCE(console_waiter))
-					cpu_relax();
-
-				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
-				printk_safe_exit_irqrestore(flags);
-
-				/*
-				 * The owner passed the console lock to us.
-				 * Since we did not spin on console lock, annotate
-				 * this as a trylock. Otherwise lockdep will
-				 * complain.
-				 */
-				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
-				console_unlock();
-				printk_safe_enter_irqsave(flags);
-			}
-			printk_safe_exit_irqrestore(flags);
-
-		}
 	}
 
 	return printed_len;
@@ -1910,6 +1992,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
 static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  char *dict, size_t dict_len,
 				  char *text, size_t text_len) { return 0; }
+static void console_lock_spinning_enable(void) { }
+static int console_lock_spinning_disable_and_check(void) { return 0; }
 static void call_console_drivers(const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg,
@@ -2196,7 +2280,6 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2291,31 +2374,16 @@ void console_unlock(void)
 		 * finish. This task can not be preempted if there is a
 		 * waiter waiting to take over.
 		 */
-		raw_spin_lock(&console_owner_lock);
-		console_owner = current;
-		raw_spin_unlock(&console_owner_lock);
-
-		/* The waiter may spin on us after setting console_owner */
-		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+		console_lock_spinning_enable();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
-		raw_spin_lock(&console_owner_lock);
-		waiter = READ_ONCE(console_waiter);
-		console_owner = NULL;
-		raw_spin_unlock(&console_owner_lock);
-
-		/*
-		 * If there is a waiter waiting for us, then pass the
-		 * rest of the work load over to that waiter.
-		 */
-		if (waiter)
-			break;
-
-		/* There was no waiter, and nothing will spin on us here */
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		if (console_lock_spinning_disable_and_check()) {
+			printk_safe_exit_irqrestore(flags);
+			return;
+		}
 
 		printk_safe_exit_irqrestore(flags);
 
@@ -2323,26 +2391,6 @@ void console_unlock(void)
 			cond_resched();
 	}
 
-	/*
-	 * If there is an active waiter waiting on the console_lock.
-	 * Pass off the printing to the waiter, and the waiter
-	 * will continue printing on its CPU, and when all writing
-	 * has finished, the last printer will wake up klogd.
-	 */
-	if (waiter) {
-		WRITE_ONCE(console_waiter, false);
-		/* The waiter is now free to continue */
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
-		/*
-		 * Hand off console_lock to waiter. The waiter will perform
-		 * the up(). After this, the waiter is the console_lock owner.
-		 */
-		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
-		printk_safe_exit_irqrestore(flags);
-		/* Note, if waiter is set, logbuf_lock is not held */
-		return;
-	}
-
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
  2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek
@ 2018-01-10 14:05 ` Tejun Heo
  2018-01-10 16:29   ` Petr Mladek
  2018-01-10 18:05   ` Steven Rostedt
  2 siblings, 2 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 14:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> This is the last version of Steven's console owner/waiter logic.
> Plus my proposal to hide it into 3 helper functions. It is supposed
> to keep the code maintenable.
> 
> The handshake really works. It happens about 10-times even during
> boot of a simple system in qemu with a fast console here. It is
> definitely able to avoid some softlockups. Let's see if it is
> enough in practice.
> 
> From my point of view, it is ready to go into linux-next so that
> it can get some more test coverage.
> 
> Steven's patch is the v4, see
> https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home

At least for now,

 Nacked-by: Tejun Heo <tj@kernel.org>

Maybe this can be a part of solution but it's really worrying how the
whole discussion around this subject is proceeding.  You guys are
trying to railroad actual problems.  Please address actual technical
problems.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo
@ 2018-01-10 16:29   ` Petr Mladek
  2018-01-10 17:02     ` Tejun Heo
                       ` (2 more replies)
  2018-01-10 18:05   ` Steven Rostedt
  1 sibling, 3 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-10 16:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed 2018-01-10 06:05:47, Tejun Heo wrote:
> On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> > This is the last version of Steven's console owner/waiter logic.
> > Plus my proposal to hide it into 3 helper functions. It is supposed
> > to keep the code maintenable.
> > 
> > The handshake really works. It happens about 10-times even during
> > boot of a simple system in qemu with a fast console here. It is
> > definitely able to avoid some softlockups. Let's see if it is
> > enough in practice.
> > 
> > From my point of view, it is ready to go into linux-next so that
> > it can get some more test coverage.
> > 
> > Steven's patch is the v4, see
> > https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home
> 
> At least for now,
> 
>  Nacked-by: Tejun Heo <tj@kernel.org>
> 
> Maybe this can be a part of solution but it's really worrying how the
> whole discussion around this subject is proceeding.  You guys are
> trying to railroad actual problems.  Please address actual technical
> problems.

I wonder how long you follow the discussions about solving this
problem. I was able to find one old solution from Jan Kara that
was sent on January 15, 2013. You might google it by
"[PATCH] printk: Avoid softlockups in console_unlock()". For example,
it is archived at
http://linux-kernel.2935.n7.nabble.com/PATCH-printk-Avoid-softlockups-in-console-unlock-td581957.html

The historic Jan Kara's solution is actually very similar to your proposal at
https://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com

Why Jan Kara's Solution was not accepted?
Was it because he was not trying enough?

No, Jan provided several variants (based on workqueues, irqwork,
kthread), for example
https://lkml.kernel.org/r/1395770101-24534-1-git-send-email-jack@suse.cz
Also he discussed this on conferences, etc.

Later Jan handed over the fight to Sergey Senozhatsky, see
https://lkml.kernel.org/r/1457175338-1665-1-git-send-email-sergey.senozhatsky@gmail.com

Also Sergey was very active. He was addressing many issues, discussed
this on Kernel Summit twice.

Why is it not upstream?

All attempts up to v12 were blocked by someone (Andrew, Linus,
Pavel Machek, few others) because they did not guarantee enough
that the kthread would wake up and they would be able to see
the messages!

Sergey tried to address this by forcing synchronous mode in
some situations (panic, suspend, kexec, ...). But people
still complained.

One important milestone was v12, see
https://lkml.kernel.org/r/20160513131848.2087-1-sergey.senozhatsky@gmail.com
It was the last version where we did the offload immediately from
vprintk_emit().

The next versions used lazy offload from console_unlock() when
the thread spent there too much time. IMHO, this is one
very promising solution. It guarantees that softlockup
would never happen. But it tries hard to get the messages
out immediately.

Unfortunately, it is very complicated. We have troubles to understand
the concerns, for example see the long discussion about v3 at
https://lkml.kernel.org/r/20170509082859.854-1-sergey.senozhatsky@gmail.com
I admit that I did not have enough time to review this.


Anyway, in October, 2017, Steven came up with a completely
different approach (console owner/waiter transfer). It does
not guarantee that the softlockup will not happen. But it
does not suffer from the problem that blocked the obvious
solution for years. It moves the owner at runtime, so
it is guaranteed that the new owner would continue
printing.



Finally, no solution is perfect! There are contradicting requirements
on printk:

	get the messages out ASAP
               vs.
	do not block the system

The harder you try to get the messages out the more you could block
the entire system.

Where is the acceptable compromise? I am not sure. So far, the most
forceful people (Linus) did not see softlockups as a big problem.
They rather wanted to see the messages.


What could we do?

   + offload  -> not acceptable so far
   + lazy offload -> might be acceptable if done more easily or gets
		review

   + try to transfer console owner (Steven) -> helps in several
         situations, so far only hand made stress code failed

   + reduce amount of messages
         + does it make sense to print the same warning 1000-times?
         + could one long warning cause softlockup with the console
	   owner transfer?

   + throttle thread producing too many messages
         + IMHO, very good solution but nobody investigated it


This patchset really helps in many situations. I believe that it
does not make things worse. You might block it and spend another
long time discussing other solutions.

Will we need a better solution? Maybe, probably.

Is it possible to provide an acceptable solution using offload?
Probably using lazy offload. In a reasonable time frame with
a comparably low risk? Me not.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
@ 2018-01-10 16:50   ` Steven Rostedt
  2018-01-12 16:54   ` Steven Rostedt
  2018-01-17  2:19   ` Byungchul Park
  2 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 16:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 14:24:17 +0100
Petr Mladek <pmladek@suse.com> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>

Please remove the above From:, it will overwrite the one below which I
would prefer to have.

Thanks!

-- Steve

> 
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> This patch implements what I discussed in Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 16:29   ` Petr Mladek
@ 2018-01-10 17:02     ` Tejun Heo
  2018-01-10 18:21       ` Peter Zijlstra
                         ` (4 more replies)
  2018-01-10 18:54     ` Steven Rostedt
  2018-01-11  5:10     ` Sergey Senozhatsky
  2 siblings, 5 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 17:02 UTC (permalink / raw)
  To: Petr Mladek, Linus Torvalds, akpm
  Cc: Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Pavel Machek, linux-kernel

Hello, Linus, Andrew.

On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote:
> Where is the acceptable compromise? I am not sure. So far, the most
> forceful people (Linus) did not see softlockups as a big problem.
> They rather wanted to see the messages.

Can you please chime in?  Would you be opposed to offloading to an
independent context even if it were only for cases where we were
already punting?  The thing with the current offloading is that we
don't know who we're offloading to.  It might end up in faster or
slower context, or more importantly a dangerous one.

The particular case that we've been seeing regularly in the fleet was
the following scenario.

1. Console is IPMI emulated serial console.  Super slow.  Also
   netconsole is in use.
2. System runs out of memory, OOM triggers.
3. OOM handler is printing out OOM debug info.
4. While trying to emit the messages for netconsole, the network stack
   / driver tries to allocate memory and then fail, which in turn
   triggers allocation failure or other warning messages.  printk was
   already flushing, so the messages are queued on the ring.
5. OOM handler keeps flushing but 4 repeats and the queue is never
   shrinking.  Because OOM handler is trapped in printk flushing, it
   never manages to free memory and no one else can enter OOM path
   either, so the system is trapped in this state.

The system usually never recovers in time once this sort of condition
hits and the following was the patch that I suggested which only punts
when messages are already being punted and we can easily make it less
punty by delaying the punting by N messages.

 http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com

We definitely can fix the above described case by e.g. preventing
printk flushing task from queueing more messages or whatever, but it
just seems really dumb for the system to die from things like this in
general and it doesn't really take all that much to trigger the
condition.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek
@ 2018-01-10 17:52   ` Steven Rostedt
  2018-01-11 12:03     ` Petr Mladek
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 17:52 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 14:24:18 +0100
Petr Mladek <pmladek@suse.com> wrote:

> The commit ("printk: Add console owner and waiter logic to load balance
> console writes") made vprintk_emit() and console_unlock() even more
> complicated.
> 
> This patch extracts the new code into 3 helper functions. They should
> help to keep it rather self-contained. It will be easier to use and
> maintain.
> 
> This patch just shuffles the existing code. It does not change
> the functionality.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 145 insertions(+), 97 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7e6459abba43..6217c280e6c1 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers);
>  static struct lockdep_map console_lock_dep_map = {
>  	.name = "console_lock"
>  };
> -static struct lockdep_map console_owner_dep_map = {
> -	.name = "console_owner"
> -};
>  #endif
>  
> -static DEFINE_RAW_SPINLOCK(console_owner_lock);
> -static struct task_struct *console_owner;
> -static bool console_waiter;
> -
>  enum devkmsg_log_bits {
>  	__DEVKMSG_LOG_BIT_ON = 0,
>  	__DEVKMSG_LOG_BIT_OFF,
> @@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
>  }
>  
>  /*
> + * Special console_lock variants that help to reduce the risk of soft-lockups.
> + * They allow to pass console_lock to another printk() call using a busy wait.
> + */
> +
> +#ifdef CONFIG_LOCKDEP
> +static struct lockdep_map console_owner_dep_map = {
> +	.name = "console_owner"
> +};
> +#endif
> +
> +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> +static struct task_struct *console_owner;
> +static bool console_waiter;
> +
> +/**
> + * console_lock_spinning_enable - mark beginning of code where another
> + *	thread might safely busy wait
> + *
> + * This might be called in sections where the current console_lock owner


"might be"? It has to be called in sections where the current
console_lock owner can not sleep. It's basically saying "console lock is
now acting like a spinlock".

> + * cannot sleep. It is a signal that another thread might start busy
> + * waiting for console_lock.
> + */
> +static void console_lock_spinning_enable(void)
> +{
> +	raw_spin_lock(&console_owner_lock);
> +	console_owner = current;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/* The waiter may spin on us after setting console_owner */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +}
> +
> +/**
> + * console_lock_spinning_disable_and_check - mark end of code where another
> + *	thread was able to busy wait and check if there is a waiter
> + *
> + * This is called at the end of section when spinning was enabled by
> + * console_lock_spinning_enable(). It has two functions. First, it

"This is called at the end of the section where spinning is allowed."

> + * is a signal that it is not longer safe to start busy waiting

	"it is no longer safe"

> + * for the lock. Second, it checks if there is a busy waiter and
> + * passes the lock rights to her.
> + *
> + * Important: Callers lose the lock if there was the busy waiter.
> + *	They must not longer touch items synchornized by console_lock

	"They must not touch items ..."

> + *	in this case.
> + *
> + * Return: 1 if the lock rights were passed, 0 othrewise.

						"otherwise"

> + */
> +static int console_lock_spinning_disable_and_check(void)
> +{
> +	int waiter;
> +
> +	raw_spin_lock(&console_owner_lock);
> +	waiter = READ_ONCE(console_waiter);
> +	console_owner = NULL;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	if (!waiter) {
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +		return 0;
> +	}
> +
> +	/* The waiter is now free to continue */
> +	WRITE_ONCE(console_waiter, false);
> +
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	/*
> +	 * Hand off console_lock to waiter. The waiter will perform
> +	 * the up(). After this, the waiter is the console_lock owner.
> +	 */
> +	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> +	return 1;
> +}
> +
> +/**
> + * console_trylock_spinning - try to get console_lock by busy waiting
> + *
> + * This allows to busy wait for the console_lock when the current
> + * owner is running in a special marked sections. It means that
> + * the current owner is running and cannot reschedule until it
> + * is ready to loose the lock.
> + *
> + * Return: 1 if we got the lock, 0 othrewise
> + */
> +static int console_trylock_spinning(void)
> +{
> +	struct task_struct *owner = NULL;
> +	bool waiter;
> +	bool spin = false;
> +	unsigned long flags;

Can we add here:

	if (console_trylock())
		return 1;

And then we can simplify the below from:

	if (console_trylock() || console_trylock_spinning())

to just

	if (console_trylock_spinning())

-- Steve

> +
> +	printk_safe_enter_irqsave(flags);
> +
> +	raw_spin_lock(&console_owner_lock);
> +	owner = READ_ONCE(console_owner);
> +	waiter = READ_ONCE(console_waiter);
> +	if (!waiter && owner && owner != current) {
> +		WRITE_ONCE(console_waiter, true);
> +		spin = true;
> +	}
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/*
> +	 * If there is an active printk() writing to the
> +	 * consoles, instead of having it write our data too,
> +	 * see if we can offload that load from the active
> +	 * printer, and do some printing ourselves.
> +	 * Go into a spin only if there isn't already a waiter
> +	 * spinning, and there is an active printer, and
> +	 * that active printer isn't us (recursive printk?).
> +	 */
> +	if (!spin) {
> +		printk_safe_exit_irqrestore(flags);
> +		return 0;
> +	}
> +
> +	/* We spin waiting for the owner to release us */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +	/* Owner will clear console_waiter on hand off */
> +	while (READ_ONCE(console_waiter))
> +		cpu_relax();
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	printk_safe_exit_irqrestore(flags);
> +	/*
> +	 * The owner passed the console lock to us.
> +	 * Since we did not spin on console lock, annotate
> +	 * this as a trylock. Otherwise lockdep will
> +	 * complain.
> +	 */
> +	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> +
> +	return 1;
> +}
> +
> +/*
>   * Call the console drivers, asking them to write out
>   * log_buf[start] to log_buf[end - 1].
>   * The console_lock must be held.
> @@ -1760,56 +1890,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * semaphore.  The release will print out buffers and wake up
>  		 * /dev/kmsg and syslog() users.
>  		 */
> -		if (console_trylock()) {
> +		if (console_trylock() || console_trylock_spinning())
>  			console_unlock();
>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo
  2018-01-10 16:29   ` Petr Mladek
@ 2018-01-10 18:05   ` Steven Rostedt
  2018-01-10 18:12     ` Tejun Heo
  2018-01-11  4:58     ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 06:05:47 -0800
Tejun Heo <tj@kernel.org> wrote:

> On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> > This is the last version of Steven's console owner/waiter logic.
> > Plus my proposal to hide it into 3 helper functions. It is supposed
> > to keep the code maintenable.
> > 
> > The handshake really works. It happens about 10-times even during
> > boot of a simple system in qemu with a fast console here. It is
> > definitely able to avoid some softlockups. Let's see if it is
> > enough in practice.
> > 
> > From my point of view, it is ready to go into linux-next so that
> > it can get some more test coverage.
> > 
> > Steven's patch is the v4, see
> > https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home  
> 
> At least for now,
> 
>  Nacked-by: Tejun Heo <tj@kernel.org>

And I NACK your NACK!

> 
> Maybe this can be a part of solution but it's really worrying how the
> whole discussion around this subject is proceeding.  You guys are
> trying to railroad actual problems.  Please address actual technical
> problems.

WE ARE!

I presented the issue at Kernel Summit and everyone agreed with me that
the issue my patch solves is a real issue. You have yet to demonstrate
how this does not solve issues.

I presented the history of printk, where it use to serialize all
printks. This was a problem when you had n CPUs doing printks at the
same time, because the n'th CPU had to wait for the n-1 CPUs to print
before it could. This was obviously an issue.

The "solution" to that was to have the first printk do the printing,
and all other printks that come in while it is printing just load their
data into the log buffer and continue. The first printk would get stuck
printing for everyone else. This was fine when we had 4 CPUs, but now
that we have boxes with 100s of CPUs, this is definitely an issue. I
demonstrated that this caused printk() to be unbounded, and there were
real word scenarios that could easily cause a printk to never stop
printing.

My solution is to make printk() have a max bounded time to print. This
is how we solve things in the Real Time world, and it makes perfect
sense in this context. The point being, the max a printk() could
print, and that is if it was really unlucky, which would be really
unlikely because it would mean we had a burst of printks followed by no
printks, the bounded time is what it takes to print the entire buffer.

My solution takes printk from its current unbounded state, and makes it
fixed bounded. Which means printk() is now a O(1) algorithm.

The solution is simple, everyone at KS agreed with it, there should be
no controversy here.

You on the other hand are showing unrealistic scenarios, and crying
that it's what you see in production, with no proof of it.

My printk solution is solid, with no risk of regressions of current
printk usages.

If anything, I'll pull theses patches myself, and push them to Linus
directly. I'll Cc you and you can make your argument to NACK them, and
I'll make mine to take them.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:05   ` Steven Rostedt
@ 2018-01-10 18:12     ` Tejun Heo
  2018-01-10 18:14       ` Tejun Heo
  2018-01-10 18:41       ` Steven Rostedt
  2018-01-11  4:58     ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Steven.

So, everything else on your message, sure.  You do what you have to
do, but I really don't understand the following part, and this has
been the main source of frustration in the whole discussion.

On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:
> You on the other hand are showing unrealistic scenarios, and crying
> that it's what you see in production, with no proof of it.

I've explained the same scenario multiple times.  Unless you're
assuming that I'm lying, it should be amply clear that the scenario is
unrealistic - we've been seeing them taking place repeatedly for quite
a while.

What I don't understand is why we can't address this seemingly obvious
problem.  If there are technical reasons and the consensus is to not
solve this within flushing logic, sure, we can deal with it otherwise,
but we at least have to be able to agree that there are actual issues
here, no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:12     ` Tejun Heo
@ 2018-01-10 18:14       ` Tejun Heo
  2018-01-10 18:45         ` Steven Rostedt
  2018-01-10 18:41       ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 18:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, Jan 10, 2018 at 10:12:52AM -0800, Tejun Heo wrote:
> Hello, Steven.
> 
> So, everything else on your message, sure.  You do what you have to
> do, but I really don't understand the following part, and this has
> been the main source of frustration in the whole discussion.
> 
> On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:
> > You on the other hand are showing unrealistic scenarios, and crying
> > that it's what you see in production, with no proof of it.
> 
> I've explained the same scenario multiple times.  Unless you're
> assuming that I'm lying, it should be amply clear that the scenario is
> unrealistic - we've been seeing them taking place repeatedly for quite
> a while.

Oops, I meant to write "not unrealistic".  Anyways, if you think I'm
lying, please let me know.  I can ask others who have been seeing the
issue to join the thread.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 17:02     ` Tejun Heo
@ 2018-01-10 18:21       ` Peter Zijlstra
  2018-01-10 18:30         ` Tejun Heo
  2018-01-11  5:15         ` Sergey Senozhatsky
  2018-01-10 18:22       ` Steven Rostedt
                         ` (3 subsequent siblings)
  4 siblings, 2 replies; 140+ messages in thread
From: Peter Zijlstra @ 2018-01-10 18:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

Why not kill recursive OOM (msgs) ?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 17:02     ` Tejun Heo
  2018-01-10 18:21       ` Peter Zijlstra
@ 2018-01-10 18:22       ` Steven Rostedt
  2018-01-10 18:36         ` Tejun Heo
  2018-01-10 18:40       ` Mathieu Desnoyers
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Linus Torvalds, akpm, Sergey Senozhatsky, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 09:02:23 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Linus, Andrew.
> 
> On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote:
> > Where is the acceptable compromise? I am not sure. So far, the most
> > forceful people (Linus) did not see softlockups as a big problem.
> > They rather wanted to see the messages.  
> 
> Can you please chime in?  Would you be opposed to offloading to an
> independent context even if it were only for cases where we were
> already punting?  The thing with the current offloading is that we
> don't know who we're offloading to.  It might end up in faster or
> slower context, or more importantly a dangerous one.

And how is that different to what we have today? It could be the
"dangerous one" that did the first printk, and 100 other CPUs in "non
dangerous" locations are constantly calling printk and making that
"dangerous" one NEVER STOP.

My solution is, if there are a ton of printks going off, each one will
do a single print, and pass it to the next one. The printk will only be
stuck doing more than one message if no more printks happen. Which is a
good thing!

Again, my algorithm bounds printk to printing AT MOST the printk buffer
size. And that can only happen if there was a burst of printks on all
CPUs, and then no printks. The one to get handed off the printk would
just finish the buffer and continue. Which should not be an issue.

> 
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.

This looks like a bug in the netconsole, as the net console shouldn't
print warnings if the warning is caused by it doing a print.

Totally unrelated problem to my and Petr's patch set. Basically your
argument is "I see this bug, and your patch doesn't fix it". Well maybe
we are not solving your bug. Not to mention, it looks like printk isn't
the bug, but net console is.


> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.
> 
> The system usually never recovers in time once this sort of condition
> hits and the following was the patch that I suggested which only punts
> when messages are already being punted and we can easily make it less
> punty by delaying the punting by N messages.
> 
>  http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com
> 
> We definitely can fix the above described case by e.g. preventing
> printk flushing task from queueing more messages or whatever, but it
> just seems really dumb for the system to die from things like this in
> general and it doesn't really take all that much to trigger the
> condition.

It seems really dumb to not fix that recursive net console bug, and
try to solve it with a printk work around. 

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:21       ` Peter Zijlstra
@ 2018-01-10 18:30         ` Tejun Heo
  2018-01-10 18:41           ` Peter Zijlstra
  2018-01-11  5:15         ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 18:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Peter.

On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> 
> Why not kill recursive OOM (msgs) ?

Sure, we can do that too, e.g. marking flushing thread and ignoring
new messages from it, although that does come with its own downsides.
The choices are

* If we can make printk safe without much downside, that'd be the best
  option.

* If we decide that we can't do that in a reasonable way, we sure can
  try to plug the identified cases.  We might have to play a bit of
  whack-a-mole (e.g. the feedback loop might not necessarily be from
  the same context) but there likely are very few repeatable cases.

It could be me not knowing the history of the discussion but up until
now the discussion hasn't really gotten to that point since I brought
up the case that we've been seeing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:22       ` Steven Rostedt
@ 2018-01-10 18:36         ` Tejun Heo
  0 siblings, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 18:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Linus Torvalds, akpm, Sergey Senozhatsky, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello,

On Wed, Jan 10, 2018 at 01:22:55PM -0500, Steven Rostedt wrote:
> > Can you please chime in?  Would you be opposed to offloading to an
> > independent context even if it were only for cases where we were
> > already punting?  The thing with the current offloading is that we
> > don't know who we're offloading to.  It might end up in faster or
> > slower context, or more importantly a dangerous one.
> 
> And how is that different to what we have today? It could be the
> "dangerous one" that did the first printk, and 100 other CPUs in "non
> dangerous" locations are constantly calling printk and making that
> "dangerous" one NEVER STOP.

So, the dangerous one would punt to the dedicated safe one beyond
certain point.  The posted version just flushes to the last message
that it saw on entry to flush.

> > The particular case that we've been seeing regularly in the fleet was
> > the following scenario.
> > 
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >    netconsole is in use.
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> 
> This looks like a bug in the netconsole, as the net console shouldn't
> print warnings if the warning is caused by it doing a print.
> 
> Totally unrelated problem to my and Petr's patch set. Basically your
> argument is "I see this bug, and your patch doesn't fix it". Well maybe
> we are not solving your bug. Not to mention, it looks like printk isn't
> the bug, but net console is.

Sure, that could be the case, especially if punting to a safe context
can't be done reasonably (and there are downsides to silencing the
recursive messages too), but it'd also be really great to have printk
generaly safe from brining down a machine this way, right?  I just
don't yet see why punting to a safe context is so difficult /
undesirable that we can't solve the issue in a general manner.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 17:02     ` Tejun Heo
  2018-01-10 18:21       ` Peter Zijlstra
  2018-01-10 18:22       ` Steven Rostedt
@ 2018-01-10 18:40       ` Mathieu Desnoyers
  2018-01-11  7:36         ` Sergey Senozhatsky
  2018-01-24  9:36       ` Peter Zijlstra
  2018-05-09  8:58       ` Sergey Senozhatsky
  4 siblings, 1 reply; 140+ messages in thread
From: Mathieu Desnoyers @ 2018-01-10 18:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Linus Torvalds, Andrew Morton, rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

----- On Jan 10, 2018, at 12:02 PM, Tejun Heo tj@kernel.org wrote:

> Hello, Linus, Andrew.
> 
> On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote:
>> Where is the acceptable compromise? I am not sure. So far, the most
>> forceful people (Linus) did not see softlockups as a big problem.
>> They rather wanted to see the messages.
> 
> Can you please chime in?  Would you be opposed to offloading to an
> independent context even if it were only for cases where we were
> already punting?  The thing with the current offloading is that we
> don't know who we're offloading to.  It might end up in faster or
> slower context, or more importantly a dangerous one.
> 
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>   netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>   / driver tries to allocate memory and then fail, which in turn
>   triggers allocation failure or other warning messages.  printk was
>   already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>   shrinking.  Because OOM handler is trapped in printk flushing, it
>   never manages to free memory and no one else can enter OOM path
>   either, so the system is trapped in this state.

Hi Tejun,

There appears to be two problems at hand. One is making sure a console
buffer owner only flushes a bounded amount of data. Steven&Co patches
seem to address this.

The second problem you describe here appears to be related to the
side-effects of console drivers, namely netconsole in this scenario.
Its use of the network stack can allocate memory, which can fail, and
therefore trigger more printk. Having a way to detect that code is
directly called from a printk driver, and making sure error handling
is _not_ done by pushing more printk messages to that printk driver in
those situations comes to mind as a possible solution.

The problem you describe seems to be _another_ issue of the current
printk implementation which Steven's approach does not address, but
I don't think that Steven's changes prevent doing further improvements
on the netconsole driver front.

I also don't see what's wrong in the incremental approach proposed by
Steven. Even though it does not fix your console driver problem, his
patchset appears to address some real-world latency issues.

Thanks,

Mathieu

> 
> The system usually never recovers in time once this sort of condition
> hits and the following was the patch that I suggested which only punts
> when messages are already being punted and we can easily make it less
> punty by delaying the punting by N messages.
> 
> http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com
> 
> We definitely can fix the above described case by e.g. preventing
> printk flushing task from queueing more messages or whatever, but it
> just seems really dumb for the system to die from things like this in
> general and it doesn't really take all that much to trigger the
> condition.
> 
> Thanks.
> 
> --
> tejun

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:30         ` Tejun Heo
@ 2018-01-10 18:41           ` Peter Zijlstra
  2018-01-10 19:05             ` Tejun Heo
  0 siblings, 1 reply; 140+ messages in thread
From: Peter Zijlstra @ 2018-01-10 18:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, Jan 10, 2018 at 10:30:55AM -0800, Tejun Heo wrote:
> > Why not kill recursive OOM (msgs) ?
> 
> Sure, we can do that too, e.g. marking flushing thread and ignoring
> new messages from it, although that does come with its own downsides.

Typically we (scheduler) have removed printk()s (on boot) when BIGSMP
folks say it creates boot pain. Much of it is now behind the sched_debug
parameter, others are compressed.

I've also seen other people reduce printk()s.

In general reducing printk() is a good thing, its a low bandwidth
channel for critical stuff like OOPSen and the like.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:12     ` Tejun Heo
  2018-01-10 18:14       ` Tejun Heo
@ 2018-01-10 18:41       ` Steven Rostedt
  2018-01-10 18:57         ` Tejun Heo
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 10:12:52 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Steven.
> 
> So, everything else on your message, sure.  You do what you have to
> do, but I really don't understand the following part, and this has
> been the main source of frustration in the whole discussion.
> 
> On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:
> > You on the other hand are showing unrealistic scenarios, and crying
> > that it's what you see in production, with no proof of it.  
> 
> I've explained the same scenario multiple times.  Unless you're
> assuming that I'm lying, it should be amply clear that the scenario is
> unrealistic - we've been seeing them taking place repeatedly for quite
> a while.

The one scenario you did show was the recursive OOM messages, and as
Peter Zijlstra pointed out that's more of a bug in the net console than
a printk bug.

> 
> What I don't understand is why we can't address this seemingly obvious
> problem.  If there are technical reasons and the consensus is to not
> solve this within flushing logic, sure, we can deal with it otherwise,
> but we at least have to be able to agree that there are actual issues
> here, no?

The issue with the solution you want to do with printk is that it can
break existing printk usages. As Petr said, people want printk to do two
things. 1 - print out data ASAP, 2 - not lock up the system. The two
are fighting each other. You care more about 2 where I (and others,
like Peter Zijlstra and Linus) care more about 1.

My solution can help with 2 without doing anything to hurt 1.

You are NACKing my solution because it doesn't solve this bug with net
console. I believe net console should be fixed. You believe that printk
should have a work around to not let net console type bugs occur. Which
to me is papering over the real bugs.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:14       ` Tejun Heo
@ 2018-01-10 18:45         ` Steven Rostedt
  0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 10:14:59 -0800
Tejun Heo <tj@kernel.org> wrote:

> On Wed, Jan 10, 2018 at 10:12:52AM -0800, Tejun Heo wrote:
> > Hello, Steven.
> > 
> > So, everything else on your message, sure.  You do what you have to
> > do, but I really don't understand the following part, and this has
> > been the main source of frustration in the whole discussion.
> > 
> > On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote:  
> > > You on the other hand are showing unrealistic scenarios, and crying
> > > that it's what you see in production, with no proof of it.  
> > 
> > I've explained the same scenario multiple times.  Unless you're
> > assuming that I'm lying, it should be amply clear that the scenario is
> > unrealistic - we've been seeing them taking place repeatedly for quite
> > a while.  
> 
> Oops, I meant to write "not unrealistic".  Anyways, if you think I'm
> lying, please let me know.  I can ask others who have been seeing the
> issue to join the thread.

I don't believe you are lying. I believe you are interpreting one
problem as another. I don't see this is a printk bug, I see it as a
recursive OOM + net console bug. My patch is not trying to solve that,
and I don't believe it should be solved via printk.

I'm trying to solve the problem of printk spamming all CPUs causing a
single CPU to lock up. That is a real bug that has been hit in various
different scenarios, where there is no other underlying bug. This issue
is a printk problem, and my solution solves it for printk.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 16:29   ` Petr Mladek
  2018-01-10 17:02     ` Tejun Heo
@ 2018-01-10 18:54     ` Steven Rostedt
  2018-01-11  5:10     ` Sergey Senozhatsky
  2 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 17:29:00 +0100
Petr Mladek <pmladek@suse.com> wrote:

> he next versions used lazy offload from console_unlock() when
> the thread spent there too much time. IMHO, this is one
> very promising solution. It guarantees that softlockup
> would never happen. But it tries hard to get the messages
> out immediately.
> 
> Unfortunately, it is very complicated. We have troubles to understand
> the concerns, for example see the long discussion about v3 at
> https://lkml.kernel.org/r/20170509082859.854-1-sergey.senozhatsky@gmail.com
> I admit that I did not have enough time to review this.
> 
> 
> Anyway, in October, 2017, Steven came up with a completely
> different approach (console owner/waiter transfer). It does
> not guarantee that the softlockup will not happen. But it
> does not suffer from the problem that blocked the obvious
> solution for years. It moves the owner at runtime, so
> it is guaranteed that the new owner would continue
> printing.

Yes, I believe my solution and the offloading solution are two agnostic
solutions, and they are not mutually exclusive. They both can be
applied. But mine shouldn't be controversial as it has no down sides
from the current printk solution.

After adding this one, if issues come up, we should have a better idea
of how to handle them, because I'm betting the issues will only come up
in some pretty unique scenarios. And they may even be solved without
having to touch printk (and hurt the get out ASAP requirement). I don't
want to paper over some real issues of those that use printk, with
printk work arounds.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:41       ` Steven Rostedt
@ 2018-01-10 18:57         ` Tejun Heo
  2018-01-10 19:17           ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 18:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Steven.

On Wed, Jan 10, 2018 at 01:41:57PM -0500, Steven Rostedt wrote:
> The issue with the solution you want to do with printk is that it can
> break existing printk usages. As Petr said, people want printk to do two
> things. 1 - print out data ASAP, 2 - not lock up the system. The two
> are fighting each other. You care more about 2 where I (and others,
> like Peter Zijlstra and Linus) care more about 1.
> 
> My solution can help with 2 without doing anything to hurt 1.

I'm not really sure why punting to a safe context is necessarily
unacceptable in terms of #1 because there seems to be a pretty wide
gap between printing useful messages synchronously and a system being
caught in printk flush to the point where the system is not
operational at all.

> You are NACKing my solution because it doesn't solve this bug with net
> console. I believe net console should be fixed. You believe that printk
> should have a work around to not let net console type bugs occur. Which
> to me is papering over the real bugs.

As I wrote along with nack, I was more concerned with how this was
pushed forward by saying that actual problems are not real.

As for the netconsole part, sure, that can be one way, but please
consider that the messages could be coming from network drivers, of
which we have many and a lot of them aren't too high quality.  Plus,
netconsole is a separate path and network drivers can easily
malfunction on memory allocation failures.

Again, not a critical problem.  We can decide either way but it'd be
better to be generally safe (if we can do that reasonably), right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:41           ` Peter Zijlstra
@ 2018-01-10 19:05             ` Tejun Heo
  0 siblings, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello,

On Wed, Jan 10, 2018 at 07:41:44PM +0100, Peter Zijlstra wrote:
> Typically we (scheduler) have removed printk()s (on boot) when BIGSMP
> folks say it creates boot pain. Much of it is now behind the sched_debug
> parameter, others are compressed.
> 
> I've also seen other people reduce printk()s.
> 
> In general reducing printk() is a good thing, its a low bandwidth
> channel for critical stuff like OOPSen and the like.

Yeah, sure, no disagreement there.  It's just that this is a provision
for when that breaks down.  In the described scenario, it's also not
caused by any particular one printing too many messages.  OOM is just
printing OOM info and packet tx is just printing standard alloc failed
message (and some other following errors).  It's the feedback loop
which kills the machine.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:57         ` Tejun Heo
@ 2018-01-10 19:17           ` Steven Rostedt
  2018-01-10 19:34             ` Tejun Heo
  2018-01-11  5:35             ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 19:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 10:57:47 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Steven.
> 
> On Wed, Jan 10, 2018 at 01:41:57PM -0500, Steven Rostedt wrote:
> > The issue with the solution you want to do with printk is that it can
> > break existing printk usages. As Petr said, people want printk to do two
> > things. 1 - print out data ASAP, 2 - not lock up the system. The two
> > are fighting each other. You care more about 2 where I (and others,
> > like Peter Zijlstra and Linus) care more about 1.
> > 
> > My solution can help with 2 without doing anything to hurt 1.  
> 
> I'm not really sure why punting to a safe context is necessarily
> unacceptable in terms of #1 because there seems to be a pretty wide
> gap between printing useful messages synchronously and a system being
> caught in printk flush to the point where the system is not
> operational at all.

And what do you define as a "safe" context. And what happens when the
system is hosed and that "safe" context no longer exists? How do you
know that the safe context is gone?

> 
> > You are NACKing my solution because it doesn't solve this bug with net
> > console. I believe net console should be fixed. You believe that printk
> > should have a work around to not let net console type bugs occur. Which
> > to me is papering over the real bugs.  
> 
> As I wrote along with nack, I was more concerned with how this was
> pushed forward by saying that actual problems are not real.

You mean you saying that? I never created this patch set for the
problems you reported. You came in nacking this saying that it doesn't
solve your problems and showed some totally unrealistic module that
triggers issues that my patch doesn't solve.

I admit now that the OOM net console bug is a real issue. But my
saying that you were being unrealistic was more about that module you
posted to try to demonstrate the issue.

This is not the issue I'm trying to solve, and I don't understand why
you are against my solution when it is agnostic to any solution that
you want to do as well.

One way to have an offload solution added on top of mine, is to have a
limit in how many messages the printk will do. Honestly, I believe it
should always printk its own message if there are no others trying to
do a print. Yes, that may still not solve the net console bug, but it
helps guarantee that printks get out.

But if a printk starts printing more than one message, perhaps that is
where we can look at offloading. Similar to how softirq works. If a
softirq repeats too many times, it is offloaded to the ksoftirqd
thread. We can have a similar approach to printk.

> 
> As for the netconsole part, sure, that can be one way, but please
> consider that the messages could be coming from network drivers, of
> which we have many and a lot of them aren't too high quality.  Plus,
> netconsole is a separate path and network drivers can easily
> malfunction on memory allocation failures.
> 
> Again, not a critical problem.  We can decide either way but it'd be
> better to be generally safe (if we can do that reasonably), right?

OK, lets start over.

Right now my focus is an incremental approach. I'm not trying to solve
all issues that printk has. I've focused on a single issue, and that is
that printk is unbounded. Coming from a Real Time background, I find
that is a big problem. I hate unbounded algorithms. I looked at this
and found a way to make printk have a max bounded time it can print.
Sure, it can be more than what you want, but it is a constant time,
that can be measured. Hence, it is an O(1) solution.

Now, if there is still issues with printk, there may be cases where
offloading makes sense. I don't see why we should stop my solution
because we are not addressing these other issues where offloading may
make sense. My solution is simple, and does not impact other solutions.
It may even show that other solutions are not needed. But that's a good
thing.

I'm not against an offloading solution if it can solve issues without
impacting the other printk use cases. I'm currently only focusing on
this solution which you are fighting me against.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 19:17           ` Steven Rostedt
@ 2018-01-10 19:34             ` Tejun Heo
  2018-01-10 19:44               ` Steven Rostedt
  2018-01-11  5:35             ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 19:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Steven.

On Wed, Jan 10, 2018 at 02:17:58PM -0500, Steven Rostedt wrote:
> > I'm not really sure why punting to a safe context is necessarily
> > unacceptable in terms of #1 because there seems to be a pretty wide
> > gap between printing useful messages synchronously and a system being
> > caught in printk flush to the point where the system is not
> > operational at all.
> 
> And what do you define as a "safe" context. And what happens when the
> system is hosed and that "safe" context no longer exists? How do you
> know that the safe context is gone?

Hmm.. yeah, we have that problem now too.  Panic bypassing
synchronizations solves some of that I guess.

> I admit now that the OOM net console bug is a real issue. But my
> saying that you were being unrealistic was more about that module you
> posted to try to demonstrate the issue.

Heh, our recollections would differ widely there, but let's leave it
at that.

> Right now my focus is an incremental approach. I'm not trying to solve
> all issues that printk has. I've focused on a single issue, and that is
> that printk is unbounded. Coming from a Real Time background, I find
> that is a big problem. I hate unbounded algorithms. I looked at this
> and found a way to make printk have a max bounded time it can print.
> Sure, it can be more than what you want, but it is a constant time,
> that can be measured. Hence, it is an O(1) solution.

It is bound iff there are contexts which can bounce the flushing role
among them, right?

> Now, if there is still issues with printk, there may be cases where
> offloading makes sense. I don't see why we should stop my solution
> because we are not addressing these other issues where offloading may
> make sense. My solution is simple, and does not impact other solutions.
> It may even show that other solutions are not needed. But that's a good
> thing.
> 
> I'm not against an offloading solution if it can solve issues without
> impacting the other printk use cases. I'm currently only focusing on
> this solution which you are fighting me against.

Oh yeah, sure.  It might actually be pretty simple to combine into
your solution.  For example, can't we just always make sure that
there's at least one sleepable context which participates in your
pingpongs, which only kicks in when a particular context is trapped
too long?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 19:34             ` Tejun Heo
@ 2018-01-10 19:44               ` Steven Rostedt
  2018-01-10 22:44                 ` Tejun Heo
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 19:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 11:34:51 -0800
Tejun Heo <tj@kernel.org> wrote:

> > Right now my focus is an incremental approach. I'm not trying to solve
> > all issues that printk has. I've focused on a single issue, and that is
> > that printk is unbounded. Coming from a Real Time background, I find
> > that is a big problem. I hate unbounded algorithms. I looked at this
> > and found a way to make printk have a max bounded time it can print.
> > Sure, it can be more than what you want, but it is a constant time,
> > that can be measured. Hence, it is an O(1) solution.  
> 
> It is bound iff there are contexts which can bounce the flushing role
> among them, right?

No, not at all. The printk can only print what's in the buffer. The
buffer can only get more to print if another printk occurs. If that
happens, that other printk takes over. Thus, any single printk can
print at most one buffer full. Which is bounded to the size of the
buffer.

Yes, there can be the case that printks are added via an interrupt, but
then again, it's an issue that a single CPU. And printks from interrupt
context should be considered critical, part of the ASAP category. If
they are not critical, then they shouldn't be doing printks. That may
be a place were we can add a "printk_delay", for things like non
critical printks in interrupt context, that can trigger offloading?

> 
> > Now, if there is still issues with printk, there may be cases where
> > offloading makes sense. I don't see why we should stop my solution
> > because we are not addressing these other issues where offloading may
> > make sense. My solution is simple, and does not impact other solutions.
> > It may even show that other solutions are not needed. But that's a good
> > thing.
> > 
> > I'm not against an offloading solution if it can solve issues without
> > impacting the other printk use cases. I'm currently only focusing on
> > this solution which you are fighting me against.  
> 
> Oh yeah, sure.  It might actually be pretty simple to combine into
> your solution.  For example, can't we just always make sure that
> there's at least one sleepable context which participates in your
> pingpongs, which only kicks in when a particular context is trapped
> too long?

The solution can be extended to that if the need exists, yes.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 19:44               ` Steven Rostedt
@ 2018-01-10 22:44                 ` Tejun Heo
  0 siblings, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-10 22:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Steven.

On Wed, Jan 10, 2018 at 02:44:55PM -0500, Steven Rostedt wrote:
> Yes, there can be the case that printks are added via an interrupt, but
> then again, it's an issue that a single CPU. And printks from interrupt
> context should be considered critical, part of the ASAP category. If
> they are not critical, then they shouldn't be doing printks. That may
> be a place were we can add a "printk_delay", for things like non
> critical printks in interrupt context, that can trigger offloading?

Ideally, if we can annoate all those, that would be great.  I don't
feel too confident about that tho.  Here is one network driver that we
deal with often.

  $ wc -l $(git ls-files drivers/net/ethernet/mellanox/mlx5) | tail -1
    48029 total

It's close to 50k lines of code and AFAICT this seems to be the trend.
Most things which are happening in the driver are complicated and
sometimes lead to surprising behaviors.  With memory allocation
failures thrown in, idk.

I think our exposure to this sort of problem is pretty wide and we
can't reasonably keep close eyes on them, especially for problems
which only happen under high stress conditions which aren't tested
that easily.

> > Oh yeah, sure.  It might actually be pretty simple to combine into
> > your solution.  For example, can't we just always make sure that
> > there's at least one sleepable context which participates in your
> > pingpongs, which only kicks in when a particular context is trapped
> > too long?
> 
> The solution can be extended to that if the need exists, yes.

I think it'd be really great if the core code can protect itself
against these things going haywire.  We can ignore messages generated
while being recursive from netconsole, but that would mean, for
example, if that giant driver messes up in that path (netconsole under
memory pressure), it'd be painful to debug.  So, if we can, it'd be
really great to have a generic protection which can handle these
situations.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:05   ` Steven Rostedt
  2018-01-10 18:12     ` Tejun Heo
@ 2018-01-11  4:58     ` Sergey Senozhatsky
  2018-01-11  9:34       ` Petr Mladek
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11  4:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 13:05), Steven Rostedt wrote:
[..]
> My solution takes printk from its current unbounded state, and makes it
> fixed bounded. Which means printk() is now a O(1) algorithm.
						^^^
						O(logbuf)

and   O(logbuf) > watchdog_thresh   is totally possible. and there
is nothing super unlucky in having O(logbuf). limiting printk is the
right way to go, sure. but you limit it to the wrong thing. limiting
it to logbuf is not enough, especially given that logbuf size is
configurable via kernel param - it's a moving target. if one wants
printk to stop disappointing the watchdog then printk must learn to
respect watchdog's threshold.


https://marc.info/?l=linux-kernel&m=151444381104068


hence a small fix up

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8882a4bf2a9e..4efa7542d84d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2341,6 +2341,14 @@ void console_unlock(void)
 
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
+
+		if (log_next_seq - console_seq > 666) {
+			console_seq = log_next_seq;
+			raw_spin_unlock(&logbuf_lock);
+			printk_safe_exit_irqrestore(flags);
+			panic("you mad bro? this can softlockup your system! let me fix that for you");
+		}
+
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;

---

> The solution is simple, everyone at KS agreed with it, there should be
> no controversy here.

frankly speaking, that's not what I recall ;)


[..]
> My printk solution is solid, with no risk of regressions of current
> printk usages.

except that handing off a console_sem to atomic task when there
is   O(logbuf) > watchdog_thresh   is a regression, basically...
it is what it is.


> If anything, I'll pull theses patches myself, and push them to Linus
> directly

lovely.

	-ss

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 16:29   ` Petr Mladek
  2018-01-10 17:02     ` Tejun Heo
  2018-01-10 18:54     ` Steven Rostedt
@ 2018-01-11  5:10     ` Sergey Senozhatsky
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11  5:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Tejun Heo, Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 17:29), Petr Mladek wrote:
[..]
> The next versions used lazy offload from console_unlock() when
> the thread spent there too much time. IMHO, this is one
> very promising solution. It guarantees that softlockup
> would never happen. But it tries hard to get the messages
> out immediately.

a small addition. my motivation was not exactly the "lazy offload",
but to keep the existing printk behavior as long as possible. and
that "as long as possible" is determined by watchdog threshold, which
is the only limit we must care about. as long as printing task spends
more than 1/2 of watchdog threshold - we offload. otherwise we don't
mess up with the existing logic/guarantees/etc.

there is also a bunch of other things in the patch now. but nothing
fantastically complex.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:21       ` Peter Zijlstra
  2018-01-10 18:30         ` Tejun Heo
@ 2018-01-11  5:15         ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11  5:15 UTC (permalink / raw)
  To: Tejun Heo, Peter Zijlstra
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 19:21), Peter Zijlstra wrote:
> 
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> 
> Why not kill recursive OOM (msgs) ?

hm... do I understand it correctly that there is a

console_unlock()->call_console_drivers()->FOO_write()->kmalloc()->printk() recursion?

we call console drivers from printk-safe context now. so those printks
from kmalloc are redirected to per-CPU printk-safe buffer, which is
limited in size (we probably might start losing some of those OOM
messages) and which is flushed (log_store()) from another context.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 19:17           ` Steven Rostedt
  2018-01-10 19:34             ` Tejun Heo
@ 2018-01-11  5:35             ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11  5:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 14:17), Steven Rostedt wrote:
[..]
> OK, lets start over.

good.

> Right now my focus is an incremental approach. I'm not trying to solve
> all issues that printk has. I've focused on a single issue, and that is
> that printk is unbounded. Coming from a Real Time background, I find
> that is a big problem. I hate unbounded algorithms.

agreed! so why not bound it to watchdog threshold then? why bound
it to a random O(logbuf) thing? which is not even constant. when you
un-register or disable one or several consoles then call_console_drivers()
becomes faster; when you register/enable consoles then the entire
call_console_drivers() becomes slower. how do we build a reliable
algorithm on that O(logbuf)?

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 18:40       ` Mathieu Desnoyers
@ 2018-01-11  7:36         ` Sergey Senozhatsky
  2018-01-11 11:24           ` Petr Mladek
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11  7:36 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Tejun Heo, Petr Mladek, Linus Torvalds, Andrew Morton, rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hi Mathieu,

On (01/10/18 18:40), Mathieu Desnoyers wrote:
[..]
> 
> There appears to be two problems at hand. One is making sure a console
> buffer owner only flushes a bounded amount of data.

which, realistically, has quite little to do with the "and thus it
fixes the lockups". logbuf size is mutable, the number of consoles we
need to sequentially push the data to is mutable, the watchdog threshold
is mutable... if combination of first two mutable things produces the
result which makes the check based on the third mutable thing happy,
then it's just an accident. my 5 cents.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11  4:58     ` Sergey Senozhatsky
@ 2018-01-11  9:34       ` Petr Mladek
  2018-01-11 10:38         ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-11  9:34 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu 2018-01-11 13:58:17, Sergey Senozhatsky wrote:
> On (01/10/18 13:05), Steven Rostedt wrote:
> > The solution is simple, everyone at KS agreed with it, there should be
> > no controversy here.
> 
> frankly speaking, that's not what I recall ;)

To be honest, I do not longer remember the details. I think that
nobody was really against that solution. Of course, there were
doubts and other proposals.

I think that I was actually the most sceptical guy there. I would
split my old doubts into three areas:

      + new possible deadlocks
            -> I was wrong

      + did not fully prevent softlockups
            -> no real life example in hands

      + looked tricky and complex
	    -> like many other new things

You see that I have changed my mind and decided to give this solution
a chance.

 
> [..]
> > My printk solution is solid, with no risk of regressions of current
> > printk usages.
> 
> except that handing off a console_sem to atomic task when there
> is   O(logbuf) > watchdog_thresh   is a regression, basically...
> it is what it is.

How this could be a regression? Is not the victim that handles
other printk's random? What protected the atomic task to
handle the other printks before this patch?

Or do you have a system that started to suffer from softlockups
with this patchset and did not do this before?
 
> 
> > If anything, I'll pull theses patches myself, and push them to Linus
> > directly
> 
> lovely.

Do you know about any system where this patch made the softlockup
deterministically or statistically more likely, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11  9:34       ` Petr Mladek
@ 2018-01-11 10:38         ` Sergey Senozhatsky
  2018-01-11 11:50           ` Petr Mladek
  2018-01-11 16:29           ` Steven Rostedt
  0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11 10:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo,
	Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/11/18 10:34), Petr Mladek wrote:
[..]
> > except that handing off a console_sem to atomic task when there
> > is   O(logbuf) > watchdog_thresh   is a regression, basically...
> > it is what it is.
> 
> How this could be a regression? Is not the victim that handles
> other printk's random? What protected the atomic task to
> handle the other printks before this patch?

the non-atomic -> atomic context console_sem transfer. we previously
would have kept the console_sem owner to its non-atomic owner. we now
will make sure that if printk from atomic context happens then it will
make it to console_unlock() loop.
emphasis on O(logbuf) > watchdog_thresh.


- if the patch's goal is to bound (not necessarily to watchdog's threshold)
the amount of time we spend in console_unlock(), then the patch is kinda
overcomplicated. but no further questions in this case.

- but if the patch's goal is to bound (to lockup threshold) the amount of
time spent in console_unlock() in order to avoid lockups [uh, a reason],
then the patch is rather oversimplified.


claiming that for any given A, B, C the following is always true

				A * B < C

where
	A is the amount of data to print in the worst case
	B the time call_console_drivers() needs to print a single
	  char to all registered and enabled consoles
	C the watchdog's threshold

is not really a step forward.

and the "last console_sem owner prints all pending messages" rule
is still there.


> Or do you have a system that started to suffer from softlockups
> with this patchset and did not do this before?
[..]
> Do you know about any system where this patch made the softlockup
> deterministically or statistically more likely, please?

I have explained many, many times why my boards die just like before.
why would I bother collecting any numbers...

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11  7:36         ` Sergey Senozhatsky
@ 2018-01-11 11:24           ` Petr Mladek
  2018-01-11 13:19             ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-11 11:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Mathieu Desnoyers, Tejun Heo, Linus Torvalds, Andrew Morton,
	rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Thu 2018-01-11 16:36:18, Sergey Senozhatsky wrote:
> Hi Mathieu,
> 
> On (01/10/18 18:40), Mathieu Desnoyers wrote:
> [..]
> > 
> > There appears to be two problems at hand. One is making sure a console
> > buffer owner only flushes a bounded amount of data.
> 
> which, realistically, has quite little to do with the "and thus it
> fixes the lockups". logbuf size is mutable, the number of consoles we
> need to sequentially push the data to is mutable, the watchdog threshold
> is mutable... if combination of first two mutable things produces the
> result which makes the check based on the third mutable thing happy,
> then it's just an accident. my 5 cents.

Yes, there might be situations when Steven's patch is not able to
prevent the softlockup. But there is clear evidence that it will
help in many other situations.

The offload-based solution prevents the softlockup completely.
But there might be situations where the offload does not happen
and people might miss important messages.

And this is my point. Steven's patch is not perfect. But it helps
and it seems that it does not cause regressions. The offload based
solution solves one problem a better way but it might cause
regressions that are being discussed for years.


IMHO, nobody know how much Steven's solution is effective until we
push it into the wild. IMHO, it is safe to be pushed.

You might argue that we already know that Steven's solution will
not be enough. IMHO, the problem here is the term "real life example".

My understanding is that real-life example is a softlockup report
from a system running in production or used for debugging any bug.
So far, Steven's opponents provided only hand made code or
scenarios. The provided code usually produced printk() messages
in a tight loop. In each case, there is not a consensus that they
simulated a real life problem good enough. We might continue
discussing it but basically any discussion is theoretical unless
there are hard data behind it.

I vote to push Steven's patch into the wild and see. I really would
like to give it a chance.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11 10:38         ` Sergey Senozhatsky
@ 2018-01-11 11:50           ` Petr Mladek
  2018-01-11 16:29           ` Steven Rostedt
  1 sibling, 0 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-11 11:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu 2018-01-11 19:38:45, Sergey Senozhatsky wrote:
> On (01/11/18 10:34), Petr Mladek wrote:
> [..]
> > > except that handing off a console_sem to atomic task when there
> > > is   O(logbuf) > watchdog_thresh   is a regression, basically...
> > > it is what it is.
> > 
> > How this could be a regression? Is not the victim that handles
> > other printk's random? What protected the atomic task to
> > handle the other printks before this patch?
> 
> the non-atomic -> atomic context console_sem transfer. we previously
> would have kept the console_sem owner to its non-atomic owner. we now
> will make sure that if printk from atomic context happens then it will
> make it to console_unlock() loop.
> emphasis on O(logbuf) > watchdog_thresh.

Sergey, please, why do you completely and repeatedly ignore that
argument about statistical effects?

Yes, the above scenario is possible. But Steven's patch might also move the
owner from atomic context to a non-atomic one. The chances should be
more or less equal. The main advantage is that the owner is moved.
This should statistically lower the chance of a soft-lockup.

> 
> > Or do you have a system that started to suffer from softlockups
> > with this patchset and did not do this before?
> [..]
> > Do you know about any system where this patch made the softlockup
> > deterministically or statistically more likely, please?
> 
> I have explained many, many times why my boards die just like before.
> why would I bother collecting any numbers...

Is it with your own printk stress tests or during "normal" work?

If it is during a normal work, is there any chance that we
could have a look at the logs?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-10 17:52   ` Steven Rostedt
@ 2018-01-11 12:03     ` Petr Mladek
  2018-01-12 15:37       ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-11 12:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Wed 2018-01-10 12:52:20, Steven Rostedt wrote:
> On Wed, 10 Jan 2018 14:24:18 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > The commit ("printk: Add console owner and waiter logic to load balance
> > console writes") made vprintk_emit() and console_unlock() even more
> > complicated.
> > 
> > This patch extracts the new code into 3 helper functions. They should
> > help to keep it rather self-contained. It will be easier to use and
> > maintain.
> > 
> > This patch just shuffles the existing code. It does not change
> > the functionality.
> > 
> > Signed-off-by: Petr Mladek <pmladek@suse.com>
> > ---
> >  kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++--------------------
> >  1 file changed, 145 insertions(+), 97 deletions(-)
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 7e6459abba43..6217c280e6c1 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers);
> >  static struct lockdep_map console_lock_dep_map = {
> >  	.name = "console_lock"
> >  };
> > -static struct lockdep_map console_owner_dep_map = {
> > -	.name = "console_owner"
> > -};
> >  #endif
> >  
> > -static DEFINE_RAW_SPINLOCK(console_owner_lock);
> > -static struct task_struct *console_owner;
> > -static bool console_waiter;
> > -
> >  enum devkmsg_log_bits {
> >  	__DEVKMSG_LOG_BIT_ON = 0,
> >  	__DEVKMSG_LOG_BIT_OFF,
> > @@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
> >  }
> >  
> >  /*
> > + * Special console_lock variants that help to reduce the risk of soft-lockups.
> > + * They allow to pass console_lock to another printk() call using a busy wait.
> > + */
> > +
> > +#ifdef CONFIG_LOCKDEP
> > +static struct lockdep_map console_owner_dep_map = {
> > +	.name = "console_owner"
> > +};
> > +#endif
> > +
> > +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> > +static struct task_struct *console_owner;
> > +static bool console_waiter;
> > +
> > +/**
> > + * console_lock_spinning_enable - mark beginning of code where another
> > + *	thread might safely busy wait
> > + *
> > + * This might be called in sections where the current console_lock owner
> 
> 
> "might be"? It has to be called in sections where the current
> console_lock owner can not sleep. It's basically saying "console lock is
> now acting like a spinlock".

I am afraid that both explanations are confusing. Your one sounds like
it must be called every time we enter non-preemptive context in
console_unlock. What about the following?

 * This is basically saying that "console lock is now acting like
 * a spinlock". It can be called _only_ in sections where the current
 * console_lock owner could not sleep. Also it must be ready to hand
 * over the lock at the end of the section.

> > + * cannot sleep. It is a signal that another thread might start busy
> > + * waiting for console_lock.
> > + */

All the other changes look good to me. I will use them in the next version.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11 11:24           ` Petr Mladek
@ 2018-01-11 13:19             ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11 13:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Mathieu Desnoyers, Tejun Heo, Linus Torvalds,
	Andrew Morton, rostedt, Sergey Senozhatsky, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On (01/11/18 12:24), Petr Mladek wrote:
[..]
> You might argue that we already know that Steven's solution will
> not be enough. IMHO, the problem here is the term "real life example".

this is really boring, how real life examples happen only on Steven's PC
or Petr's qemu image. whatever.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11 10:38         ` Sergey Senozhatsky
  2018-01-11 11:50           ` Petr Mladek
@ 2018-01-11 16:29           ` Steven Rostedt
  2018-01-12  1:30             ` Steven Rostedt
  2018-01-12  2:56             ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-11 16:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 19:38:45 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> 
> the non-atomic -> atomic context console_sem transfer. we previously
> would have kept the console_sem owner to its non-atomic owner. we now
> will make sure that if printk from atomic context happens then it will
> make it to console_unlock() loop.
> emphasis on O(logbuf) > watchdog_thresh.
> 
> 
> - if the patch's goal is to bound (not necessarily to watchdog's threshold)
> the amount of time we spend in console_unlock(), then the patch is kinda
> overcomplicated. but no further questions in this case.

It's goal is to keep printk from running amok on a single CPU like it
currently does. This prevents one printk from never ending. And it is
far from complex. It doesn't deal with "offloading". The "handover" is
only done to those that are doing printks. What do you do if all CPUs
are in "critical sections", how would a "handoff to safe" work? Will
the printks never get out? If the machine were to triple fault and
reboot, we lost all of it.

> 
> - but if the patch's goal is to bound (to lockup threshold) the amount of
> time spent in console_unlock() in order to avoid lockups [uh, a reason],
> then the patch is rather oversimplified.

It's bound to print all the information that has been added to the
printk buffer. You want to bound it to some "time" and what about the
printks that haven't gotten out yet? Delay them to something else, and
if the machine were to crash in the transfer, we lost all that data.

My method, there's really no delay between a hand off. There's always
an active CPU doing printing. It matches the current method which works
well for getting information out. A delayed approach will break that
and that's what people like myself, Peter, Linus and others are worried
about.


> 
> 
> claiming that for any given A, B, C the following is always true
> 
> 				A * B < C
> 
> where
> 	A is the amount of data to print in the worst case
> 	B the time call_console_drivers() needs to print a single
> 	  char to all registered and enabled consoles
> 	C the watchdog's threshold
> 
> is not really a step forward.

It's no different than what we have, except that we currently have A
being infinite. My patch makes A no longer infinite, but a constant.
Yes that constant is mutable, but it's still a constant, and
controlled by the user. That to me is definitely a BIG step forward.

> 
> and the "last console_sem owner prints all pending messages" rule
> is still there.
> 
> 
> > Or do you have a system that started to suffer from softlockups
> > with this patchset and did not do this before?  
> [..]
> > Do you know about any system where this patch made the softlockup
> > deterministically or statistically more likely, please?  
> 
> I have explained many, many times why my boards die just like before.
> why would I bother collecting any numbers...

Great, and there's cases that die that my patch solves. Lets add my
patch now since it is orthogonal to an offloading approach and see how
it works, because it would solve issues that I have hit. If you can
show that this isn't good enough we can add another approach. We are
solving two different problems. My patch simply makes one printk() no
longer unbounded. It's a fixed time.

Honestly, I don't see why you are against this patch. It doesn't stop
your work. If this patch isn't enough (but it does fix some issues),
then we can look at adding other approaches. Really, it sounds like you
are afraid of this patch, that it might be good enough for most cases
which would make adding another approach even more difficult.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11 16:29           ` Steven Rostedt
@ 2018-01-12  1:30             ` Steven Rostedt
  2018-01-12  2:55               ` Steven Rostedt
  2018-01-12  3:12               ` Sergey Senozhatsky
  2018-01-12  2:56             ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12  1:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 11:29:08 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > claiming that for any given A, B, C the following is always true
> > 
> > 				A * B < C
> > 
> > where
> > 	A is the amount of data to print in the worst case
> > 	B the time call_console_drivers() needs to print a single
> > 	  char to all registered and enabled consoles
> > 	C the watchdog's threshold
> > 
> > is not really a step forward.  
> 
> It's no different than what we have, except that we currently have A
> being infinite. My patch makes A no longer infinite, but a constant.
> Yes that constant is mutable, but it's still a constant, and
> controlled by the user. That to me is definitely a BIG step forward.

I have to say that your analysis here really does point out the benefit
of my patch.

Today, printk() can print for a time of A * B, where, as you state
above:

   A is the amount of data to print in the worst case
   B the time call_console_drivers() needs to print a single
	  char to all registered and enabled consoles

In the worse case, the current approach is A is infinite. That is,
printk() never stops, as long as there is a printk happening on another
CPU before B can finish. A will keep growing. The call to printk() will
never return. The more CPUs you have, the more likely this will occur.
All it takes is a few CPUs doing periodic printks. If there is a slow
console, where the periodic printk on other CPUs occur quicker than the
first can finish, the first one will be stuck forever. Doesn't take
much to have this happen.

With my patch, A is fixed to the size of the buffer. A single printk()
can never print more than that. If another CPU comes in and does a
printk, then it will take over the task of printing, and release the
first printk.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  1:30             ` Steven Rostedt
@ 2018-01-12  2:55               ` Steven Rostedt
  2018-01-12  4:20                 ` Steven Rostedt
  2018-01-16 19:44                 ` Tejun Heo
  2018-01-12  3:12               ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12  2:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 20:30:57 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I have to say that your analysis here really does point out the benefit
> of my patch.
> 
> Today, printk() can print for a time of A * B, where, as you state
> above:
> 
>    A is the amount of data to print in the worst case
>    B the time call_console_drivers() needs to print a single
> 	  char to all registered and enabled consoles
> 
> In the worse case, the current approach is A is infinite. That is,
> printk() never stops, as long as there is a printk happening on another
> CPU before B can finish. A will keep growing. The call to printk() will
> never return. The more CPUs you have, the more likely this will occur.
> All it takes is a few CPUs doing periodic printks. If there is a slow
> console, where the periodic printk on other CPUs occur quicker than the
> first can finish, the first one will be stuck forever. Doesn't take
> much to have this happen.
> 
> With my patch, A is fixed to the size of the buffer. A single printk()
> can never print more than that. If another CPU comes in and does a
> printk, then it will take over the task of printing, and release the
> first printk.

In fact, below is a module I made (starting with Tejun's crazy stress
test, then removing all the craziness). This simple module locks up the
system without my patch. After applying my patch, the system runs fine.

All I did was start off a work queue on each CPU, and each CPU does one
printk() followed by a millisecond sleep. No 10,000 printks, nothing
in an interrupt handler. Preemption is disabled while the printk
happens, but that's normal.

This is much closer to an OOM happening all over the system, where OOMs
stack dumps are occurring on different CPUS.

I ran this on a box with 4 CPUs and a serial console (so it has a slow
console). Again, all I have is each CPU doing exactly ONE printk()!
then sleeping for a full millisecond! It will cause a lot of output,
and perhaps slow the system down. But it should not lock up the system.
But without my patch, it does!

Try it!

Test it on a box, and it will lock up. Then add my patch and see what
the results are. I think this speaks very loudly in favor of applying
my patch.

Again, the below module locks up my system immediately without my
patch. With my patch, no problem. In fact, it's still running, while I
wrote this email, and it hardly shows a slow down in the system.


-- Steve

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;

static void preempt_printk_workfn(struct work_struct *work)
{
	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		printk("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT");
		preempt_enable();
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-11 16:29           ` Steven Rostedt
  2018-01-12  1:30             ` Steven Rostedt
@ 2018-01-12  2:56             ` Sergey Senozhatsky
  2018-01-12  3:21               ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-12  2:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

Hi,

On (01/11/18 11:29), Steven Rostedt wrote:
[..]
> > - if the patch's goal is to bound (not necessarily to watchdog's threshold)
> > the amount of time we spend in console_unlock(), then the patch is kinda
> > overcomplicated. but no further questions in this case.
> 
> It's goal is to keep printk from running amok on a single CPU like it
> currently does. This prevents one printk from never ending. And it is
> far from complex. It doesn't deal with "offloading". The "handover" is
> only done to those that are doing printks. What do you do if all CPUs
> are in "critical sections", how would a "handoff to safe" work? Will
> the printks never get out? If the machine were to triple fault and
> reboot, we lost all of it.

make printk_kthread to be just one of the things that compete for
handed off console_sem, along with other CPUs.

> > - but if the patch's goal is to bound (to lockup threshold) the amount of
> > time spent in console_unlock() in order to avoid lockups [uh, a reason],
> > then the patch is rather oversimplified.
> 
> It's bound to print all the information that has been added to the
> printk buffer. You want to bound it to some "time"

not some... it's aligned with watchdog expectations.
which is deterministic, isn't it?

> My method, there's really no delay between a hand off. There's always
> an active CPU doing printing. It matches the current method which works
> well for getting information out. A delayed approach will break

no, not necessarily. and my previous patch set had some bits of that
"combined offloading and hand off" behaviour. I was thinking about
extending it further, but decided not to. - printk_kthread would spin
on console_owner until current console_sem hand off.

> > claiming that for any given A, B, C the following is always true
> > 
> > 				A * B < C
> > 
> > where
> > 	A is the amount of data to print in the worst case
> > 	B the time call_console_drivers() needs to print a single
> > 	  char to all registered and enabled consoles
> > 	C the watchdog's threshold
> > 
> > is not really a step forward.
> 
> It's no different than what we have, except that we currently have A
> being infinite. My patch makes A no longer infinite, but a constant.

my point is - the constant can be unrealistically high. and can
easily overlap watchdog_threshold, returning printk back to unbound
land. IOW, if your bound is above the watchdog threshold then you
don't have any bounds.

by example, with console=ttyS1,57600n8
- keep increasing the watchdog_threshold until watchdog stops
  complaining?
or
- keep reducing the logbuf size until it can be flushed under
  watchdog_threshold seconds?


and I demonstrated how exactly we end up having a full logbuf of pending
messages even on systems with faster consoles.


[..]
> Great, and there's cases that die that my patch solves. Lets add my
> patch now since it is orthogonal to an offloading approach and see how
> it works, because it would solve issues that I have hit. If you can
> show that this isn't good enough we can add another approach.

it bounds printk. yes, good! that's what I want. but it bounds it to a
wrong value. I want more deterministic and close to reality bound.
and I also want to get rid of "the last console_sem owner prints it all"
thing. I demonstrated with the traces how that thing can bite.


> Honestly, I don't see why you are against this patch.

prove it! show me exactly when and where I said that I NACK or
block the patch? seriously.


> It doesn't stop your work.

and I never said it would. your patch changes nothing on my side, that's
my message. as of now I have out-of-tree patches, well I'll keep using
them. nothing new.


> If this patch isn't enough

BINGO! this is all I'm trying to say.
and the only reply (if there is any at all!) I'm getting is
"GTFO!!! your problems are unrealistic! we gonna release the
patch and wait for someone to come along and say us something
new about printk issues. but not you!".


> (but it does fix some issues)

obviously there are cases which your patch addresses. have I ever
denied that? but, once again, obviously, there are cases which it
doesn't. and those cases tend to bite my setups. I have repeated
it many times, and have explained in great details which parts I'm
talking about.

and I have never run unrealistic test_printk.ko against your patch
or anything alike; why the heck would I do that.


> Really, it sounds like you are afraid of this patch, that it might
> be good enough for most cases which would make adding another approach
> even more difficult.

LOL! wish I knew how to capture screenshots on Linux!

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  1:30             ` Steven Rostedt
  2018-01-12  2:55               ` Steven Rostedt
@ 2018-01-12  3:12               ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-12  3:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/11/18 20:30), Steven Rostedt wrote:
[..]
> Today, printk() can print for a time of A * B, where, as you state
> above:
> 
>    A is the amount of data to print in the worst case
>    B the time call_console_drivers() needs to print a single
> 	  char to all registered and enabled consoles
> 
> In the worse case, the current approach is A is infinite. That is,
> printk() never stops, as long as there is a printk happening on another
> CPU before B can finish. A will keep growing. The call to printk() will
> never return. The more CPUs you have, the more likely this will occur.
> All it takes is a few CPUs doing periodic printks. If there is a slow
> console, where the periodic printk on other CPUs occur quicker than the
> first can finish, the first one will be stuck forever. Doesn't take
> much to have this happen.

console_sem owner can stuck in console_unlock() not because of printk-s
happening right now on other CPUs, but because those printk-s could have
happened while console_sem owner was preempted. when it comes back it has
a ton of pending messages.

I said it before - "we stuck in console_unlock() because others CPUs
printk right now a lot" is not always true. we have preemption. and
the "last console_sem owner prints it all" is not good in this case.

> With my patch, A is fixed to the size of the buffer. A single printk()
> can never print more than that. If another CPU comes in and does a
> printk, then it will take over the task of printing, and release the
> first printk.

yes. and "another CPU" that comes to take over has to print all the
pending messages. from whatever context it's currently in. and bringing
A * B below C can be quite tricky, if possible at all (!). most likely
people will just add more touch_nmi_watchdog().

again, I don't disagree on "let's bound printk". yes, we totally
should! but the bound must be realistic if we want to fix the damn
thing (either with printk_kthread, or hand off, or anything else).

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  2:56             ` Sergey Senozhatsky
@ 2018-01-12  3:21               ` Steven Rostedt
  2018-01-12 10:05                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12  3:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 11:56:12 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Hi,
> 
> On (01/11/18 11:29), Steven Rostedt wrote:
> [..]
> > > - if the patch's goal is to bound (not necessarily to watchdog's threshold)
> > > the amount of time we spend in console_unlock(), then the patch is kinda
> > > overcomplicated. but no further questions in this case.  
> > 
> > It's goal is to keep printk from running amok on a single CPU like it
> > currently does. This prevents one printk from never ending. And it is
> > far from complex. It doesn't deal with "offloading". The "handover" is
> > only done to those that are doing printks. What do you do if all CPUs
> > are in "critical sections", how would a "handoff to safe" work? Will
> > the printks never get out? If the machine were to triple fault and
> > reboot, we lost all of it.  
> 
> make printk_kthread to be just one of the things that compete for
> handed off console_sem, along with other CPUs.

Are you going to make printk thread a high priority task?

> 
> > > - but if the patch's goal is to bound (to lockup threshold) the amount of
> > > time spent in console_unlock() in order to avoid lockups [uh, a reason],
> > > then the patch is rather oversimplified.  
> > 
> > It's bound to print all the information that has been added to the
> > printk buffer. You want to bound it to some "time"  
> 
> not some... it's aligned with watchdog expectations.
> which is deterministic, isn't it?

When do you start the timer? What you are trying to solve isn't a
single printk that gets stuck. Just look at Tejun's module. To trigger
what he wanted, he had to do 10,000 printks from an interrupt context.

> 
> > My method, there's really no delay between a hand off. There's always
> > an active CPU doing printing. It matches the current method which works
> > well for getting information out. A delayed approach will break  
> 
> no, not necessarily. and my previous patch set had some bits of that
> "combined offloading and hand off" behaviour. I was thinking about
> extending it further, but decided not to. - printk_kthread would spin
> on console_owner until current console_sem hand off.

Is printk_thread always running, taking up CPU cycles?

> 
> > > claiming that for any given A, B, C the following is always true
> > > 
> > > 				A * B < C
> > > 
> > > where
> > > 	A is the amount of data to print in the worst case
> > > 	B the time call_console_drivers() needs to print a single
> > > 	  char to all registered and enabled consoles
> > > 	C the watchdog's threshold
> > > 
> > > is not really a step forward.  
> > 
> > It's no different than what we have, except that we currently have A
> > being infinite. My patch makes A no longer infinite, but a constant.  
> 
> my point is - the constant can be unrealistically high. and can
> easily overlap watchdog_threshold, returning printk back to unbound
> land. IOW, if your bound is above the watchdog threshold then you
> don't have any bounds.

That makes no sense.

> 
> by example, with console=ttyS1,57600n8
> - keep increasing the watchdog_threshold until watchdog stops
>   complaining?
> or
> - keep reducing the logbuf size until it can be flushed under
>   watchdog_threshold seconds?

After playing with the module in my last email, I think your trying to
solve multiple printks, not one that is stuck. I'm solving the one that
is stuck problem, which was easily triggered by a simple (non stess
test) module.

> 
> 
> and I demonstrated how exactly we end up having a full logbuf of pending
> messages even on systems with faster consoles.

Where did you demonstrate that. There's so many emails I can't keep up.

But still, take a look at my simple module. I locked up the system
immediately with something that shouldn't have locked up the system.
And my patch fixed it. I think that speaks louder than any of our
opinions.

> 
> 
> [..]
> > Great, and there's cases that die that my patch solves. Lets add my
> > patch now since it is orthogonal to an offloading approach and see how
> > it works, because it would solve issues that I have hit. If you can
> > show that this isn't good enough we can add another approach.  
> 
> it bounds printk. yes, good! that's what I want. but it bounds it to a
> wrong value. I want more deterministic and close to reality bound.
> and I also want to get rid of "the last console_sem owner prints it all"
> thing. I demonstrated with the traces how that thing can bite.

I have not seen any realistic traces, but perhaps I missed something. It
all requires lots of printks, in weird scenarios. I demonstrated that
the system can be locked up with few printks (one per cpu per
millisecond), and my patch solves it.

> 
> 
> > Honestly, I don't see why you are against this patch.  
> 
> prove it! show me exactly when and where I said that I NACK or
> block the patch? seriously.

Why are we having this discussion then? Just give your Ack to my patch,
and we can look to see if we need to improve on it.

> 
> 
> > It doesn't stop your work.  
> 
> and I never said it would. your patch changes nothing on my side, that's
> my message. as of now I have out-of-tree patches, well I'll keep using
> them. nothing new.
> 
> 
> > If this patch isn't enough  
> 
> BINGO! this is all I'm trying to say.
> and the only reply (if there is any at all!) I'm getting is
> "GTFO!!! your problems are unrealistic! we gonna release the
> patch and wait for someone to come along and say us something
> new about printk issues. but not you!".

I think we are misunderstanding each other. It didn't seem that you
were on board with this patch. Why didn't you just say, "here's my ack
for this patch, but we are going to need more"?

This could just be that we are misunderstanding each other. I've been
saying from the beginning, that my patch is an incremental approach.
But I never got the "OK" from you about it. You just pointed out what
you thought was its short comings. Yes, you never actually NACK'd it
(like Tejun did), but you never gave it your blessing either.

> 
> 
> > (but it does fix some issues)  
> 
> obviously there are cases which your patch addresses. have I ever
> denied that? but, once again, obviously, there are cases which it
> doesn't. and those cases tend to bite my setups. I have repeated
> it many times, and have explained in great details which parts I'm
> talking about.

Well, I could argue that the cases you are trying to solve were
intensified by the bug my patch fixes.

> 
> and I have never run unrealistic test_printk.ko against your patch
> or anything alike; why the heck would I do that.
> 
> 
> > Really, it sounds like you are afraid of this patch, that it might
> > be good enough for most cases which would make adding another approach
> > even more difficult.  
> 
> LOL! wish I knew how to capture screenshots on Linux!


OK, if you are fine with my patch, just give it an Ack, and we push it
into the wild and see what happens. If things go as you say, not good
enough, then we can add your approach. I never veered from this. It
just appeared that you didn't want this patch to go in without your
additions.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  2:55               ` Steven Rostedt
@ 2018-01-12  4:20                 ` Steven Rostedt
  2018-01-16 19:44                 ` Tejun Heo
  1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12  4:20 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 21:55:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I ran this on a box with 4 CPUs and a serial console (so it has a slow
> console). Again, all I have is each CPU doing exactly ONE printk()!
> then sleeping for a full millisecond! It will cause a lot of output,
> and perhaps slow the system down. But it should not lock up the system.
> But without my patch, it does!

I decided to see how this works without a slow serial console. So I
rebooted the box and enabled hyper-threading (doubling the number of
CPUs to 8), and then ran this module, with serial disabled.

As expected, it did not lock up. That's because there was only a single
console (VGA) and it is fast enough to keep up. Especially, since I
have a 1 millisecond sleep between printks.

But I ran the function_graph tracer to see what was happening. Here's
the unpatched case. It didn't take long to see a single CPU suffering
(and this is with a fast console!)

     kworker/1:2-309   [001]    78.677770: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    78.677772: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    78.677772: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    78.677778: funcgraph_exit:         4.528 us   |  }
     kworker/3:1-72    [003]    78.677779: funcgraph_exit:         5.875 us   |  }
     kworker/0:0-3     [000]    78.678745: funcgraph_entry:                   |  printk() {
     kworker/5:1-78    [005]    78.678749: funcgraph_entry:                   |  printk() {
     kworker/4:1-73    [004]    78.678751: funcgraph_entry:                   |  printk() {
     kworker/0:0-3     [000]    78.678752: funcgraph_exit:         4.893 us   |  }
     kworker/5:1-78    [005]    78.678754: funcgraph_exit:         4.287 us   |  }
     kworker/4:1-73    [004]    78.678756: funcgraph_exit:         3.964 us   |  }
     kworker/6:1-147   [006]    78.679751: funcgraph_entry:                   |  printk() {
     kworker/2:3-1295  [002]    78.679753: funcgraph_entry:                   |  printk() {
     kworker/6:1-147   [006]    78.679767: funcgraph_exit:       + 13.735 us  |  }
     kworker/2:3-1295  [002]    78.679768: funcgraph_exit:       + 14.318 us  |  }
     kworker/7:1-176   [007]    78.680751: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    78.680753: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    78.680756: funcgraph_exit:         3.981 us   |  }
     kworker/3:1-72    [003]    78.680757: funcgraph_exit:         3.499 us   |  }
     kworker/5:1-78    [005]    78.681734: funcgraph_entry:        3.388 us   |  printk();
     kworker/4:1-73    [004]    78.681752: funcgraph_entry:                   |  printk() {
     kworker/0:0-3     [000]    78.681753: funcgraph_entry:                   |  printk() {
     kworker/4:1-73    [004]    78.681756: funcgraph_exit:         3.009 us   |  }
     kworker/0:0-3     [000]    78.681757: funcgraph_exit:         3.708 us   |  }
     kworker/2:3-1295  [002]    78.682742: funcgraph_entry:                   |  printk() {
     kworker/6:1-147   [006]    78.682746: funcgraph_entry:                   |  printk() {
     kworker/2:3-1295  [002]    78.682749: funcgraph_exit:         4.548 us   |  }
     kworker/6:1-147   [006]    78.682750: funcgraph_exit:         3.001 us   |  }
     kworker/3:1-72    [003]    78.683751: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    78.683753: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    78.683756: funcgraph_exit:         3.869 us   |  }
     kworker/7:1-176   [007]    78.683757: funcgraph_exit:         4.300 us   |  }
     kworker/5:1-78    [005]    78.684736: funcgraph_entry:        2.074 us   |  printk();
     kworker/4:1-73    [004]    78.684755: funcgraph_entry:                   |  printk() {
     kworker/0:0-3     [000]    78.684755: funcgraph_entry:        3.065 us   |  printk();
     kworker/4:1-73    [004]    78.684760: funcgraph_exit:         4.091 us   |  }
     kworker/6:1-147   [006]    78.685744: funcgraph_entry:                   |  printk() {
     kworker/2:3-1295  [002]    78.685744: funcgraph_entry:        4.616 us   |  printk();
     kworker/6:1-147   [006]    78.685752: funcgraph_exit:         5.943 us   |  }
     kworker/7:1-176   [007]    78.686763: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    78.686767: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    78.686770: funcgraph_exit:         4.570 us   |  }
     kworker/3:1-72    [003]    78.686771: funcgraph_exit:         3.262 us   |  }
     kworker/1:2-309   [001]    78.687626: funcgraph_exit:       # 9854.982 us |  }


CPU 1 was stuck for 9 milliseconds doing nothing but handling printk.
And this is without a serial or slow console.

With a patched kernel:

     kworker/7:1-176   [007]    85.937411: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    85.937416: funcgraph_exit:         3.357 us   |  }
     kworker/7:1-176   [007]    85.937416: funcgraph_exit:         4.388 us   |  }
     kworker/2:2-315   [002]    85.937793: funcgraph_exit:       # 1391.842 us |  }
     kworker/1:2-592   [001]    85.938391: funcgraph_entry:                   |  printk() {
     kworker/4:2-529   [004]    85.938396: funcgraph_entry:        3.267 us   |  printk();
     kworker/6:1-150   [006]    85.938555: funcgraph_exit:       # 1159.354 us |  }
     kworker/0:2-127   [000]    85.939393: funcgraph_entry:                   |  printk() {
     kworker/5:2-352   [005]    85.939394: funcgraph_entry:      + 13.403 us  |  printk();
     kworker/1:2-592   [001]    85.939718: funcgraph_exit:       # 1325.211 us |  }
     kworker/0:2-127   [000]    85.940345: funcgraph_exit:       ! 951.361 us |  }
     kworker/7:1-176   [007]    85.940390: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    85.940390: funcgraph_entry:                   |  printk() {
     kworker/2:2-315   [002]    85.940391: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    85.940396: funcgraph_exit:         4.144 us   |  }
     kworker/2:2-315   [002]    85.940397: funcgraph_exit:         5.687 us   |  }
     kworker/4:2-529   [004]    85.941403: funcgraph_entry:                   |  printk() {
     kworker/6:1-150   [006]    85.941407: funcgraph_entry:        3.167 us   |  printk();
     kworker/3:1-72    [003]    85.941545: funcgraph_exit:       # 1153.899 us |  }
     kworker/4:2-529   [004]    85.942371: funcgraph_exit:       ! 966.322 us |  }
     kworker/1:2-592   [001]    85.942411: funcgraph_entry:                   |  printk() {
     kworker/5:2-352   [005]    85.942411: funcgraph_entry:                   |  printk() {
     kworker/1:2-592   [001]    85.942416: funcgraph_exit:         4.099 us   |  }
     kworker/0:2-127   [000]    85.942553: funcgraph_entry:                   |  printk() {
     kworker/5:2-352   [005]    85.942739: funcgraph_exit:       ! 326.853 us |  }
     kworker/0:2-127   [000]    85.943358: funcgraph_exit:       ! 804.095 us |  }
     kworker/2:2-315   [002]    85.943388: funcgraph_entry:                   |  printk() {
     kworker/7:1-176   [007]    85.943391: funcgraph_entry:                   |  printk() {
     kworker/2:2-315   [002]    85.943754: funcgraph_exit:       ! 364.921 us |  }
     kworker/7:1-176   [007]    85.944127: funcgraph_exit:       ! 734.864 us |  }
     kworker/6:1-150   [006]    85.944408: funcgraph_entry:                   |  printk() {
     kworker/3:1-72    [003]    85.944408: funcgraph_entry:        4.911 us   |  printk();
     kworker/6:1-150   [006]    85.945235: funcgraph_exit:       ! 826.596 us |  }
     kworker/0:2-127   [000]    85.945398: funcgraph_entry:                   |  printk() {
     kworker/5:2-352   [005]    85.945399: funcgraph_entry:                   |  printk() {
     kworker/4:2-529   [004]    85.945400: funcgraph_entry:                   |  printk() {
     kworker/1:2-592   [001]    85.945412: funcgraph_entry:                   |  printk() {
     kworker/5:2-352   [005]    85.945415: funcgraph_exit:       + 14.537 us  |  }
     kworker/4:2-529   [004]    85.945416: funcgraph_exit:         5.494 us   |  }
     kworker/0:2-127   [000]    85.945736: funcgraph_exit:       ! 337.000 us |  }
     kworker/7:1-176   [007]    85.946403: funcgraph_entry:                   |  printk() {
     kworker/2:2-315   [002]    85.946409: funcgraph_entry:        3.275 us   |  printk();
     kworker/1:2-592   [001]    85.946546: funcgraph_exit:       # 1133.155 us |  }

The load is spread out much better. No one CPU is stuck too badly.

As the function_graph tracer annotates functions that take over a
millisecond with a '#', I can grep and see how many take that long, and
for how long.

 $ trace-cmd report trace-printk-nopatch-8cpus.dat |grep '#'
     kworker/4:1-73    [004]    78.658973: funcgraph_exit:       # 1247.220 us |  }
     kworker/2:3-1295  [002]    78.662340: funcgraph_exit:       # 2616.456 us |  }
     kworker/7:1-176   [007]    78.671727: funcgraph_exit:       # 1996.234 us |  }
     kworker/4:1-73    [004]    78.676696: funcgraph_exit:       # 2954.230 us |  }
     kworker/1:2-309   [001]    78.687626: funcgraph_exit:       # 9854.982 us |  }
     kworker/5:1-78    [005]    78.692652: funcgraph_exit:       # 4920.607 us |  }
     kworker/5:1-78    [005]    78.696737: funcgraph_exit:       # 1983.090 us |  }
     kworker/5:1-78    [005]    78.701426: funcgraph_exit:       # 1686.832 us |  }
     kworker/2:3-1295  [002]    78.710736: funcgraph_exit:       # 6975.033 us |  }
     kworker/1:2-309   [001]    78.712455: funcgraph_exit:       # 1711.895 us |  }
     kworker/7:1-176   [007]    78.721588: funcgraph_exit:       # 7835.767 us |  }
     kworker/1:2-309   [001]    78.729626: funcgraph_exit:       # 5879.358 us |  }
     kworker/3:1-72    [003]    78.744426: funcgraph_exit:       # 12678.256 us |  }
     kworker/1:2-309   [001]    78.754549: funcgraph_exit:       # 7816.182 us |  }
     kworker/7:1-176   [007]    78.758612: funcgraph_exit:       # 1874.185 us |  }
     kworker/5:1-78    [005]    78.762615: funcgraph_exit:       # 1878.463 us |  }
     kworker/2:3-1295  [002]    78.771593: funcgraph_exit:       # 6849.619 us |  }
     kworker/3:1-72    [003]    78.776616: funcgraph_exit:       # 2868.446 us |  }
     kworker/1:2-309   [001]    78.780585: funcgraph_exit:       # 2843.085 us |  }
     kworker/7:1-176   [007]    78.785701: funcgraph_exit:       # 3949.963 us |  }
     kworker/1:2-309   [001]    78.787192: funcgraph_exit:       # 1452.146 us |  }
     kworker/2:3-1295  [002]    78.791554: funcgraph_exit:       # 2821.999 us |  }
     kworker/5:1-78    [005]    78.793686: funcgraph_exit:       # 1934.499 us |  }
     kworker/2:3-1295  [002]    78.795377: funcgraph_exit:       # 1641.652 us |  }
     kworker/6:1-147   [006]    78.815413: funcgraph_exit:       # 2669.295 us |  }
     kworker/5:1-78    [005]    78.821529: funcgraph_exit:       # 1782.758 us |  }
     kworker/5:1-78    [005]    78.826732: funcgraph_exit:       # 2993.772 us |  }
     kworker/6:1-147   [006]    78.829676: funcgraph_exit:       # 1920.164 us |  }
     kworker/5:1-78    [005]    78.831464: funcgraph_exit:       # 1728.834 us |  }
     kworker/1:2-309   [001]    78.833674: funcgraph_exit:       # 1939.356 us |  }
     kworker/1:2-309   [001]    78.839663: funcgraph_exit:       # 3908.825 us |  }
     kworker/5:1-78    [005]    78.841376: funcgraph_exit:       # 1624.089 us |  }
     kworker/1:2-309   [001]    78.843474: funcgraph_exit:       # 1725.975 us |  }
     kworker/5:1-78    [005]    78.845490: funcgraph_exit:       # 1753.258 us |  }
     kworker/5:1-78    [005]    78.850592: funcgraph_exit:       # 2839.801 us |  }
     kworker/2:3-1295  [002]    78.855668: funcgraph_exit:       # 3925.402 us |  }
     kworker/6:1-147   [006]    78.866346: funcgraph_exit:       # 10603.155 us |  }


CPUs can be stuck for over 10 milliseconds doing just printk!

With my patch:

     kworker/0:2-127   [000]    85.902486: funcgraph_exit:       # 1092.105 us |  }
     kworker/2:2-315   [002]    85.904458: funcgraph_exit:       # 1070.174 us |  }
     kworker/4:2-529   [004]    85.907523: funcgraph_exit:       # 1131.189 us |  }
     kworker/6:1-150   [006]    85.909187: funcgraph_exit:       # 1802.074 us |  }
     kworker/7:1-176   [007]    85.910534: funcgraph_exit:       # 1138.249 us |  }
     kworker/1:2-592   [001]    85.911586: funcgraph_exit:       # 1207.807 us |  }
     kworker/2:2-315   [002]    85.914585: funcgraph_exit:       # 1183.669 us |  }
     kworker/6:1-150   [006]    85.915426: funcgraph_exit:       # 1019.587 us |  }
     kworker/5:2-352   [005]    85.916516: funcgraph_exit:       # 1120.144 us |  }
     kworker/3:1-72    [003]    85.922472: funcgraph_exit:       # 1071.437 us |  }
     kworker/4:2-529   [004]    85.923685: funcgraph_exit:       # 1296.953 us |  }
     kworker/1:2-592   [001]    85.924481: funcgraph_exit:       # 1051.758 us |  }
     kworker/5:2-352   [005]    85.926536: funcgraph_exit:       # 1126.423 us |  }
     kworker/2:2-315   [002]    85.927403: funcgraph_exit:       # 1020.366 us |  }
     kworker/1:2-592   [001]    85.928493: funcgraph_exit:       # 1094.864 us |  }
     kworker/6:1-150   [006]    85.931457: funcgraph_exit:       # 1052.531 us |  }
     kworker/1:2-592   [001]    85.932779: funcgraph_exit:       # 1371.806 us |  }
     kworker/5:2-352   [005]    85.933536: funcgraph_exit:       # 1128.199 us |  }
     kworker/2:2-315   [002]    85.937793: funcgraph_exit:       # 1391.842 us |  }
     kworker/6:1-150   [006]    85.938555: funcgraph_exit:       # 1159.354 us |  }
     kworker/1:2-592   [001]    85.939718: funcgraph_exit:       # 1325.211 us |  }
     kworker/3:1-72    [003]    85.941545: funcgraph_exit:       # 1153.899 us |  }
     kworker/1:2-592   [001]    85.946546: funcgraph_exit:       # 1133.155 us |  }
     kworker/7:1-176   [007]    85.947730: funcgraph_exit:       # 1325.744 us |  }
     kworker/3:1-72    [003]    85.948588: funcgraph_exit:       # 1192.876 us |  }
     kworker/4:2-529   [004]    85.950647: funcgraph_exit:       # 2248.783 us |  }
     kworker/6:1-150   [006]    85.951463: funcgraph_exit:       # 1045.498 us |  }
     kworker/0:2-127   [000]    85.952576: funcgraph_exit:       # 1171.645 us |  }
     kworker/1:2-592   [001]    85.953393: funcgraph_exit:       # 1001.659 us |  }
     kworker/5:2-352   [005]    85.955542: funcgraph_exit:       # 1130.396 us |  }

It spreads the load out much nicer, and seldom goes over 2 milliseconds.

My trace was only for a few seconds (no events lost), and I can see the
max with:

 $ trace-cmd report trace-printk-nopatch-8cpus.dat | grep '#' | cut -d'#' -f1 | sort -n | tail -20
 13510.063 us |  }
 13531.914 us |  }
 13533.591 us |  }
 13574.488 us |  }
 13584.322 us |  }
 13611.234 us |  }
 13668.255 us |  }
 13710.294 us |  }
 13722.017 us |  }
 13725.000 us |  }
 13728.883 us |  }
 13740.601 us |  }
 13744.194 us |  }
 13770.512 us |  }
 13776.246 us |  }
 13809.729 us |  }
 13812.279 us |  }
 13830.563 us |  }
 13907.382 us |  }
 14498.937 us |  }

We had a printk take up to 14 millisecond with a VGA console on 8 CPUs,
where each CPU was doing a single printk once per millisecond.

With my patch:

 $ trace-cmd report trace-printk-patch-8cpus.dat |grep '#' | cut -d'#' -f 2 |sort -n | tail -20
 2477.627 us |  }
 2482.012 us |  }
 2482.077 us |  }
 2488.672 us |  }
 2490.253 us |  }
 2502.381 us |  }
 2503.990 us |  }
 2505.448 us |  }
 2509.389 us |  }
 2510.868 us |  }
 2511.597 us |  }
 2512.108 us |  }
 2538.886 us |  }
 3095.917 us |  }
 3137.604 us |  }
 3223.213 us |  }
 3324.967 us |  }
 3331.018 us |  }
 3331.518 us |  }
 3348.263 us |  }

We got up to just over 3 milliseconds for a single printk.

I think that's a damn good improvement.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  3:21               ` Steven Rostedt
@ 2018-01-12 10:05                 ` Sergey Senozhatsky
  2018-01-12 12:21                   ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-12 10:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

Steven, we are having too many things in one email, I've dropped most
of them to concentrate on one topic only.

On (01/11/18 22:21), Steven Rostedt wrote:
[..]
>
> After playing with the module in my last email, I think your trying to
> solve multiple printks, not one that is stuck

I wouldn't say so. I'm trying to fix the same thing. but when system has
additional limitations - there are NO concurrent printk-s to hand off to
and A * B > C, so we can't have "last console_sem prints it all" bounded
to O(A * B).

- no concurrent printk-s to hand off is explainable - preemption under
  console_sem and the fact that console_sem is a sleeping lock.

- on a system with slow consoles A * B > C is also pretty clear.

- slow consoles make preemption under console_sem more likely.


to summarize:

1) I have a slow serial console. call_console_drivers() is significantly
   slower than log_store().

   the disproportion can be 1:1000. that is while CPUA prints a single
   logbuf message, other CPUs can add 1000 new entries.

2) not every CPU that stuck in console_unlock() came there through printk().
   CPUs that directly call console_lock() can sleep under console_sem. a bunch
   of printk-s can happen in the meantime -- OOM can happen in the meantime;
   no hand off will happen.

3) console_unlock(void)
   {
	for (;;) {
		printk_safe_enter_irqsave(flags);
		// lock-unlock logbuf
		call_console_drivers(ext_text, ext_len, text, len);
		printk_safe_exit_irqrestore(flags);
	}
   }

with slow serial console, call_console_drivers() takes enough time to
to make preemption of a current console_sem owner right after it irqrestore()
highly possible; unless there is a spinning console_waiter. which easily may
not be there; but can come in while current console_sem is preempted, why not.
so when preempted console_sem owner comes back - it suddenly has a whole bunch
of new messages to print and on one to hand off printing to. in a super
imperfect and ugly world, BTW, this is how console_unlock() still can be
O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
to cond_resched() after every line it prints] from current console_sem
owner and printk() while console_sem owner is scheduled out.

4) the interesting thing here is that call_console_drivers() can
   cause console_sem owner to schedule even if it has handed off the
   ownership. because waiting CPU has to spin with local IRQs disabled
   as long as call_console_drivers() prints its message. so if consoles
   are slow, then the first thing the waiter will face after it receives
   the console_sem ownership and enables the IRQs is - preemption.

   so hand off is not immediate. there is a possibility of re-scheduling
   between hand off and actual printing. so that "there is always an active
   printing CPU" is not quite true.

vprintk_emit()
{

	console_trylock_spinning(void)
	{
	   printk_safe_enter_irqsave(flags);
	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
	        cpu_relax();
	   printk_safe_exit_irqrestore(flags);
--->	}
|						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()
|	console_unlock()
|	{
|		
|		....
|		for (;;) {
|-------------->	printk_safe_enter_irqsave(flags);
			....
		}

	}
}

reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
thing after all.

preemption latencies can be high. especially during OOM. I went
through reports that Tetsuo provided over the years. On some of
his tests preempted console_sem owner can sleep long enough to
let other CPUs to start overflowing the logbuf with the pending
messages.

more on preemption. see this email, for instance. a bunch of links in
the middle, scroll down:
https://marc.info/?l=linux-kernel&m=151375384500555


BTW, note the disclaimer [in capitals] -

	LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT
	GOING TO BLOCK IT.


> > and I demonstrated how exactly we end up having a full logbuf of pending
> > messages even on systems with faster consoles.
> 
> Where did you demonstrate that. There's so many emails I can't keep up.
> 
> But still, take a look at my simple module. I locked up the system
> immediately with something that shouldn't have locked up the system.
> And my patch fixed it. I think that speaks louder than any of our
> opinions.

sure it will!
you don't have scheduler latencies mixed in under console_sem (neither in
vprintk_emit(), nor in console_unlock(), nor anywhere in between), you have
printks only from non-preemptible contexts, so your hand off logic always
works and is never preempted, you have concurrent printks from many CPUs,
so once again your hand off logic always works, and you have fast console,
and, due to hand off, console_sem is never up() so no schedulable context
can ever acquire it - you pass it between non-preemptible printk CPUs only.
I cannot see why your patch would not help. your patch works fine in these
conditions, I said it many times. and I have no issues with that. my setups
(real HW, by the way) are far from those conditions. but there is an active
denial of that.

anyway. like I said weeks ago and repeated it in several emails: I have
no intention to NACK or block the patch.
but the patch is not doing enough. that's all I'm saying.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 10:05                 ` Sergey Senozhatsky
@ 2018-01-12 12:21                   ` Steven Rostedt
  2018-01-12 12:55                     ` Petr Mladek
  2018-01-13  7:28                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 12:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 19:05:44 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Steven, we are having too many things in one email, I've dropped most
> of them to concentrate on one topic only.

I totally agree, and I believe this is the reason behind the tensions
between us. We are not discussing the topic of the patch.


> 
> On (01/11/18 22:21), Steven Rostedt wrote:
> [..]
> >
> > After playing with the module in my last email, I think your trying to
> > solve multiple printks, not one that is stuck  
> 
> I wouldn't say so. I'm trying to fix the same thing. but when system has
> additional limitations - there are NO concurrent printk-s to hand off to
> and A * B > C, so we can't have "last console_sem prints it all" bounded
> to O(A * B).
> 
> - no concurrent printk-s to hand off is explainable - preemption under
>   console_sem and the fact that console_sem is a sleeping lock.
> 
> - on a system with slow consoles A * B > C is also pretty clear.
> 
> - slow consoles make preemption under console_sem more likely.
> 
> 
> to summarize:
> 
> 1) I have a slow serial console. call_console_drivers() is significantly
>    slower than log_store().
> 
>    the disproportion can be 1:1000. that is while CPUA prints a single
>    logbuf message, other CPUs can add 1000 new entries.
> 
> 2) not every CPU that stuck in console_unlock() came there through printk().
>    CPUs that directly call console_lock() can sleep under console_sem. a bunch
>    of printk-s can happen in the meantime -- OOM can happen in the meantime;
>    no hand off will happen.

Yep, but I'm still not convinced you are seeing an issue with a single
printk. An OOM does not do everything in one printk, it calls hundreds.
Having hundreds of printks is an issue, especially in critical sections.

The thing is, all of your analysis has been done on a system with the
bug my patch fixes. The bug being, that any printk has no limit to how
much it can print, regardless of logbuf size.

When debugging an issue, if I find a bug that can affect that issue,
although it may not be the cause, I fix that first, and start over
looking at the original issue, because that bug fix can have an effect,
and in lots of cases, fixing the bug makes the fix for the original
bug easier.

There's two issues here:

 #1) The bug I'm fixing. printk() can get stuck printing forever. I
 demonstrated this by a simple module, that locked up the system by
 doing something that was not stressful.

 #2) The bug you are seeing, where printk can trigger the watchdog
 timer. This is much harder to hit. I have not seen any simple module
 that can trigger it.

This patch series is focused on fixing #1, #2 is out of scope, and
continuing discussing it will just cause us to argue more.


> 
> 3) console_unlock(void)
>    {
> 	for (;;) {
> 		printk_safe_enter_irqsave(flags);
> 		// lock-unlock logbuf
> 		call_console_drivers(ext_text, ext_len, text, len);
> 		printk_safe_exit_irqrestore(flags);
> 	}
>    }
> 
> with slow serial console, call_console_drivers() takes enough time to
> to make preemption of a current console_sem owner right after it irqrestore()
> highly possible; unless there is a spinning console_waiter. which easily may
> not be there; but can come in while current console_sem is preempted, why not.
> so when preempted console_sem owner comes back - it suddenly has a whole bunch
> of new messages to print and on one to hand off printing to. in a super
> imperfect and ugly world, BTW, this is how console_unlock() still can be
> O(infinite): schedule between the printed lines [even !PREEMPT kernel tries

I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
PREEMPT kernels than !PREEMPT ones.

> to cond_resched() after every line it prints] from current console_sem
> owner and printk() while console_sem owner is scheduled out.
> 
> 4) the interesting thing here is that call_console_drivers() can
>    cause console_sem owner to schedule even if it has handed off the
>    ownership. because waiting CPU has to spin with local IRQs disabled
>    as long as call_console_drivers() prints its message. so if consoles
>    are slow, then the first thing the waiter will face after it receives
>    the console_sem ownership and enables the IRQs is - preemption.

If the waiter is preempted, that means its not in a critical section.
Isn't that what you want?

> 
>    so hand off is not immediate. there is a possibility of re-scheduling
>    between hand off and actual printing. so that "there is always an active
>    printing CPU" is not quite true.
> 
> vprintk_emit()
> {
> 
> 	console_trylock_spinning(void)
> 	{
> 	   printk_safe_enter_irqsave(flags);
> 	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
> 	        cpu_relax();
> 	   printk_safe_exit_irqrestore(flags);
> --->	}  
> |						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()

Again, this means the waiter is not in a critical section. Why do we
care?

You bring up a good point, that shows that my patch helps you
statistically. We want printks that are not in critical sections
(interrupts or preemption disabled) to do the most work. With my patch,
those that call printk in an atomic section, are the ones most likely
not have to print more than what they are printing. Because they will
have the console lock without having "console ownership" for the
shortest time. Remember, there is no hand off if you own console lock
without console ownership.

Those that can be preempted, are most likely to have console lock
without console ownership, and have to do the most printing.


> |	console_unlock()
> |	{
> |		
> |		....
> |		for (;;) {
> |-------------->	printk_safe_enter_irqsave(flags);
> 			....
> 		}
> 
> 	}
> }
> 
> reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> thing after all.

I would analyze that more before doing so. Because with my patch, I
think we make those that can do long prints (without triggering a
watchdog), the ones most likely doing the long prints.

> 
> preemption latencies can be high. especially during OOM. I went
> through reports that Tetsuo provided over the years. On some of
> his tests preempted console_sem owner can sleep long enough to
> let other CPUs to start overflowing the logbuf with the pending
> messages.

Sure, that's fine. Because if the one that has console_lock can be
preempted, it should be fine to take time to do printks.

> 
> more on preemption. see this email, for instance. a bunch of links in
> the middle, scroll down:
> https://marc.info/?l=linux-kernel&m=151375384500555
> 
> 
> BTW, note the disclaimer [in capitals] -
> 
> 	LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT
> 	GOING TO BLOCK IT.

GREAT! Then we can continue this conversation after the patch goes in.
Because I'm focused on fixing #1 above.

> 
> 
> > > and I demonstrated how exactly we end up having a full logbuf of pending
> > > messages even on systems with faster consoles.  
> > 
> > Where did you demonstrate that. There's so many emails I can't keep up.
> > 
> > But still, take a look at my simple module. I locked up the system
> > immediately with something that shouldn't have locked up the system.
> > And my patch fixed it. I think that speaks louder than any of our
> > opinions.  
> 
> sure it will!
> you don't have scheduler latencies mixed in under console_sem (neither in
> vprintk_emit(), nor in console_unlock(), nor anywhere in between), you have
> printks only from non-preemptible contexts, so your hand off logic always
> works and is never preempted, you have concurrent printks from many CPUs,
> so once again your hand off logic always works, and you have fast console,
> and, due to hand off, console_sem is never up() so no schedulable context
> can ever acquire it - you pass it between non-preemptible printk CPUs only.
> I cannot see why your patch would not help. your patch works fine in these
> conditions, I said it many times. and I have no issues with that. my setups
> (real HW, by the way) are far from those conditions. but there is an active
> denial of that.

OK, I modified my module to include a loop variable. You can add in a
loop variable and the printer now does this:

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			if (i & 1)
				preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT");
			if (i & 1)
				preempt_enable();
		}
		msleep(1);
	}

So I do the printk "loops" times (defined by what variable you put in
as the module parameter). With my patch, I ran it with 10, then 100 and
then 100000! (It's still running). Every other printk is done with
preemption enabled. Is this what you mean?

I ran this with my patch with and without serial enabled (with
hyper-threading on 8 CPUs). Runs fine. 100,000 loops! Yes, and with
CONFIG_PREEMPT=y

Note, doing the preemption makes it harder to lock up the current
kernel. I was not able to lock it up even with serial console. This
goes to show that having printk called with preemption enabled, makes
the preempted printk much more likely to be the one stuck doing the
preemption. That means, statistically, the "safe" printks will be the
more likely one to print.

In fact, I had to add another option to my module to make it go back to
only calling printk without preemption enabled. That locks up the
kernel again with a slow console.

Then I ran this without serial enabled (just VGA) on the kernel without
my patch. With the printk always being called with preemption
disabled, it only took loops=100 before to make it lock up!

Yes, I'm able to lock up the kernel with no slow console, with a simple
loop of 100 printks. Where my patch allows me to do 100,000 printks in
that loop and I hardly notice it. But this only locks up if all printks
are called without preemption (call my module with preempt=1).

If I can lock up the kernel with a single fast console, with only a 100
printks per millisecond, I think that's a pretty serious bug. And my
patch fixes it.


I was not able lock up the system when calling printk with preemption
enabled with or without serial on the current kernel. I think this
shows that my point that statistically, a preemptable printk is more
likely to get stuck doing the slow prints. And since it can be
preempted, it doesn't affect the system at all. And the more it gets
preempted, the more likely it will continue doing the prints. Which is
a good thing.

> 
> anyway. like I said weeks ago and repeated it in several emails: I have
> no intention to NACK or block the patch.
> but the patch is not doing enough. that's all I'm saying.
> 

Great, then Petr can start pushing this through.

Below is my latest module I used for testing:

-- Steve

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;
static unsigned int loops = 1;
static int preempt;

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++)
	{ bool no_preempt = preempt || (i & 1);

			if (no_preempt)
				preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 no_preempt ? " XXX NOPREEMPT" : " XXX
			PREEMPT"); if (no_preempt)
				preempt_enable();
		}
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_param(loops, uint, 0);
module_param(preempt, int, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:21                   ` Steven Rostedt
@ 2018-01-12 12:55                     ` Petr Mladek
  2018-01-13  7:31                       ` Sergey Senozhatsky
  2018-01-15 12:08                       ` Steven Rostedt
  2018-01-13  7:28                     ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-12 12:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Fri 2018-01-12 07:21:23, Steven Rostedt wrote:
> On Fri, 12 Jan 2018 19:05:44 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> > 3) console_unlock(void)
> >    {
> > 	for (;;) {
> > 		printk_safe_enter_irqsave(flags);
> > 		// lock-unlock logbuf
> > 		call_console_drivers(ext_text, ext_len, text, len);
> > 		printk_safe_exit_irqrestore(flags);
> > 	}
> >    }
> > 
> > with slow serial console, call_console_drivers() takes enough time to
> > to make preemption of a current console_sem owner right after it irqrestore()
> > highly possible; unless there is a spinning console_waiter. which easily may
> > not be there; but can come in while current console_sem is preempted, why not.
> > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > of new messages to print and on one to hand off printing to. in a super
> > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
> 
> I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> PREEMPT kernels than !PREEMPT ones.

I would say that the patch improves also console_unlock() but only in
non-preemttive context.

By other words, it makes console_unlock() finite in preemptible context
(limited by buffer size). It might still be unlimited in
non-preemtible context.


> > to cond_resched() after every line it prints] from current console_sem
> > owner and printk() while console_sem owner is scheduled out.
> > 
> > 4) the interesting thing here is that call_console_drivers() can
> >    cause console_sem owner to schedule even if it has handed off the
> >    ownership. because waiting CPU has to spin with local IRQs disabled
> >    as long as call_console_drivers() prints its message. so if consoles
> >    are slow, then the first thing the waiter will face after it receives
> >    the console_sem ownership and enables the IRQs is - preemption.
> >    so hand off is not immediate. there is a possibility of re-scheduling
> >    between hand off and actual printing. so that "there is always an active
> >    printing CPU" is not quite true.
> > 
> > vprintk_emit()
> > {
> > 
> > 	console_trylock_spinning(void)
> > 	{
> > 	   printk_safe_enter_irqsave(flags);
> > 	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
> > 	        cpu_relax();
> > 	   printk_safe_exit_irqrestore(flags);
> > --->	}  
> > |						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()
> > |	console_unlock()
> > |	{
> > |		
> > |		....
> > |		for (;;) {
> > |-------------->	printk_safe_enter_irqsave(flags);
> > 			....
> > 		}
> > 
> > 	}
> > }
> > 
> > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > thing after all.
> 
> I would analyze that more before doing so. Because with my patch, I
> think we make those that can do long prints (without triggering a
> watchdog), the ones most likely doing the long prints.

IMHO, it might make sense because it would help to see the messages
faster. But I would prefer to handle this separately because it
might also increase the risk of softlockups. Therefore it might
cause regressions.

We should also take into account the commit 8d91f8b15361dfb438ab6
("printk: do cond_resched() between lines while outputting to
consoles"). It has the same effect for console_lock() callers.

> > BTW, note the disclaimer [in capitals] -
> > 
> > 	LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT
> > 	GOING TO BLOCK IT.
> 
> GREAT! Then we can continue this conversation after the patch goes in.
> Because I'm focused on fixing #1 above.

Thanks for the disclaimer!

> > anyway. like I said weeks ago and repeated it in several emails: I have
> > no intention to NACK or block the patch.
> > but the patch is not doing enough. that's all I'm saying.
> 
> Great, then Petr can start pushing this through.
> 
> Below is my latest module I used for testing:

I am going to send v6 with fixes suggested for the 2nd patch by Steven.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-11 12:03     ` Petr Mladek
@ 2018-01-12 15:37       ` Steven Rostedt
  2018-01-12 16:08         ` Petr Mladek
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 15:37 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 13:03:41 +0100
Petr Mladek <pmladek@suse.com> wrote:

> > > +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> > > +static struct task_struct *console_owner;
> > > +static bool console_waiter;
> > > +
> > > +/**
> > > + * console_lock_spinning_enable - mark beginning of code where another
> > > + *	thread might safely busy wait
> > > + *
> > > + * This might be called in sections where the current console_lock owner  
> > 
> > 
> > "might be"? It has to be called in sections where the current
> > console_lock owner can not sleep. It's basically saying "console lock is
> > now acting like a spinlock".  
> 
> I am afraid that both explanations are confusing. Your one sounds like
> it must be called every time we enter non-preemptive context in
> console_unlock. What about the following?
> 
>  * This is basically saying that "console lock is now acting like
>  * a spinlock". It can be called _only_ in sections where the current
>  * console_lock owner could not sleep. Also it must be ready to hand
>  * over the lock at the end of the section.

I would reword the above:

   * This basically converts console_lock into a spinlock. This marks
   * the section where the console_lock owner can not sleep, because
   * there may be a waiter spinning (like a spinlock). Also it must be
   * ready to hand over the lock at the end of the section.

> 
> > > + * cannot sleep. It is a signal that another thread might start busy
> > > + * waiting for console_lock.
> > > + */  
> 
> All the other changes look good to me. I will use them in the next version.

Great.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-12 15:37       ` Steven Rostedt
@ 2018-01-12 16:08         ` Petr Mladek
  2018-01-12 16:36           ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-12 16:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Fri 2018-01-12 10:37:54, Steven Rostedt wrote:
> On Thu, 11 Jan 2018 13:03:41 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> > All the other changes look good to me. I will use them in the next version.
> 
> Great.

Please, find below the updated version. If I get Ack at least from
Steven and no nack's, I will put it into linux-next next week.


>From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 22 Dec 2017 18:58:46 +0100
Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers

The commit ("printk: Add console owner and waiter logic to load balance
console writes") made vprintk_emit() and console_unlock() even more
complicated.

This patch extracts the new code into 3 helper functions. They should
help to keep it rather self-contained. It will be easier to use and
maintain.

This patch just shuffles the existing code. It does not change
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
Changes against v5:
 
  + updated some comments (Steven)
  + do console_trylock() in console_trylock_spinning() (Steven)

 kernel/printk/printk.c | 245 +++++++++++++++++++++++++++++--------------------
 1 file changed, 148 insertions(+), 97 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7e6459abba43..3057dbc69b4f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
-static struct lockdep_map console_owner_dep_map = {
-	.name = "console_owner"
-};
 #endif
 
-static DEFINE_RAW_SPINLOCK(console_owner_lock);
-static struct task_struct *console_owner;
-static bool console_waiter;
-
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1551,6 +1544,146 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
 }
 
 /*
+ * Special console_lock variants that help to reduce the risk of soft-lockups.
+ * They allow to pass console_lock to another printk() call using a busy wait.
+ */
+
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
+#endif
+
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
+/**
+ * console_lock_spinning_enable - mark beginning of code where another
+ *	thread might safely busy wait
+ *
+ * This basically converts console_lock into a spinlock. This marks
+ * the section where the console_lock owner can not sleep, because
+ * there may be a waiter spinning (like a spinlock). Also it must be
+ * ready to hand over the lock at the end of the section.
+ */
+static void console_lock_spinning_enable(void)
+{
+	raw_spin_lock(&console_owner_lock);
+	console_owner = current;
+	raw_spin_unlock(&console_owner_lock);
+
+	/* The waiter may spin on us after setting console_owner */
+	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+}
+
+/**
+ * console_lock_spinning_disable_and_check - mark end of code where another
+ *	thread was able to busy wait and check if there is a waiter
+ *
+ * This is called at the end of the section where spinning is allowed.
+ * It has two functions. First, it is a signal that it is not longer
+ * safe to start busy waiting for the lock. Second, it checks if
+ * there is a busy waiter and passes the lock rights to her.
+ *
+ * Important: Callers lose the lock if there was the busy waiter.
+ *	They must not touch items synchronized by console_lock
+ *	in this case.
+ *
+ * Return: 1 if the lock rights were passed, 0 otherwise.
+ */
+static int console_lock_spinning_disable_and_check(void)
+{
+	int waiter;
+
+	raw_spin_lock(&console_owner_lock);
+	waiter = READ_ONCE(console_waiter);
+	console_owner = NULL;
+	raw_spin_unlock(&console_owner_lock);
+
+	if (!waiter) {
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		return 0;
+	}
+
+	/* The waiter is now free to continue */
+	WRITE_ONCE(console_waiter, false);
+
+	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+	/*
+	 * Hand off console_lock to waiter. The waiter will perform
+	 * the up(). After this, the waiter is the console_lock owner.
+	 */
+	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+	return 1;
+}
+
+/**
+ * console_trylock_spinning - try to get console_lock by busy waiting
+ *
+ * This allows to busy wait for the console_lock when the current
+ * owner is running in a special marked sections. It means that
+ * the current owner is running and cannot reschedule until it
+ * is ready to loose the lock.
+ *
+ * Return: 1 if we got the lock, 0 othrewise
+ */
+static int console_trylock_spinning(void)
+{
+	struct task_struct *owner = NULL;
+	bool waiter;
+	bool spin = false;
+	unsigned long flags;
+
+	if (console_trylock())
+		return 1;
+
+	printk_safe_enter_irqsave(flags);
+
+	raw_spin_lock(&console_owner_lock);
+	owner = READ_ONCE(console_owner);
+	waiter = READ_ONCE(console_waiter);
+	if (!waiter && owner && owner != current) {
+		WRITE_ONCE(console_waiter, true);
+		spin = true;
+	}
+	raw_spin_unlock(&console_owner_lock);
+
+	/*
+	 * If there is an active printk() writing to the
+	 * consoles, instead of having it write our data too,
+	 * see if we can offload that load from the active
+	 * printer, and do some printing ourselves.
+	 * Go into a spin only if there isn't already a waiter
+	 * spinning, and there is an active printer, and
+	 * that active printer isn't us (recursive printk?).
+	 */
+	if (!spin) {
+		printk_safe_exit_irqrestore(flags);
+		return 0;
+	}
+
+	/* We spin waiting for the owner to release us */
+	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	/* Owner will clear console_waiter on hand off */
+	while (READ_ONCE(console_waiter))
+		cpu_relax();
+	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
+	printk_safe_exit_irqrestore(flags);
+	/*
+	 * The owner passed the console lock to us.
+	 * Since we did not spin on console lock, annotate
+	 * this as a trylock. Otherwise lockdep will
+	 * complain.
+	 */
+	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+
+	return 1;
+}
+
+/*
  * Call the console drivers, asking them to write out
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
@@ -1760,56 +1893,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore.  The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock()) {
+		if (console_trylock_spinning())
 			console_unlock();
-		} else {
-			struct task_struct *owner = NULL;
-			bool waiter;
-			bool spin = false;
-
-			printk_safe_enter_irqsave(flags);
-
-			raw_spin_lock(&console_owner_lock);
-			owner = READ_ONCE(console_owner);
-			waiter = READ_ONCE(console_waiter);
-			if (!waiter && owner && owner != current) {
-				WRITE_ONCE(console_waiter, true);
-				spin = true;
-			}
-			raw_spin_unlock(&console_owner_lock);
-
-			/*
-			 * If there is an active printk() writing to the
-			 * consoles, instead of having it write our data too,
-			 * see if we can offload that load from the active
-			 * printer, and do some printing ourselves.
-			 * Go into a spin only if there isn't already a waiter
-			 * spinning, and there is an active printer, and
-			 * that active printer isn't us (recursive printk?).
-			 */
-			if (spin) {
-				/* We spin waiting for the owner to release us */
-				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
-				/* Owner will clear console_waiter on hand off */
-				while (READ_ONCE(console_waiter))
-					cpu_relax();
-
-				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
-				printk_safe_exit_irqrestore(flags);
-
-				/*
-				 * The owner passed the console lock to us.
-				 * Since we did not spin on console lock, annotate
-				 * this as a trylock. Otherwise lockdep will
-				 * complain.
-				 */
-				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
-				console_unlock();
-				printk_safe_enter_irqsave(flags);
-			}
-			printk_safe_exit_irqrestore(flags);
-
-		}
 	}
 
 	return printed_len;
@@ -1910,6 +1995,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
 static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  char *dict, size_t dict_len,
 				  char *text, size_t text_len) { return 0; }
+static void console_lock_spinning_enable(void) { }
+static int console_lock_spinning_disable_and_check(void) { return 0; }
 static void call_console_drivers(const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg,
@@ -2196,7 +2283,6 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2291,31 +2377,16 @@ void console_unlock(void)
 		 * finish. This task can not be preempted if there is a
 		 * waiter waiting to take over.
 		 */
-		raw_spin_lock(&console_owner_lock);
-		console_owner = current;
-		raw_spin_unlock(&console_owner_lock);
-
-		/* The waiter may spin on us after setting console_owner */
-		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+		console_lock_spinning_enable();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
-		raw_spin_lock(&console_owner_lock);
-		waiter = READ_ONCE(console_waiter);
-		console_owner = NULL;
-		raw_spin_unlock(&console_owner_lock);
-
-		/*
-		 * If there is a waiter waiting for us, then pass the
-		 * rest of the work load over to that waiter.
-		 */
-		if (waiter)
-			break;
-
-		/* There was no waiter, and nothing will spin on us here */
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		if (console_lock_spinning_disable_and_check()) {
+			printk_safe_exit_irqrestore(flags);
+			return;
+		}
 
 		printk_safe_exit_irqrestore(flags);
 
@@ -2323,26 +2394,6 @@ void console_unlock(void)
 			cond_resched();
 	}
 
-	/*
-	 * If there is an active waiter waiting on the console_lock.
-	 * Pass off the printing to the waiter, and the waiter
-	 * will continue printing on its CPU, and when all writing
-	 * has finished, the last printer will wake up klogd.
-	 */
-	if (waiter) {
-		WRITE_ONCE(console_waiter, false);
-		/* The waiter is now free to continue */
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
-		/*
-		 * Hand off console_lock to waiter. The waiter will perform
-		 * the up(). After this, the waiter is the console_lock owner.
-		 */
-		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
-		printk_safe_exit_irqrestore(flags);
-		/* Note, if waiter is set, logbuf_lock is not held */
-		return;
-	}
-
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-12 16:08         ` Petr Mladek
@ 2018-01-12 16:36           ` Steven Rostedt
  2018-01-15 16:08             ` Petr Mladek
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 16:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 17:08:37 +0100
Petr Mladek <pmladek@suse.com> wrote:

> On Fri 2018-01-12 10:37:54, Steven Rostedt wrote:
> > On Thu, 11 Jan 2018 13:03:41 +0100
> > Petr Mladek <pmladek@suse.com> wrote:  
> > > All the other changes look good to me. I will use them in the next version.  
> > 
> > Great.  
> 
> Please, find below the updated version. If I get Ack at least from
> Steven and no nack's, I will put it into linux-next next week.
> 

Typos below.

> 
> >From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001  
> From: Petr Mladek <pmladek@suse.com>
> Date: Fri, 22 Dec 2017 18:58:46 +0100
> Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers
> 
> The commit ("printk: Add console owner and waiter logic to load balance
> console writes") made vprintk_emit() and console_unlock() even more
> complicated.
> 
> This patch extracts the new code into 3 helper functions. They should
> help to keep it rather self-contained. It will be easier to use and
> maintain.
> 
> This patch just shuffles the existing code. It does not change
> the functionality.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
> Changes against v5:
>  
>   + updated some comments (Steven)
>   + do console_trylock() in console_trylock_spinning() (Steven)
> 
>  kernel/printk/printk.c | 245 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 148 insertions(+), 97 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7e6459abba43..3057dbc69b4f 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers);
>  static struct lockdep_map console_lock_dep_map = {
>  	.name = "console_lock"
>  };
> -static struct lockdep_map console_owner_dep_map = {
> -	.name = "console_owner"
> -};
>  #endif
>  
> -static DEFINE_RAW_SPINLOCK(console_owner_lock);
> -static struct task_struct *console_owner;
> -static bool console_waiter;
> -
>  enum devkmsg_log_bits {
>  	__DEVKMSG_LOG_BIT_ON = 0,
>  	__DEVKMSG_LOG_BIT_OFF,
> @@ -1551,6 +1544,146 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
>  }
>  
>  /*
> + * Special console_lock variants that help to reduce the risk of soft-lockups.
> + * They allow to pass console_lock to another printk() call using a busy wait.
> + */
> +
> +#ifdef CONFIG_LOCKDEP
> +static struct lockdep_map console_owner_dep_map = {
> +	.name = "console_owner"
> +};
> +#endif
> +
> +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> +static struct task_struct *console_owner;
> +static bool console_waiter;
> +
> +/**
> + * console_lock_spinning_enable - mark beginning of code where another
> + *	thread might safely busy wait
> + *
> + * This basically converts console_lock into a spinlock. This marks
> + * the section where the console_lock owner can not sleep, because
> + * there may be a waiter spinning (like a spinlock). Also it must be
> + * ready to hand over the lock at the end of the section.
> + */
> +static void console_lock_spinning_enable(void)
> +{
> +	raw_spin_lock(&console_owner_lock);
> +	console_owner = current;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/* The waiter may spin on us after setting console_owner */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +}
> +
> +/**
> + * console_lock_spinning_disable_and_check - mark end of code where another
> + *	thread was able to busy wait and check if there is a waiter
> + *
> + * This is called at the end of the section where spinning is allowed.
> + * It has two functions. First, it is a signal that it is not longer

"it is no longer safe"

> + * safe to start busy waiting for the lock. Second, it checks if
> + * there is a busy waiter and passes the lock rights to her.
> + *
> + * Important: Callers lose the lock if there was the busy waiter.

"if there was a busy waiter"

> + *	They must not touch items synchronized by console_lock
> + *	in this case.
> + *
> + * Return: 1 if the lock rights were passed, 0 otherwise.
> + */
> +static int console_lock_spinning_disable_and_check(void)
> +{
> +	int waiter;
> +
> +	raw_spin_lock(&console_owner_lock);
> +	waiter = READ_ONCE(console_waiter);
> +	console_owner = NULL;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	if (!waiter) {
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +		return 0;
> +	}
> +
> +	/* The waiter is now free to continue */
> +	WRITE_ONCE(console_waiter, false);
> +
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	/*
> +	 * Hand off console_lock to waiter. The waiter will perform
> +	 * the up(). After this, the waiter is the console_lock owner.
> +	 */
> +	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> +	return 1;
> +}
> +
> +/**
> + * console_trylock_spinning - try to get console_lock by busy waiting
> + *
> + * This allows to busy wait for the console_lock when the current
> + * owner is running in a special marked sections. It means that

 "running in specially marked sections."


> + * the current owner is running and cannot reschedule until it
> + * is ready to loose the lock.

"ready to lose the lock."

> + *
> + * Return: 1 if we got the lock, 0 othrewise
> + */
> +static int console_trylock_spinning(void)
> +{
> +	struct task_struct *owner = NULL;
> +	bool waiter;
> +	bool spin = false;
> +	unsigned long flags;
> +
> +	if (console_trylock())
> +		return 1;
> +
> +	printk_safe_enter_irqsave(flags);
> +
> +	raw_spin_lock(&console_owner_lock);
> +	owner = READ_ONCE(console_owner);
> +	waiter = READ_ONCE(console_waiter);
> +	if (!waiter && owner && owner != current) {
> +		WRITE_ONCE(console_waiter, true);
> +		spin = true;
> +	}
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/*
> +	 * If there is an active printk() writing to the
> +	 * consoles, instead of having it write our data too,
> +	 * see if we can offload that load from the active
> +	 * printer, and do some printing ourselves.
> +	 * Go into a spin only if there isn't already a waiter
> +	 * spinning, and there is an active printer, and
> +	 * that active printer isn't us (recursive printk?).
> +	 */
> +	if (!spin) {
> +		printk_safe_exit_irqrestore(flags);
> +		return 0;
> +	}
> +
> +	/* We spin waiting for the owner to release us */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +	/* Owner will clear console_waiter on hand off */
> +	while (READ_ONCE(console_waiter))
> +		cpu_relax();
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	printk_safe_exit_irqrestore(flags);
> +	/*
> +	 * The owner passed the console lock to us.
> +	 * Since we did not spin on console lock, annotate
> +	 * this as a trylock. Otherwise lockdep will
> +	 * complain.
> +	 */
> +	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> +
> +	return 1;
> +}
> +
> +/*
>   * Call the console drivers, asking them to write out
>   * log_buf[start] to log_buf[end - 1].
>   * The console_lock must be held.
> @@ -1760,56 +1893,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * semaphore.  The release will print out buffers and wake up
>  		 * /dev/kmsg and syslog() users.
>  		 */
> -		if (console_trylock()) {
> +		if (console_trylock_spinning())
>  			console_unlock();
> -		} else {
> -			struct task_struct *owner = NULL;
> -			bool waiter;
> -			bool spin = false;
> -
> -			printk_safe_enter_irqsave(flags);
> -
> -			raw_spin_lock(&console_owner_lock);
> -			owner = READ_ONCE(console_owner);
> -			waiter = READ_ONCE(console_waiter);
> -			if (!waiter && owner && owner != current) {
> -				WRITE_ONCE(console_waiter, true);
> -				spin = true;
> -			}
> -			raw_spin_unlock(&console_owner_lock);
> -
> -			/*
> -			 * If there is an active printk() writing to the
> -			 * consoles, instead of having it write our data too,
> -			 * see if we can offload that load from the active
> -			 * printer, and do some printing ourselves.
> -			 * Go into a spin only if there isn't already a waiter
> -			 * spinning, and there is an active printer, and
> -			 * that active printer isn't us (recursive printk?).
> -			 */
> -			if (spin) {
> -				/* We spin waiting for the owner to release us */
> -				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> -				/* Owner will clear console_waiter on hand off */
> -				while (READ_ONCE(console_waiter))
> -					cpu_relax();
> -
> -				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> -				printk_safe_exit_irqrestore(flags);
> -
> -				/*
> -				 * The owner passed the console lock to us.
> -				 * Since we did not spin on console lock, annotate
> -				 * this as a trylock. Otherwise lockdep will
> -				 * complain.
> -				 */
> -				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> -				console_unlock();
> -				printk_safe_enter_irqsave(flags);
> -			}
> -			printk_safe_exit_irqrestore(flags);
> -
> -		}
>  	}
>  
>  	return printed_len;
> @@ -1910,6 +1995,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
>  static ssize_t msg_print_ext_body(char *buf, size_t size,
>  				  char *dict, size_t dict_len,
>  				  char *text, size_t text_len) { return 0; }
> +static void console_lock_spinning_enable(void) { }
> +static int console_lock_spinning_disable_and_check(void) { return 0; }
>  static void call_console_drivers(const char *ext_text, size_t ext_len,
>  				 const char *text, size_t len) {}
>  static size_t msg_print_text(const struct printk_log *msg,
> @@ -2196,7 +2283,6 @@ void console_unlock(void)
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
> -	bool waiter = false;
>  	bool do_cond_resched, retry;
>  
>  	if (console_suspended) {
> @@ -2291,31 +2377,16 @@ void console_unlock(void)
>  		 * finish. This task can not be preempted if there is a
>  		 * waiter waiting to take over.
>  		 */
> -		raw_spin_lock(&console_owner_lock);
> -		console_owner = current;
> -		raw_spin_unlock(&console_owner_lock);
> -
> -		/* The waiter may spin on us after setting console_owner */
> -		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +		console_lock_spinning_enable();
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> -		raw_spin_lock(&console_owner_lock);
> -		waiter = READ_ONCE(console_waiter);
> -		console_owner = NULL;
> -		raw_spin_unlock(&console_owner_lock);
> -
> -		/*
> -		 * If there is a waiter waiting for us, then pass the
> -		 * rest of the work load over to that waiter.
> -		 */
> -		if (waiter)
> -			break;
> -
> -		/* There was no waiter, and nothing will spin on us here */
> -		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +		if (console_lock_spinning_disable_and_check()) {
> +			printk_safe_exit_irqrestore(flags);
> +			return;
> +		}
>  
>  		printk_safe_exit_irqrestore(flags);
>  
> @@ -2323,26 +2394,6 @@ void console_unlock(void)
>  			cond_resched();
>  	}
>  
> -	/*
> -	 * If there is an active waiter waiting on the console_lock.
> -	 * Pass off the printing to the waiter, and the waiter
> -	 * will continue printing on its CPU, and when all writing
> -	 * has finished, the last printer will wake up klogd.
> -	 */
> -	if (waiter) {
> -		WRITE_ONCE(console_waiter, false);
> -		/* The waiter is now free to continue */
> -		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> -		/*
> -		 * Hand off console_lock to waiter. The waiter will perform
> -		 * the up(). After this, the waiter is the console_lock owner.
> -		 */
> -		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> -		printk_safe_exit_irqrestore(flags);
> -		/* Note, if waiter is set, logbuf_lock is not held */
> -		return;
> -	}
> -
>  	console_locked = 0;
>  
>  	/* Release the exclusive_console once it is used */

Besides the typos (which should be fixed)...

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
  2018-01-10 16:50   ` Steven Rostedt
@ 2018-01-12 16:54   ` Steven Rostedt
  2018-01-12 17:11     ` Steven Rostedt
  2018-01-18 22:03     ` Pavel Machek
  2018-01-17  2:19   ` Byungchul Park
  2 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 16:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 14:24:17 +0100
Petr Mladek <pmladek@suse.com> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
> 
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> This patch implements what I discussed in Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.
> 
> Here's the design again:
> 
> I added a "console_owner" which is set to a task that is actively
> writing to the consoles. It is *not* the same as the owner of the
> console_lock. It is only set when doing the calls to the console
> functions. It is protected by a console_owner_lock which is a raw spin
> lock.
> 
> There is a console_waiter. This is set when there is an active console
> owner that is not current, and waiter is not set. This too is protected
> by console_owner_lock.
> 
> In printk() when it tries to write to the consoles, we have:
> 
> 	if (console_trylock())
> 		console_unlock();
> 
> Now I added an else, which will check if there is an active owner, and
> no current waiter. If that is the case, then console_waiter is set, and
> the task goes into a spin until it is no longer set.
> 
> When the active console owner finishes writing the current message to
> the consoles, it grabs the console_owner_lock and sees if there is a
> waiter, and clears console_owner.
> 
> If there is a waiter, then it breaks out of the loop, clears the waiter
> flag (because that will release the waiter from its spin), and exits.
> Note, it does *not* release the console semaphore. Because it is a
> semaphore, there is no owner. Another task may release it. This means
> that the waiter is guaranteed to be the new console owner! Which it
> becomes.
> 
> Then the waiter calls console_unlock() and continues to write to the
> consoles.
> 
> If another task comes along and does a printk() it too can become the
> new waiter, and we wash rinse and repeat!
> 
> By Petr Mladek about possible new deadlocks:
> 
> The thing is that we move console_sem only to printk() call
> that normally calls console_unlock() as well. It means that
> the transferred owner should not bring new type of dependencies.
> As Steven said somewhere: "If there is a deadlock, it was
> there even before."
> 
> We could look at it from this side. The possible deadlock would
> look like:
> 
> CPU0                            CPU1
> 
> console_unlock()
> 
>   console_owner = current;
> 
> 				spin_lockA()
> 				  printk()
> 				    spin = true;
> 				    while (...)
> 
>     call_console_drivers()
>       spin_lockA()
> 
> This would be a deadlock. CPU0 would wait for the lock A.
> While CPU1 would own the lockA and would wait for CPU0
> to finish calling the console drivers and pass the console_sem
> owner.
> 
> But if the above is true than the following scenario was
> already possible before:
> 
> CPU0
> 
> spin_lockA()
>   printk()
>     console_unlock()
>       call_console_drivers()
> 	spin_lockA()
> 
> By other words, this deadlock was there even before. Such
> deadlocks are prevented by using printk_deferred() in
> the sections guarded by the lock A.

Petr,

Please add this here:

====

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up a 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;
static unsigned int loops = 1;

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 " XXX NOPREEMPT");
			preempt_enable();
		}
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
====

Hmm, how does one have git commit not remove the C preprocessor at the
start of the module?

-- Steve

> 
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> [pmladek@suse.com: Commit message about possible deadlocks]
> ---

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-12 16:54   ` Steven Rostedt
@ 2018-01-12 17:11     ` Steven Rostedt
  2018-01-17 19:13       ` Rasmus Villemoes
  2018-01-18 22:03     ` Pavel Machek
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 17:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 11:54:54 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>
> 
>


> 
> Hmm, how does one have git commit not remove the C preprocessor at the
> start of the module?

Probably just add a space in front of the entire program.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:21                   ` Steven Rostedt
  2018-01-12 12:55                     ` Petr Mladek
@ 2018-01-13  7:28                     ` Sergey Senozhatsky
  2018-01-15 10:17                       ` Petr Mladek
  2018-01-15 12:06                       ` Steven Rostedt
  1 sibling, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-13  7:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/12/18 07:21), Steven Rostedt wrote:
[..]
> Yep, but I'm still not convinced you are seeing an issue with a single
> printk.

what do you mean by this?

> An OOM does not do everything in one printk, it calls hundreds.
> Having hundreds of printks is an issue, especially in critical sections.

unless your console_sem owner is preempted. as long as it is preempted
it doesn't really matter how many times we call printk from which CPUs
and from which sections, but what matters - who is going to print that all
out when console_sem is running again and how much time will it take.
that's what I'm saying.

[..]
> > with slow serial console, call_console_drivers() takes enough time to
> > to make preemption of a current console_sem owner right after it irqrestore()
> > highly possible; unless there is a spinning console_waiter. which easily may
> > not be there; but can come in while current console_sem is preempted, why not.
> > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > of new messages to print and on one to hand off printing to. in a super
> > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
> 
> I'm not fixing console_unlock(), I'm fixing printk().

I know. I'm fixing console_unlock(). because console_unlock() is its own
thing.

> > 4) the interesting thing here is that call_console_drivers() can
> >    cause console_sem owner to schedule even if it has handed off the
> >    ownership. because waiting CPU has to spin with local IRQs disabled
> >    as long as call_console_drivers() prints its message. so if consoles
> >    are slow, then the first thing the waiter will face after it receives
> >    the console_sem ownership and enables the IRQs is - preemption.
> 
> If the waiter is preempted, that means its not in a critical section.
> Isn't that what you want?

see below.

> >    so hand off is not immediate. there is a possibility of re-scheduling
> >    between hand off and actual printing. so that "there is always an active
> >    printing CPU" is not quite true.
> > 
> > vprintk_emit()
> > {
> > 
> > 	console_trylock_spinning(void)
> > 	{
> > 	   printk_safe_enter_irqsave(flags);
> > 	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
> > 	        cpu_relax();
> > 	   printk_safe_exit_irqrestore(flags);
> > --->	}  
> > |						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()
> 
> Again, this means the waiter is not in a critical section. Why do we
> care?

which is not what I was talking about. the point was that you said


 :                                                .... and what about the
 : printks that haven't gotten out yet? Delay them to something else, and
 : if the machine were to crash in the transfer, we lost all that data.
 :
 : My method, there's really no delay between a hand off. There's always
 : an active CPU doing printing. It matches the current method which works
 : well for getting information out. A delayed approach will break that


that is not true. we can have preemption "during" hand off. hand off,
thus, is a "delayed approach", by definition. so if you consider the
possibility of "if the machine were to crash in the transfer, we lost
all that data" and if you consider this to be important [otherwise you
wouldn't bring that up, would you] then the reality is that your patch
has the same problem as printk_kthread.

so very schematically, for hand-off it's something like

	if (... console_trylock_spinning()) // grabbed the ownership

		<< ... preempted ... >>

		console_unlock();


for printk_kthread it's something like

		wake_up_process(printk_kthread);
		up(console_sem);


in the later case we at least have console_sem unlocked. so any other CPU
that might do printk() can grab the lock and emit the logbuf messages. but
in case on hand-off, we have console_sem locked, so no printk() will be
able to emit the messages, we need that specific task to become running.


hence the following:

[..]
> > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > thing after all.

this was cryptic and misleading. sorry.
some clarifications.

what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
I think I badly broke printk() [some of paths]. I know what I tried
to fix (and you don't have to explain to me what a lock up is) with
that patch, but I don't think the patch ended up to be a clear win.
a very simple explanation would be:

instead of having a direct nonpreemptible path

	logbuf -> for(;;) call_console_drivers -> happy user

we now have

	logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user

which is a big change. with a non-zero potential for regressions.
and it didn't take long to find out that not all "happy users" were
exactly happy with the new scheme of things. glance through Tetsuo's
emails [see links in my another email], Tetsuo reported that printk can
stall for minutes now. basically, the worse the system state is the lower
printk throughput can be [down to zero chars in the worst case]. that's
why I think that my patch was a mistake. and that's why in my out-of-tree
patches I'm moving towards the non-preemptible path from logbuf through
console to a happy user [just like it used to be]. but, obviously, I can't
just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
why I bound console_unlock() to watchdog threshold and move towards the
batched non-preemptible print outs (enabling preemption and up()-ing the
console_sem at the end of each print out batch). this is not super good,
preemption is still here, but at least not after every line console_unlock()
prints. up() console_sem also increases chances that, for instance, systemd
or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
now has a chance to be woken up sooner (not only after we flush all pending
logbuf messages and finally up() the console_sem).

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:55                     ` Petr Mladek
@ 2018-01-13  7:31                       ` Sergey Senozhatsky
  2018-01-15  8:51                         ` Petr Mladek
  2018-01-15 12:08                       ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-13  7:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo,
	Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/12/18 13:55), Petr Mladek wrote:
[..]
> > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > PREEMPT kernels than !PREEMPT ones.
> 
> I would say that the patch improves also console_unlock() but only in
> non-preemttive context.
> 
> By other words, it makes console_unlock() finite in preemptible context
> (limited by buffer size). It might still be unlimited in
> non-preemtible context.

could you elaborate a bit?

[..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.
> > 
> > I would analyze that more before doing so. Because with my patch, I
> > think we make those that can do long prints (without triggering a
> > watchdog), the ones most likely doing the long prints.
> 
> IMHO, it might make sense because it would help to see the messages
> faster. But I would prefer to handle this separately because it
> might also increase the risk of softlockups. Therefore it might
> cause regressions.
> 
> We should also take into account the commit 8d91f8b15361dfb438ab6
> ("printk: do cond_resched() between lines while outputting to
> consoles"). It has the same effect for console_lock() callers.

I replied in another email.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-13  7:31                       ` Sergey Senozhatsky
@ 2018-01-15  8:51                         ` Petr Mladek
  2018-01-15  9:48                           ` Sergey Senozhatsky
  2018-01-16  5:16                           ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-15  8:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> On (01/12/18 13:55), Petr Mladek wrote:
> [..]
> > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > PREEMPT kernels than !PREEMPT ones.
> > 
> > I would say that the patch improves also console_unlock() but only in
> > non-preemttive context.
> > 
> > By other words, it makes console_unlock() finite in preemptible context
> > (limited by buffer size). It might still be unlimited in
> > non-preemtible context.
> 
> could you elaborate a bit?

Ah, I am sorry, I swapped the conditions. I meant that
console_unlock() is finite in non-preemptible context.

There are two possibilities if console_unlock() is in atomic context
and never sleeps. First, if there are new printk() callers, they could
take over the job. Second. if they are no more callers, the
current owner will release the lock after processing the existing
messages. In both situations, the current owner will not handle more
than the entire buffer. Therefore it is limited. We might argue
if it is enough. But the point is that it is limited which is
a step forward. And I think that you already agreed that this
was a step forward.

The chance of taking over the lock is lower when console_unlock()
owner could sleep. But then there is not a danger of a softlockup.
In each case, this patch did not make it worse. Could we agree
on this, please?

All in all, this patch improved one scenario and did not make
worse another one. We know that it does not fix everything.
But it is a step forward. Could we agree on this, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15  8:51                         ` Petr Mladek
@ 2018-01-15  9:48                           ` Sergey Senozhatsky
  2018-01-16  5:16                           ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-15  9:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/15/18 09:51), Petr Mladek wrote:
> On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > On (01/12/18 13:55), Petr Mladek wrote:
> > [..]
> > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > PREEMPT kernels than !PREEMPT ones.
> > > 
> > > I would say that the patch improves also console_unlock() but only in
> > > non-preemttive context.
> > > 
> > > By other words, it makes console_unlock() finite in preemptible context
> > > (limited by buffer size). It might still be unlimited in
> > > non-preemtible context.
> > 
> > could you elaborate a bit?
> 
> Ah, I am sorry, I swapped the conditions. I meant that
> console_unlock() is finite in non-preemptible context.

ah, OK.
yes. it sill can be infinite, in preemptible context.

a side note,
no kernel or user space process is designed to loop in console_unlock(),
so infinte console_unlock() still can do some damage. we don't crash the
kernel, but if we somehow bring down the user space process, then things
are not so clear. e.g. when we do lots of handoffs we don't up() the
console_sem, so anything that might be sleeping in TASK_UNINTERRUPTIBLE
on console_sem stays in that uninterruptible state, which possibly can
fire the hung task alarm, which also may be configured to panic() the
kernel (or some other type of watchdog). so panic() is still possible
even if we do hand offs. but that's a completely different topic.


> There are two possibilities if console_unlock() is in atomic context
> and never sleeps. First, if there are new printk() callers, they could
> take over the job. Second. if they are no more callers, the
> current owner will release the lock after processing the existing
> messages. In both situations, the current owner will not handle more
> than the entire buffer. Therefore it is limited. We might argue
> if it is enough. But the point is that it is limited which is
> a step forward. And I think that you already agreed that this
> was a step forward.

yes.
the question whether O(A * B) bound is good enough is still there,
but in the worst case it's still a lockup, just like before [including
cases of accidental hand off from non-atomic context to a atomic one].


> The chance of taking over the lock is lower when console_unlock()
> owner could sleep. But then there is not a danger of a softlockup.
> In each case, this patch did not make it worse. Could we agree
> on this, please?

yes.


> All in all, this patch improved one scenario and did not make
> worse another one. We know that it does not fix everything.
> But it is a step forward. Could we agree on this, please?

yes.
it's iffy. it's a step forward when it's a step forward :)
and the good old lockup/panic in other cases. IMHO.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-13  7:28                     ` Sergey Senozhatsky
@ 2018-01-15 10:17                       ` Petr Mladek
  2018-01-15 11:50                         ` Petr Mladek
  2018-01-16  5:23                         ` Sergey Senozhatsky
  2018-01-15 12:06                       ` Steven Rostedt
  1 sibling, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 10:17 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hi Sergey,

I wonder if there is still some miss understanding.

Steven and me are trying to get this patch in because we believe
that it is a step forward. We know that it is not perfect. But
we believe that it makes things better. In particular, it limits
the time spent in console_unlock() in atomic context. It does
not make it worse in preemptible context.

It does not block further improvements, including offloading
to the kthread. We will happily discuss and review further
improvements, it they prove to be necessary.

The advantage of this approach is that it is incremental. It should
be easier for review and analyzing possible regressions.

What is the aim of your mails, please?
Do you want to say that this patch might cause regressions?
Or do you want to say that it does not solve all scenarios?

Please, answer the above questions. I am still confused.


On Sat 2018-01-13 16:28:34, Sergey Senozhatsky wrote:
> On (01/12/18 07:21), Steven Rostedt wrote:
> [..]
> > Yep, but I'm still not convinced you are seeing an issue with a single
> > printk.
> 
> what do you mean by this?
> 
> > An OOM does not do everything in one printk, it calls hundreds.
> > Having hundreds of printks is an issue, especially in critical sections.
> 
> unless your console_sem owner is preempted. as long as it is preempted
> it doesn't really matter how many times we call printk from which CPUs
> and from which sections, but what matters - who is going to print that all
> out when console_sem is running again and how much time will it take.
> that's what I'm saying.

Yes, this is a problem. We might need to solve it. But the same
problem is there even without the patch. Therefore we might
solve it later. Do you agree, please?


> [..]
> > > with slow serial console, call_console_drivers() takes enough time to
> > > to make preemption of a current console_sem owner right after it irqrestore()
> > > highly possible; unless there is a spinning console_waiter. which easily may
> > > not be there; but can come in while current console_sem is preempted, why not.
> > > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > > of new messages to print and on one to hand off printing to. in a super
> > > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
> > 
> > I'm not fixing console_unlock(), I'm fixing printk().
> 
> which is not what I was talking about. the point was that you said
> 
> 
>  :                                                .... and what about the
>  : printks that haven't gotten out yet? Delay them to something else, and
>  : if the machine were to crash in the transfer, we lost all that data.
>  :
>  : My method, there's really no delay between a hand off. There's always
>  : an active CPU doing printing. It matches the current method which works
>  : well for getting information out. A delayed approach will break that
> 
> 
> that is not true. we can have preemption "during" hand off. hand off,
> thus, is a "delayed approach", by definition. so if you consider the
> possibility of "if the machine were to crash in the transfer, we lost
> all that data" and if you consider this to be important [otherwise you
> wouldn't bring that up, would you] then the reality is that your patch
> has the same problem as printk_kthread.
> 
> so very schematically, for hand-off it's something like
> 
> 	if (... console_trylock_spinning()) // grabbed the ownership
> 
> 		<< ... preempted ... >>
> 
> 		console_unlock();
> 
> 
> for printk_kthread it's something like
> 
> 		wake_up_process(printk_kthread);
> 		up(console_sem);

Good question!

Is this really the same? The console_trylock_spinning() caller will
get preempted only when interrupts (timers?) still work. This is
a sign that the system is still somehow living. Also this information
is quite up-to-date because you checked this after a relatively
short busy wait.

On the other hand, wake_up_process() just puts printk_kthread
into a running state. It does not check if the processes are
still actively being rescheduled on the system. It might check
some flags. But they might be pretty outdated when this is
done after half of the watchdog limit.


In each case, the preemption after console_trylock_spinning()
has the same effect like preemption in console_unlock().
It is possible already now. Therefore I do not consider
this as a regression.


> hence the following:
> 
> [..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.
> 
> this was cryptic and misleading. sorry.
> some clarifications.
> 
> what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> I think I badly broke printk() [some of paths]. I know what I tried
> to fix (and you don't have to explain to me what a lock up is) with
> that patch, but I don't think the patch ended up to be a clear win.
> a very simple explanation would be:
> 
> instead of having a direct nonpreemptible path
> 
> 	logbuf -> for(;;) call_console_drivers -> happy user
> 
> we now have
> 
> 	logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
> 
> which is a big change. with a non-zero potential for regressions.
> and it didn't take long to find out that not all "happy users" were
> exactly happy with the new scheme of things. glance through Tetsuo's
> emails [see links in my another email], Tetsuo reported that printk can
> stall for minutes now. basically, the worse the system state is the lower
> printk throughput can be [down to zero chars in the worst case]. that's
> why I think that my patch was a mistake. and that's why in my out-of-tree
> patches I'm moving towards the non-preemptible path from logbuf through
> console to a happy user [just like it used to be]. but, obviously, I can't
> just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> why I bound console_unlock() to watchdog threshold and move towards the
> batched non-preemptible print outs (enabling preemption and up()-ing the
> console_sem at the end of each print out batch). this is not super good,
> preemption is still here, but at least not after every line console_unlock()
> prints. up() console_sem also increases chances that, for instance, systemd
> or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> now has a chance to be woken up sooner (not only after we flush all pending
> logbuf messages and finally up() the console_sem).

I see your point. But this is an orthogonal problem. It is more about
loosing messages because console_unlock() is slow when sleeping. This
patch is about limiting time spent in console_unlock() in atomic
context.

If you want to revert the above mentioned commit, please send a patch
so that we could discuss this separately.

Best Regards,
Petr


PS: Sergey, you have many good points. The printk-stuff is very
complex and we could spend years discussing the perfect solution.

But I am never sure if you discuss this in this thread because
this patch might cause regression or because it does not address
all the issues.

Could we please make it more simple? If you believe that this
patch might cause regression than please say this clearly.
You actually mentioned the word regression few times.
I am not sure if we managed to persuade you about the opposite.

If you think that this patch is not good enough and not worth
merging upstream, please state this clearly as well.

If you think that this patch does not address all problems,
please send further improvements on top of it so that we
could discuss this. If you want to discuss the problems
in advance, please open another thread. IMHO, this thread
brought many ideas for the perfect solution but it is
already too scattered.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 10:17                       ` Petr Mladek
@ 2018-01-15 11:50                         ` Petr Mladek
  2018-01-16  6:10                           ` Sergey Senozhatsky
  2018-01-16  5:23                         ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 11:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> PS: Sergey, you have many good points. The printk-stuff is very
> complex and we could spend years discussing the perfect solution.

BTW: One solution that comes to my mind is based on ideas
already mentioned in this thread:


void console_unlock(void)
{
	disable_preemtion();

	while(pending_message) {

	    call_console_drivers();

	    if (too_long_here() && current != printk_kthread) {
	       wake_up_process(printk_kthread())

	}

	enable_preemtion();
}

bool too_long_here(void)
{
	return should_resched();
or
	return spent_here() > 1 / HZ / 2;
or
	what ever we agree on
}


int printk_kthread_func(void *data)
{
	while(1) {
		 if (!pending_messaged)
			schedule();

		if (console_trylock_spinning())
			console_unlock();

		cond_resched();
	}
}

It means that console_unlock() will aggressively push messages
with disabled preemption. It will wake up printk_kthread when
it is pushing too long. The printk_kthread would try
to steal the lock and take over the job.

If the system is in reasonable state, printk_kthread should
succeed and avoid softlockup. The offload should be more safe
than a pure wake_up_process().

If printk_kthread is not able to take over the job, it
might suggest that the offload is not safe and the softlockup
is inevitable.

One question is how to avoid softlockup when console_unlock()
is called from printk_kthread. I think that printk_kthread
should release console_lock and call cond_resched from
time to time. It means that the printing will be less
aggressive but anyone could continue flushing the console.
If there are no new messages, it is probably acceptable
to be less aggressive with flushing the messages.


Anyway, this should be more safe than a direct offload
if we agree that getting the messages out is more
important than a possible softlockup.

If this is not enough, I would start thinking about
throttling writers.

Finally, this is all a future work that can be done
and discussed later.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-13  7:28                     ` Sergey Senozhatsky
  2018-01-15 10:17                       ` Petr Mladek
@ 2018-01-15 12:06                       ` Steven Rostedt
  2018-01-15 14:45                         ` Petr Mladek
  2018-01-16  1:46                         ` Sergey Senozhatsky
  1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-15 12:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Sat, 13 Jan 2018 16:28:34 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> On (01/12/18 07:21), Steven Rostedt wrote:
> [..]
> > Yep, but I'm still not convinced you are seeing an issue with a single
> > printk.  
> 
> what do you mean by this?

I'm not sure your issues happen because a single printk is locked up,
but you have many printks in one area.

> 
> > An OOM does not do everything in one printk, it calls hundreds.
> > Having hundreds of printks is an issue, especially in critical sections.  
> 
> unless your console_sem owner is preempted. as long as it is preempted
> it doesn't really matter how many times we call printk from which CPUs
> and from which sections, but what matters - who is going to print that all
> out when console_sem is running again and how much time will it take.
> that's what I'm saying.

OK, if this is an issue, then we could do:

	preempt_disable();
	if (console_trylock_spinning())
		console_unlock();
	preempt_enable();

Which would prevent any printks from being preempted, but allow for
other console_lock owners to be so.


> 
> [..]
> > > with slow serial console, call_console_drivers() takes enough time to
> > > to make preemption of a current console_sem owner right after it irqrestore()
> > > highly possible; unless there is a spinning console_waiter. which easily may
> > > not be there; but can come in while current console_sem is preempted, why not.
> > > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > > of new messages to print and on one to hand off printing to. in a super
> > > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries  
> > 
> > I'm not fixing console_unlock(), I'm fixing printk().  
> 
> I know. I'm fixing console_unlock(). because console_unlock() is its own
> thing.
> 
> > > 4) the interesting thing here is that call_console_drivers() can
> > >    cause console_sem owner to schedule even if it has handed off the
> > >    ownership. because waiting CPU has to spin with local IRQs disabled
> > >    as long as call_console_drivers() prints its message. so if consoles
> > >    are slow, then the first thing the waiter will face after it receives
> > >    the console_sem ownership and enables the IRQs is - preemption.  
> > 
> > If the waiter is preempted, that means its not in a critical section.
> > Isn't that what you want?  
> 
> see below.
> 
> > >    so hand off is not immediate. there is a possibility of re-scheduling
> > >    between hand off and actual printing. so that "there is always an active
> > >    printing CPU" is not quite true.
> > > 
> > > vprintk_emit()
> > > {
> > > 
> > > 	console_trylock_spinning(void)
> > > 	{
> > > 	   printk_safe_enter_irqsave(flags);
> > > 	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
> > > 	        cpu_relax();
> > > 	   printk_safe_exit_irqrestore(flags);  
> > > --->	}    
> > > |						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()  
> > 
> > Again, this means the waiter is not in a critical section. Why do we
> > care?  
> 
> which is not what I was talking about. the point was that you said

And would be fixed with the preempt_disable() I added above.

> 
> 
>  :                                                .... and what about the
>  : printks that haven't gotten out yet? Delay them to something else, and
>  : if the machine were to crash in the transfer, we lost all that data.
>  :
>  : My method, there's really no delay between a hand off. There's always
>  : an active CPU doing printing. It matches the current method which works
>  : well for getting information out. A delayed approach will break that
> 
> 
> that is not true. we can have preemption "during" hand off. hand off,
> thus, is a "delayed approach", by definition. so if you consider the
> possibility of "if the machine were to crash in the transfer, we lost
> all that data" and if you consider this to be important [otherwise you
> wouldn't bring that up, would you] then the reality is that your patch
> has the same problem as printk_kthread.

With the preempt_disable() there really isn't a delay. I agree, we
shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
but that's another story).

> 
> so very schematically, for hand-off it's something like
> 
> 	if (... console_trylock_spinning()) // grabbed the ownership
> 
> 		<< ... preempted ... >>
> 
> 		console_unlock();

Which I think we should stop, with the preempt_disable().

> 
> 
> for printk_kthread it's something like
> 
> 		wake_up_process(printk_kthread);
> 		up(console_sem);
> 
> 
> in the later case we at least have console_sem unlocked. so any other CPU
> that might do printk() can grab the lock and emit the logbuf messages. but
> in case on hand-off, we have console_sem locked, so no printk() will be
> able to emit the messages, we need that specific task to become running.
> 
> 
> hence the following:
> 
> [..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.  
> 
> this was cryptic and misleading. sorry.
> some clarifications.
> 
> what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> I think I badly broke printk() [some of paths]. I know what I tried

I think adding the preempt_disable() would fix printk() but let non
printk console_unlock() still preempt.

> to fix (and you don't have to explain to me what a lock up is) with
> that patch, but I don't think the patch ended up to be a clear win.
> a very simple explanation would be:
> 
> instead of having a direct nonpreemptible path
> 
> 	logbuf -> for(;;) call_console_drivers -> happy user
> 
> we now have
> 
> 	logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
> 
> which is a big change. with a non-zero potential for regressions.
> and it didn't take long to find out that not all "happy users" were
> exactly happy with the new scheme of things. glance through Tetsuo's
> emails [see links in my another email], Tetsuo reported that printk can
> stall for minutes now. basically, the worse the system state is the lower
> printk throughput can be [down to zero chars in the worst case]. that's
> why I think that my patch was a mistake. and that's why in my out-of-tree
> patches I'm moving towards the non-preemptible path from logbuf through
> console to a happy user [just like it used to be]. but, obviously, I can't
> just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> why I bound console_unlock() to watchdog threshold and move towards the
> batched non-preemptible print outs (enabling preemption and up()-ing the
> console_sem at the end of each print out batch). this is not super good,
> preemption is still here, but at least not after every line console_unlock()
> prints. up() console_sem also increases chances that, for instance, systemd
> or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> now has a chance to be woken up sooner (not only after we flush all pending
> logbuf messages and finally up() the console_sem).

I rather try simpler approaches first (like adding the preempt_disable()
on top of my patch) than an elaborate scheme of printk_kthreads.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:55                     ` Petr Mladek
  2018-01-13  7:31                       ` Sergey Senozhatsky
@ 2018-01-15 12:08                       ` Steven Rostedt
  2018-01-16  4:51                         ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-15 12:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 13:55:37 +0100
Petr Mladek <pmladek@suse.com> wrote:

> > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > PREEMPT kernels than !PREEMPT ones.  
> 
> I would say that the patch improves also console_unlock() but only in
> non-preemttive context.
> 
> By other words, it makes console_unlock() finite in preemptible context
> (limited by buffer size). It might still be unlimited in
> non-preemtible context.

Since I'm worried most about printk(), I would argue to make printk
console unlock always non-preempt.

	preempt_disable();
	if (console_trylock_spinning())
		console_unlock();
	preempt_enable();

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 12:06                       ` Steven Rostedt
@ 2018-01-15 14:45                         ` Petr Mladek
  2018-01-16  2:23                           ` Sergey Senozhatsky
  2018-01-16  1:46                         ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 14:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Mon 2018-01-15 07:06:37, Steven Rostedt wrote:
> On Sat, 13 Jan 2018 16:28:34 +0900
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > On (01/12/18 07:21), Steven Rostedt wrote:
> > 
> > > An OOM does not do everything in one printk, it calls hundreds.
> > > Having hundreds of printks is an issue, especially in critical sections.  
> > 
> > unless your console_sem owner is preempted. as long as it is preempted
> > it doesn't really matter how many times we call printk from which CPUs
> > and from which sections, but what matters - who is going to print that all
> > out when console_sem is running again and how much time will it take.
> > that's what I'm saying.
> 
> OK, if this is an issue, then we could do:
> 
> 	preempt_disable();
> 	if (console_trylock_spinning())
> 		console_unlock();
> 	preempt_enable();
> 
> Which would prevent any printks from being preempted, but allow for
> other console_lock owners to be so.

[...]

> >  :                                                .... and what about the
> >  : printks that haven't gotten out yet? Delay them to something else, and
> >  : if the machine were to crash in the transfer, we lost all that data.
> >  :
> >  : My method, there's really no delay between a hand off. There's always
> >  : an active CPU doing printing. It matches the current method which works
> >  : well for getting information out. A delayed approach will break that
> > 
> > 
> > that is not true. we can have preemption "during" hand off. hand off,
> > thus, is a "delayed approach", by definition. so if you consider the
> > possibility of "if the machine were to crash in the transfer, we lost
> > all that data" and if you consider this to be important [otherwise you
> > wouldn't bring that up, would you] then the reality is that your patch
> > has the same problem as printk_kthread.
> 
> With the preempt_disable() there really isn't a delay. I agree, we
> shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> but that's another story).
> 
> > 
> > so very schematically, for hand-off it's something like
> > 
> > 	if (... console_trylock_spinning()) // grabbed the ownership
> > 
> > 		<< ... preempted ... >>
> > 
> > 		console_unlock();
> 
> Which I think we should stop, with the preempt_disable().

Adding the preempt_disable() basically means to revert the already
mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
for some of console_trylock() callers").

I originally wanted to solve this separately to make it easier. But
the change looks fine to me. Therefore we reached a mutual agreement.
Sergey, do you want to send a patch or should I just put it at
the end of this patchset?


> > for printk_kthread it's something like
> > 
> > 		wake_up_process(printk_kthread);
> > 		up(console_sem);
> > 
> > 
> > in the later case we at least have console_sem unlocked. so any other CPU
> > that might do printk() can grab the lock and emit the logbuf messages. but
> > in case on hand-off, we have console_sem locked, so no printk() will be
> > able to emit the messages, we need that specific task to become running.
> > 
> > 
> > hence the following:
> > 
> > [..]
> > > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > > thing after all.  
> > 
> > this was cryptic and misleading. sorry.
> > some clarifications.
> > 
> > what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> > I think I badly broke printk() [some of paths]. I know what I tried
> 
> I think adding the preempt_disable() would fix printk() but let non
> printk console_unlock() still preempt.

I would personally remove cond_resched() from console_unlock()
completely.

Sleeping in console_unlock() increases the chance that more messages
would need to be handled. And more importantly it reduces the chance
of a successful handover.

As a result, the caller might spend there very long time, it might
be getting increasingly far behind. There is higher risk of lost
messages. Also the eventual taker might have too much to proceed
in preemption disabled context.

Removing cond_resched() is in sync with printk() priorities.
The highest one is to get the messages out.

Finally, removing cond_resched() should make the behavior more
predictable (never preempted), same in all situations (called
from printk() or other locations) => easier to analyze problems
and maintain.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-12 16:36           ` Steven Rostedt
@ 2018-01-15 16:08             ` Petr Mladek
  2018-01-16  5:05               ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 16:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On Fri 2018-01-12 11:36:27, Steven Rostedt wrote:
> On Fri, 12 Jan 2018 17:08:37 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> > >From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001  
> > From: Petr Mladek <pmladek@suse.com>
> > Date: Fri, 22 Dec 2017 18:58:46 +0100
> > Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers
> > 
> > The commit ("printk: Add console owner and waiter logic to load balance
> > console writes") made vprintk_emit() and console_unlock() even more
> > complicated.
> > 
> > This patch extracts the new code into 3 helper functions. They should
> > help to keep it rather self-contained. It will be easier to use and
> > maintain.
> > 
> > This patch just shuffles the existing code. It does not change
> > the functionality.
> > 
> Besides the typos (which should be fixed)...
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

JFYI, I have fixed the typos, updated the commit message for
the 1st patch and pushed all into printk.git,
branch for-4.16-console-waiter-logic, see
https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic

I know that the discussion is not completely finished but it is
somehow cycling. Sergey few times wrote that he would not block
these patches. It is high time, I put it into linux-next. I could
always remove it if decided in the discussion.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 12:06                       ` Steven Rostedt
  2018-01-15 14:45                         ` Petr Mladek
@ 2018-01-16  1:46                         ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  1:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/15/18 07:06), Steven Rostedt wrote:
> > > Yep, but I'm still not convinced you are seeing an issue with a single
> > > printk.  
> > 
> > what do you mean by this?
> 
> I'm not sure your issues happen because a single printk is locked up,
> but you have many printks in one area.

hm, need to think about it.

> > > An OOM does not do everything in one printk, it calls hundreds.
> > > Having hundreds of printks is an issue, especially in critical sections.  
> > 
> > unless your console_sem owner is preempted. as long as it is preempted
> > it doesn't really matter how many times we call printk from which CPUs
> > and from which sections, but what matters - who is going to print that all
> > out when console_sem is running again and how much time will it take.
> > that's what I'm saying.
> 
> OK, if this is an issue, then we could do:
> 
> 	preempt_disable();
> 	if (console_trylock_spinning())
> 		console_unlock();
> 	preempt_enable();
> 
> Which would prevent any printks from being preempted, but allow for
> other console_lock owners to be so.

yes, non-preemptible printk->console_unlock() is good for a number of
reasons.

[..]
> > > > vprintk_emit()
> > > > {
> > > > 
> > > > 	console_trylock_spinning(void)
> > > > 	{
> > > > 	   printk_safe_enter_irqsave(flags);
> > > > 	   while (READ_ONCE(console_waiter))       // spins as long as call_console_drivers() on other CPU
> > > > 	        cpu_relax();
> > > > 	   printk_safe_exit_irqrestore(flags);  
> > > > --->	}    
> > > > |						   // preemptible up until printk_safe_enter_irqsave() in console_unlock()  
> > > 
> > > Again, this means the waiter is not in a critical section. Why do we
> > > care?  
> > 
> > which is not what I was talking about. the point was that you said
> 
> And would be fixed with the preempt_disable() I added above.

yes. and it's, basically, very close to a revert of the commit
I mentioned.

[..]
> > that is not true. we can have preemption "during" hand off. hand off,
> > thus, is a "delayed approach", by definition. so if you consider the
> > possibility of "if the machine were to crash in the transfer, we lost
> > all that data" and if you consider this to be important [otherwise you
> > wouldn't bring that up, would you] then the reality is that your patch
> > has the same problem as printk_kthread.
> 
> With the preempt_disable() there really isn't a delay. I agree, we
> shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> but that's another story).

yes.

> > so very schematically, for hand-off it's something like
> > 
> > 	if (... console_trylock_spinning()) // grabbed the ownership
> > 
> > 		<< ... preempted ... >>
> > 
> > 		console_unlock();
> 
> Which I think we should stop, with the preempt_disable().

yes.

> > for printk_kthread it's something like
> > 
> > 		wake_up_process(printk_kthread);
> > 		up(console_sem);
> > 
> > 
> > in the later case we at least have console_sem unlocked. so any other CPU
> > that might do printk() can grab the lock and emit the logbuf messages. but
> > in case on hand-off, we have console_sem locked, so no printk() will be
> > able to emit the messages, we need that specific task to become running.
> > 
> > 
> > hence the following:
> > 
> > [..]
> > > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > > thing after all.  
> > 
> > this was cryptic and misleading. sorry.
> > some clarifications.
> > 
> > what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> > I think I badly broke printk() [some of paths]. I know what I tried
> 
> I think adding the preempt_disable() would fix printk() but let non
> printk console_unlock() still preempt.

yes. might be a bit risky, but can try.

and yes, we still have console_lock() call sites, which can sleep
under console_sem, so scheduler still can mess up with us, but
that's a different story. agreed.

> > to fix (and you don't have to explain to me what a lock up is) with
> > that patch, but I don't think the patch ended up to be a clear win.
> > a very simple explanation would be:
> > 
> > instead of having a direct nonpreemptible path
> > 
> > 	logbuf -> for(;;) call_console_drivers -> happy user
> > 
> > we now have
> > 
> > 	logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
> > 
> > which is a big change. with a non-zero potential for regressions.
> > and it didn't take long to find out that not all "happy users" were
> > exactly happy with the new scheme of things. glance through Tetsuo's
> > emails [see links in my another email], Tetsuo reported that printk can
> > stall for minutes now. basically, the worse the system state is the lower
> > printk throughput can be [down to zero chars in the worst case]. that's
> > why I think that my patch was a mistake. and that's why in my out-of-tree
> > patches I'm moving towards the non-preemptible path from logbuf through
> > console to a happy user [just like it used to be]. but, obviously, I can't
> > just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> > why I bound console_unlock() to watchdog threshold and move towards the
> > batched non-preemptible print outs (enabling preemption and up()-ing the
> > console_sem at the end of each print out batch). this is not super good,
> > preemption is still here, but at least not after every line console_unlock()
> > prints. up() console_sem also increases chances that, for instance, systemd
> > or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> > now has a chance to be woken up sooner (not only after we flush all pending
> > logbuf messages and finally up() the console_sem).
> 
> I rather try simpler approaches first (like adding the preempt_disable()
> on top of my patch) than an elaborate scheme of printk_kthreads.

ok, agreed.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 14:45                         ` Petr Mladek
@ 2018-01-16  2:23                           ` Sergey Senozhatsky
  2018-01-16  4:47                             ` Sergey Senozhatsky
  2018-01-16 10:13                             ` Petr Mladek
  0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  2:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/15/18 15:45), Petr Mladek wrote:
[..]
> > With the preempt_disable() there really isn't a delay. I agree, we
> > shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> > but that's another story).
> > 
> > > 
> > > so very schematically, for hand-off it's something like
> > > 
> > > 	if (... console_trylock_spinning()) // grabbed the ownership
> > > 
> > > 		<< ... preempted ... >>
> > > 
> > > 		console_unlock();
> > 
> > Which I think we should stop, with the preempt_disable().
> 
> Adding the preempt_disable() basically means to revert the already
> mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
> for some of console_trylock() callers").
> 
> I originally wanted to solve this separately to make it easier. But
> the change looks fine to me. Therefore we reached a mutual agreement.
> Sergey, do you want to send a patch or should I just put it at
> the end of this patchset?

you can add the patch.

[..]
> > I think adding the preempt_disable() would fix printk() but let non
> > printk console_unlock() still preempt.
> 
> I would personally remove cond_resched() from console_unlock()
> completely.

hmm, not so sure. I think it's there for !PREEMPT systems which have
to print a lot of messages. the case I'm speaking about in particular
is when we register a CON_PRINTBUFFER console and need to console_unlock()
(flush) all of the messages we currently have in the logbuf. we better
have that cond_resched() there, I think.

> Sleeping in console_unlock() increases the chance that more messages
> would need to be handled. And more importantly it reduces the chance
> of a successful handover.
> 
> As a result, the caller might spend there very long time, it might
> be getting increasingly far behind. There is higher risk of lost
> messages. Also the eventual taker might have too much to proceed
> in preemption disabled context.

yes.

> Removing cond_resched() is in sync with printk() priorities.

hmm, not sure. we have sleeping console_lock()->console_unlock() path
for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
have the same sleeping console_lock()->console_unlock().

printk()->console_unlock() seems to be a pretty independent thing,
unfortunately (!), yet sleeping console_lock()->console_unlock()
messes up with it a lot.

> The highest one is to get the messages out.
> 
> Finally, removing cond_resched() should make the behavior more
> predictable (never preempted)

but we are always preempted in PREEMPT kernels when the current
console_sem owner acquired the lock via console_lock(), not via
console_trylock(). cond_resched() does the same, but for !PREEMPT.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  2:23                           ` Sergey Senozhatsky
@ 2018-01-16  4:47                             ` Sergey Senozhatsky
  2018-01-16 10:19                               ` Petr Mladek
  2018-01-16 15:45                               ` Steven Rostedt
  2018-01-16 10:13                             ` Petr Mladek
  1 sibling, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  4:47 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Tetsuo Handa
  Cc: Sergey Senozhatsky, Tejun Heo, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel, Sergey Senozhatsky

On (01/16/18 11:23), Sergey Senozhatsky wrote:
[..]
> > Adding the preempt_disable() basically means to revert the already
> > mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
> > for some of console_trylock() callers").
> > 
> > I originally wanted to solve this separately to make it easier. But
> > the change looks fine to me. Therefore we reached a mutual agreement.
> > Sergey, do you want to send a patch or should I just put it at
> > the end of this patchset?
> 
> you can add the patch.

if you don't mind, let me fix the thing that I broke.
that would be responsible. I believe I also must say the following:
  Tetsuo, many thanks for reporting the issues for song long, and
  sorry that it took quite a while to revert that change.

8<====

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk: never set console_may_schedule in console_trylock()

This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].

Another issues is that preemption under console_sem also
messes up with Steven Rostedt's hand off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. Preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.

Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.

[1] https://marc.info/?l=linux-mm&m=145692016122716
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 kernel/printk/printk.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ffe05024c622..9cb943c90d98 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		/*
+		 * Disable preemption to avoid being preempted while holding
+		 * console_sem which would prevent anyone from printing to
+		 * console
+		 */
+		preempt_disable();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore.  The release will print out buffers and wake up
@@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+		preempt_enable();
 	}
 
 	return printed_len;
@@ -2229,20 +2236,7 @@ int console_trylock(void)
 		return 0;
 	}
 	console_locked = 1;
-	/*
-	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
-	 * safe to schedule (e.g. calling printk while holding a spin_lock),
-	 * because preempt_disable()/preempt_enable() are just barriers there
-	 * and preempt_count() is always 0.
-	 *
-	 * RCU read sections have a separate preemption counter when
-	 * PREEMPT_RCU enabled thus we must take extra care and check
-	 * rcu_preempt_depth(), otherwise RCU read sections modify
-	 * preempt_count().
-	 */
-	console_may_schedule = !oops_in_progress &&
-			preemptible() &&
-			!rcu_preempt_depth();
+	console_may_schedule = 0;
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 12:08                       ` Steven Rostedt
@ 2018-01-16  4:51                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  4:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/15/18 07:08), Steven Rostedt wrote:
> On Fri, 12 Jan 2018 13:55:37 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > PREEMPT kernels than !PREEMPT ones.  
> > 
> > I would say that the patch improves also console_unlock() but only in
> > non-preemttive context.
> > 
> > By other words, it makes console_unlock() finite in preemptible context
> > (limited by buffer size). It might still be unlimited in
> > non-preemtible context.
> 
> Since I'm worried most about printk(), I would argue to make printk
> console unlock always non-preempt.

+1


// The next stop is "victims of O(logbuf) memorial" station :)

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
  2018-01-15 16:08             ` Petr Mladek
@ 2018-01-16  5:05               ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  5:05 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel

On (01/15/18 17:08), Petr Mladek wrote:
> > Besides the typos (which should be fixed)...
> > 
> > Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> JFYI, I have fixed the typos, updated the commit message for
> the 1st patch and pushed all into printk.git,
> branch for-4.16-console-waiter-logic, see
> https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic
> 
> I know that the discussion is not completely finished but it is
> somehow cycling. Sergey few times wrote that he would not block
> these patches. It is high time, I put it into linux-next. I could
> always remove it if decided in the discussion.

Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

at least we have preemption out of printk->user way (one of the
things I tried to tell you), which looks more like a step forward
to me personally.


p.s. the printk is still pretty far from what I want it to be.
     vprintk_emit() still can cause disturbance and damage in
     pretty unrelated places. e.g. hung tasks on console_sem,
     and so on. I'm going to keep my out-of-tree patches alive,
     may be they will be merged upstream in some form or another
     may be not.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15  8:51                         ` Petr Mladek
  2018-01-15  9:48                           ` Sergey Senozhatsky
@ 2018-01-16  5:16                           ` Sergey Senozhatsky
  2018-01-16  9:08                             ` Petr Mladek
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  5:16 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On (01/15/18 09:51), Petr Mladek wrote:
> On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > On (01/12/18 13:55), Petr Mladek wrote:
> > [..]
> > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > PREEMPT kernels than !PREEMPT ones.
> > > 
> > > I would say that the patch improves also console_unlock() but only in
> > > non-preemttive context.
> > > 
> > > By other words, it makes console_unlock() finite in preemptible context
> > > (limited by buffer size). It might still be unlimited in
> > > non-preemtible context.
> > 
> > could you elaborate a bit?
> 
> Ah, I am sorry, I swapped the conditions. I meant that
> console_unlock() is finite in non-preemptible context.

by the way. just for the record,

probably there is a way for us to have a task printing more than
O(logbuf) even in non-preemptible context.

	CPU0

	vprintk_emit()
	 preempt_disable()
	  console_unlock()
	  {
	   for (;;) {
                printk_safe_enter_irqsave()
	        call_console_drivers();
	        printk_safe_exit_irqrestore()

	<< IRQ >>
		dump_stack()
		 printk()->log_store()
		 ....
		 printk()->log_store()
	<< iret >>
	   }
	  }
	 preempt_enable()

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 10:17                       ` Petr Mladek
  2018-01-15 11:50                         ` Petr Mladek
@ 2018-01-16  5:23                         ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  5:23 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

Hi,

On (01/15/18 11:17), Petr Mladek wrote:
> Hi Sergey,
> 
> I wonder if there is still some miss understanding.
> 
> Steven and me are trying to get this patch in because we believe
> that it is a step forward. We know that it is not perfect. But
> we believe that it makes things better. In particular, it limits
> the time spent in console_unlock() in atomic context. It does
> not make it worse in preemptible context.
> 
> It does not block further improvements, including offloading
> to the kthread. We will happily discuss and review further
> improvements, it they prove to be necessary.
> 
> The advantage of this approach is that it is incremental. It should
> be easier for review and analyzing possible regressions.
> 
> What is the aim of your mails, please?
> Do you want to say that this patch might cause regressions?
> Or do you want to say that it does not solve all scenarios?
> 
> Please, answer the above questions. I am still confused.

I ACK-ed the patch set, given that I hope that we at least will
do (a)

a) remove preemption out of printk()->user critical path


---

b) the next thing would be - O(logbuf) is not a good boundary

c) the thing after that would be to - O(logbuf) boundary can be
   broken in both preemptible and non-preemptible contexts.

but (b) and (c) can wait.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 11:50                         ` Petr Mladek
@ 2018-01-16  6:10                           ` Sergey Senozhatsky
  2018-01-16  9:36                             ` Petr Mladek
  2018-01-16 16:06                             ` Steven Rostedt
  0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16  6:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

Hi,

On (01/15/18 12:50), Petr Mladek wrote:
> On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> > PS: Sergey, you have many good points. The printk-stuff is very
> > complex and we could spend years discussing the perfect solution.
> 
> BTW: One solution that comes to my mind is based on ideas
> already mentioned in this thread:
> 
> void console_unlock(void)
> {
> 	disable_preemtion();
> 
> 	while(pending_message) {
> 
> 	    call_console_drivers();
> 
> 	    if (too_long_here() && current != printk_kthread) {
> 	       wake_up_process(printk_kthread())
> 
> 	}
> 
> 	enable_preemtion();
> }

unfortunately disabling preemtion in console_unlock() is a bit
dangerous :( we have paths that call console_unlock() exactly
to flush everything (not only new pending messages, but everything)
that is in logbuf and we cannot return from console_unlock()
preliminary in that case.

> bool too_long_here(void)
> {
> 	return should_resched();
> or
> 	return spent_here() > 1 / HZ / 2;
> or
> 	what ever we agree on
> }
> 
> 
> int printk_kthread_func(void *data)
> {
> 	while(1) {
> 		 if (!pending_messaged)
> 			schedule();
> 
> 		if (console_trylock_spinning())
> 			console_unlock();
> 
> 		cond_resched();
> 	}
> }

overall that's very close to what I have in one of my private branches.
console_trylock_spinning() for some reason does not perform really
well on my made-up internal printk torture tests. it seems that I
have a much better stability (no lockups and so on) when I also let
printk_kthread to sleep on console_sem(). but I will look further.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  5:16                           ` Sergey Senozhatsky
@ 2018-01-16  9:08                             ` Petr Mladek
  0 siblings, 0 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-16  9:08 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue 2018-01-16 14:16:22, Sergey Senozhatsky wrote:
> On (01/15/18 09:51), Petr Mladek wrote:
> > On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > > On (01/12/18 13:55), Petr Mladek wrote:
> > > [..]
> > > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > > PREEMPT kernels than !PREEMPT ones.
> > > > 
> > > > I would say that the patch improves also console_unlock() but only in
> > > > non-preemttive context.
> > > > 
> > > > By other words, it makes console_unlock() finite in preemptible context
> > > > (limited by buffer size). It might still be unlimited in
> > > > non-preemtible context.
> > > 
> > > could you elaborate a bit?
> > 
> > Ah, I am sorry, I swapped the conditions. I meant that
> > console_unlock() is finite in non-preemptible context.
> 
> by the way. just for the record,
> 
> probably there is a way for us to have a task printing more than
> O(logbuf) even in non-preemptible context.
> 
> 	CPU0
> 
> 	vprintk_emit()
> 	 preempt_disable()
> 	  console_unlock()
> 	  {
> 	   for (;;) {
>                 printk_safe_enter_irqsave()
> 	        call_console_drivers();
> 	        printk_safe_exit_irqrestore()
> 
> 	<< IRQ >>
> 		dump_stack()
> 		 printk()->log_store()
> 		 ....
> 		 printk()->log_store()
> 	<< iret >>
> 	   }
> 	  }
> 	 preempt_enable()

Great catch! And good to know about it when designing further
improvements.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  6:10                           ` Sergey Senozhatsky
@ 2018-01-16  9:36                             ` Petr Mladek
  2018-01-16 10:10                               ` Sergey Senozhatsky
  2018-01-16 16:06                             ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16  9:36 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue 2018-01-16 15:10:13, Sergey Senozhatsky wrote:
> Hi,
> 
> On (01/15/18 12:50), Petr Mladek wrote:
> > On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> > > PS: Sergey, you have many good points. The printk-stuff is very
> > > complex and we could spend years discussing the perfect solution.
> > 
> > BTW: One solution that comes to my mind is based on ideas
> > already mentioned in this thread:
> > 
> > void console_unlock(void)
> > {
> > 	disable_preemtion();
> > 
> > 	while(pending_message) {
> > 
> > 	    call_console_drivers();
> > 
> > 	    if (too_long_here() && current != printk_kthread) {
> > 	       wake_up_process(printk_kthread())
> > 
> > 	}
> > 
> > 	enable_preemtion();
> > }
> 
> unfortunately disabling preemtion in console_unlock() is a bit
> dangerous :( we have paths that call console_unlock() exactly
> to flush everything (not only new pending messages, but everything)
> that is in logbuf and we cannot return from console_unlock()
> preliminary in that case.

You are right. Just to be sure. Are you talking about replaying
the entire log when a new console is registered? Or do you know
about more paths?

If I get it correctly, we allow to hand off the lock even when
replying the entire log. But you are right that we should
enable preemption in this case because there are many messages
even without printk() activity.

IMHO, the best solution would be to reply the log in a
separate process asynchronously and do not block existing
consoles in the meantime. But I am not sure if it is worth
the complexity. Anyway, it is a future work.


> > bool too_long_here(void)
> > {
> > 	return should_resched();
> > or
> > 	return spent_here() > 1 / HZ / 2;
> > or
> > 	what ever we agree on
> > }
> > 
> > 
> > int printk_kthread_func(void *data)
> > {
> > 	while(1) {
> > 		 if (!pending_messaged)
> > 			schedule();
> > 
> > 		if (console_trylock_spinning())
> > 			console_unlock();
> > 
> > 		cond_resched();
> > 	}
> > }
> 
> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I
> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.

I believe that it is not trivial. console_trylock_spinning() is
tricky and the timing is important. For example, it might be tricky
if a torture test affects the normal workflow by many interrupts.
We might need to call even more console_unlock() code with
spinning enabled to improve the success ratio. Another problem
is that the kthread must be scheduled on another CPU. And so
on. I believe that there are many more problems and areas
for improvement.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  9:36                             ` Petr Mladek
@ 2018-01-16 10:10                               ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 10:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/16/18 10:36), Petr Mladek wrote:
[..]
> > unfortunately disabling preemtion in console_unlock() is a bit
> > dangerous :( we have paths that call console_unlock() exactly
> > to flush everything (not only new pending messages, but everything)
> > that is in logbuf and we cannot return from console_unlock()
> > preliminary in that case.
> 
> You are right. Just to be sure. Are you talking about replaying
> the entire log when a new console is registered? Or do you know
> about more paths?

to the best of my knowledge CON_PRINTBUFFER is the only thing that
explicitly states
  "I want everything what's in logbuf, even if it has been already
   printed on other consoles"

the rest want to have only pending messages, so we can offload
from there.

CON_PRINTBUFFER registration can happen any time. e.g. via modprobe
netconsole. we can be up and running for some time when netconsole
joins in, so that CON_PRINTBUFFER thing can be painful.

> If I get it correctly, we allow to hand off the lock even when
> replying the entire log. But you are right that we should
> enable preemption in this case because there are many messages
> even without printk() activity.

> IMHO, the best solution would be to reply the log in a
> separate process asynchronously and do not block existing
> consoles in the meantime. But I am not sure if it is worth
> the complexity. Anyway, it is a future work.
[..]
> > > int printk_kthread_func(void *data)
> > > {
> > > 	while(1) {
> > > 		 if (!pending_messaged)
> > > 			schedule();
> > > 
> > > 		if (console_trylock_spinning())
> > > 			console_unlock();
> > > 
> > > 		cond_resched();
> > > 	}
> > > }
> > 
> > overall that's very close to what I have in one of my private branches.
> > console_trylock_spinning() for some reason does not perform really
> > well on my made-up internal printk torture tests. it seems that I
> > have a much better stability (no lockups and so on) when I also let
> > printk_kthread to sleep on console_sem(). but I will look further.
> 
> I believe that it is not trivial. console_trylock_spinning() is
> tricky and the timing is important.

yes, timing seems to be very important.
 *as far as I can see from the traces on my printk torture tests*

> For example, it might be tricky if a torture test affects the normal
> workflow by many interrupts. We might need to call even more
> console_unlock() code with spinning enabled to improve the success
> ratio. Another problem is that the kthread must be scheduled on
> another CPU.

yes, I always schedule it on another CPU [if any].

> And so on. I believe that there are many more problems and areas
> for improvement.

right.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  2:23                           ` Sergey Senozhatsky
  2018-01-16  4:47                             ` Sergey Senozhatsky
@ 2018-01-16 10:13                             ` Petr Mladek
  2018-01-17  6:29                               ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 10:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote:
> On (01/15/18 15:45), Petr Mladek wrote:
> > > I think adding the preempt_disable() would fix printk() but let non
> > > printk console_unlock() still preempt.
> > 
> > I would personally remove cond_resched() from console_unlock()
> > completely.
> 
> hmm, not so sure. I think it's there for !PREEMPT systems which have
> to print a lot of messages. the case I'm speaking about in particular
> is when we register a CON_PRINTBUFFER console and need to console_unlock()
> (flush) all of the messages we currently have in the logbuf. we better
> have that cond_resched() there, I think.

Good point. I agree that we should keep the cond_resched() there
at least for now.


> > Sleeping in console_unlock() increases the chance that more messages
> > would need to be handled. And more importantly it reduces the chance
> > of a successful handover.
> > 
> > As a result, the caller might spend there very long time, it might
> > be getting increasingly far behind. There is higher risk of lost
> > messages. Also the eventual taker might have too much to proceed
> > in preemption disabled context.
> 
> yes.
> 
> > Removing cond_resched() is in sync with printk() priorities.
> 
> hmm, not sure. we have sleeping console_lock()->console_unlock() path
> for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
> have the same sleeping console_lock()->console_unlock().
> 
> printk()->console_unlock() seems to be a pretty independent thing,
> unfortunately (!), yet sleeping console_lock()->console_unlock()
> messes up with it a lot.

IMHO, the problem here is that console_lock is used to synchronize
too many things. It would be great to separate printk() duties
into a separate lock in the long term.

Anyway, I see it the following way. Most console_lock() callers
do the following things:

void foo()
{
	console_lock()
	foo_specific_work();
	console_unlock();
}

where console_unlock() flushes the printk buffer before actually
releasing the lock.

IMHO, it would make sense if flushing the printk buffer behaves
the same when called either from printk() or from any other path.
I mean that it should be aggressive and allow an effective
hand off.

It should be safe as long as foo_specific_work() does not take
too much time.

>From other side. The cond_resched() in console_unlock() should
be obsoleted by the hand-shake code.


> > The highest one is to get the messages out.
> > 
> > Finally, removing cond_resched() should make the behavior more
> > predictable (never preempted)
> 
> but we are always preempted in PREEMPT kernels when the current
> console_sem owner acquired the lock via console_lock(), not via
> console_trylock(). cond_resched() does the same, but for !PREEMPT.

I agree that the situation is more complicated for cond_resched()
called after console_lock(). I do not resist on removing it now.

Just one more thing. The time axe looks like:

+ cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016
     (commit 8d91f8b15361dfb438ab6)

+ preemtion enabled in printk in, v4.6-rc1, Mar 17, 2016
     (commit 6b97a20d3a7909daa0662)

They both were obvious solutions that helped to reduce the risk
of soft-lockups. The first one handled evidently safe scenarios.
The second one was even more aggressive. I would say that
they both were more or less add-hoc solutions that did not
take into account the other side effects (delaying output,
even loosing messages).

I would not say that one is a diametric difference between them.
Therefore if we remove one for a reason, we should think about
reverting the other as well. But again. I am fine if we remove
only one now.

Does this make any sense?

Best Regard,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  4:47                             ` Sergey Senozhatsky
@ 2018-01-16 10:19                               ` Petr Mladek
  2018-01-17  2:24                                 ` Sergey Senozhatsky
  2018-01-16 15:45                               ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 10:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote:
> if you don't mind, let me fix the thing that I broke.
> that would be responsible. I believe I also must say the following:
>   Tetsuo, many thanks for reporting the issues for song long, and
>   sorry that it took quite a while to revert that change.
> 
> 8<====
> 
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm&m=145692016122716
> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

IMHO, this is a step in the right direction.

Reviewed-by: Petr Mladek <pmladek@suse.com>

I'll wait for Steven's review and push this into printk.git.
I'll also add your Acks for the other patches.

Thanks for the patch and the various observations.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  4:47                             ` Sergey Senozhatsky
  2018-01-16 10:19                               ` Petr Mladek
@ 2018-01-16 15:45                               ` Steven Rostedt
  2018-01-17  2:18                                 ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-16 15:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Tue, 16 Jan 2018 13:47:16 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
> 
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
> 
> Another issues is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
> 
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
> 
> [1] https://marc.info/?l=linux-mm&m=145692016122716

Especially since Konstantin is working on pulling in all LKML archives,
the above should be denoted as:

 Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

Although the above is for linux-mm and not LKML (it still works), I
should ask Konstantin if he will be pulling in any of the other
archives. Perhaps have both? (in case marc.info goes away).

> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")

Should we Cc stable@vger.kernel.org?

> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  kernel/printk/printk.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffe05024c622..9cb943c90d98 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	/* If called from the scheduler, we can not call up(). */
>  	if (!in_sched) {
> +		/*
> +		 * Disable preemption to avoid being preempted while holding
> +		 * console_sem which would prevent anyone from printing to
> +		 * console
> +		 */
> +		preempt_disable();
>  		/*
>  		 * Try to acquire and then immediately release the console
>  		 * semaphore.  The release will print out buffers and wake up
> @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 */
>  		if (console_trylock_spinning())
>  			console_unlock();
> +		preempt_enable();
>  	}
>  
>  	return printed_len;
> @@ -2229,20 +2236,7 @@ int console_trylock(void)
>  		return 0;
>  	}
>  	console_locked = 1;
> -	/*
> -	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
> -	 * safe to schedule (e.g. calling printk while holding a spin_lock),
> -	 * because preempt_disable()/preempt_enable() are just barriers there
> -	 * and preempt_count() is always 0.
> -	 *
> -	 * RCU read sections have a separate preemption counter when
> -	 * PREEMPT_RCU enabled thus we must take extra care and check
> -	 * rcu_preempt_depth(), otherwise RCU read sections modify
> -	 * preempt_count().
> -	 */
> -	console_may_schedule = !oops_in_progress &&
> -			preemptible() &&
> -			!rcu_preempt_depth();
> +	console_may_schedule = 0;
>  	return 1;
>  }
>  EXPORT_SYMBOL(console_trylock);

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Thanks Sergey!

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16  6:10                           ` Sergey Senozhatsky
  2018-01-16  9:36                             ` Petr Mladek
@ 2018-01-16 16:06                             ` Steven Rostedt
  1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-16 16:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue, 16 Jan 2018 15:10:13 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I

One thing I noticed in my test with the module that does printks on all
cpus, was that the patch spreads out the processing of the consoles.
Before my patch, one printk user would be doing all the work, and all
the other printks only had to load their data into the logbuf then
exit. The majority of printks took a few microseconds, which looks
great if you ignore the one worker that is taking milliseconds to
complete. After my patch, since a printk that comes in while another
one was running would block, then it would start printing, it did
lengthen the time for individual printks to finish. Worst case it
would double the time to do printk. But it removed the burden of a
single printk doing all the work for all new printks that came in.

In other words, I would expect this to make printk on average slower.
But no longer unlimited.

-- Steve


> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12  2:55               ` Steven Rostedt
  2018-01-12  4:20                 ` Steven Rostedt
@ 2018-01-16 19:44                 ` Tejun Heo
  2018-01-17  9:12                   ` Petr Mladek
  1 sibling, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-16 19:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello, Steven.

On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> All I did was start off a work queue on each CPU, and each CPU does one
> printk() followed by a millisecond sleep. No 10,000 printks, nothing
> in an interrupt handler. Preemption is disabled while the printk
> happens, but that's normal.
> 
> This is much closer to an OOM happening all over the system, where OOMs
> stack dumps are occurring on different CPUS.

OOMs can't happen all over the system.  It can only happen on a single
CPU at a time.  If you're printing from multiple CPUs, your solution
would work great.  That is the situation your patches are designed to
address to begin with.  That isn't the problem that I reported tho.  I
understand that your solution works for that class of problems and
that is great.  I really wish that it could address the other class of
problems too tho, and it doesn't seem like it would be that difficult
to cover both cases, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16 15:45                               ` Steven Rostedt
@ 2018-01-17  2:18                                 ` Sergey Senozhatsky
  2018-01-17 13:04                                   ` Petr Mladek
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17  2:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tetsuo Handa,
	Sergey Senozhatsky, Tejun Heo, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/16/18 10:45), Steven Rostedt wrote:
[..]
> > [1] https://marc.info/?l=linux-mm&m=145692016122716
> 
> Especially since Konstantin is working on pulling in all LKML archives,
> the above should be denoted as:
> 
>  Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

hm, may I ask why? is there a new rule now to percent-encode commit messages?

> Although the above is for linux-mm and not LKML (it still works), I
> should ask Konstantin if he will be pulling in any of the other
> archives. Perhaps have both? (in case marc.info goes away).
> 
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> 
> Should we Cc stable@vger.kernel.org?

that's a good question... maybe yes, maybe no... I'd say this
change is "safer" when we have hand-off.

> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> Thanks Sergey!

thanks.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
  2018-01-10 16:50   ` Steven Rostedt
  2018-01-12 16:54   ` Steven Rostedt
@ 2018-01-17  2:19   ` Byungchul Park
  2018-01-17  4:54     ` Byungchul Park
                       ` (2 more replies)
  2 siblings, 3 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-17  2:19 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
	linux-kernel, kernel-team

On 1/10/2018 10:24 PM, Petr Mladek wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> This patch implements what I discussed in Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.
> 
> Here's the design again:
> 
> I added a "console_owner" which is set to a task that is actively
> writing to the consoles. It is *not* the same as the owner of the
> console_lock. It is only set when doing the calls to the console
> functions. It is protected by a console_owner_lock which is a raw spin
> lock.
> 
> There is a console_waiter. This is set when there is an active console
> owner that is not current, and waiter is not set. This too is protected
> by console_owner_lock.
> 
> In printk() when it tries to write to the consoles, we have:
> 
> 	if (console_trylock())
> 		console_unlock();
> 
> Now I added an else, which will check if there is an active owner, and
> no current waiter. If that is the case, then console_waiter is set, and
> the task goes into a spin until it is no longer set.
> 
> When the active console owner finishes writing the current message to
> the consoles, it grabs the console_owner_lock and sees if there is a
> waiter, and clears console_owner.
> 
> If there is a waiter, then it breaks out of the loop, clears the waiter
> flag (because that will release the waiter from its spin), and exits.
> Note, it does *not* release the console semaphore. Because it is a
> semaphore, there is no owner. Another task may release it. This means
> that the waiter is guaranteed to be the new console owner! Which it
> becomes.
> 
> Then the waiter calls console_unlock() and continues to write to the
> consoles.
> 
> If another task comes along and does a printk() it too can become the
> new waiter, and we wash rinse and repeat!
> 
> By Petr Mladek about possible new deadlocks:
> 
> The thing is that we move console_sem only to printk() call
> that normally calls console_unlock() as well. It means that
> the transferred owner should not bring new type of dependencies.
> As Steven said somewhere: "If there is a deadlock, it was
> there even before."
> 
> We could look at it from this side. The possible deadlock would
> look like:
> 
> CPU0                            CPU1
> 
> console_unlock()
> 
>    console_owner = current;
> 
> 				spin_lockA()
> 				  printk()
> 				    spin = true;
> 				    while (...)
> 
>      call_console_drivers()
>        spin_lockA()
> 
> This would be a deadlock. CPU0 would wait for the lock A.
> While CPU1 would own the lockA and would wait for CPU0
> to finish calling the console drivers and pass the console_sem
> owner.
> 
> But if the above is true than the following scenario was
> already possible before:
> 
> CPU0
> 
> spin_lockA()
>    printk()
>      console_unlock()
>        call_console_drivers()
> 	spin_lockA()
> 
> By other words, this deadlock was there even before. Such
> deadlocks are prevented by using printk_deferred() in
> the sections guarded by the lock A.

Hello,

I didn't see what you did, at the last version. You were
tring to transfer the semaphore owner and make it taken
over. I see.

But, what I mentioned last time is still valid. See below.

> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> [pmladek@suse.com: Commit message about possible deadlocks]
> ---
>   kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 107 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b9006617710f..7e6459abba43 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
>   static struct lockdep_map console_lock_dep_map = {
>   	.name = "console_lock"
>   };
> +static struct lockdep_map console_owner_dep_map = {
> +	.name = "console_owner"
> +};
>   #endif
>   
> +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> +static struct task_struct *console_owner;
> +static bool console_waiter;
> +
>   enum devkmsg_log_bits {
>   	__DEVKMSG_LOG_BIT_ON = 0,
>   	__DEVKMSG_LOG_BIT_OFF,
> @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
>   		 * semaphore.  The release will print out buffers and wake up
>   		 * /dev/kmsg and syslog() users.
>   		 */
> -		if (console_trylock())
> +		if (console_trylock()) {
>   			console_unlock();
> +		} else {
> +			struct task_struct *owner = NULL;
> +			bool waiter;
> +			bool spin = false;
> +
> +			printk_safe_enter_irqsave(flags);
> +
> +			raw_spin_lock(&console_owner_lock);
> +			owner = READ_ONCE(console_owner);
> +			waiter = READ_ONCE(console_waiter);
> +			if (!waiter && owner && owner != current) {
> +				WRITE_ONCE(console_waiter, true);
> +				spin = true;
> +			}
> +			raw_spin_unlock(&console_owner_lock);
> +
> +			/*
> +			 * If there is an active printk() writing to the
> +			 * consoles, instead of having it write our data too,
> +			 * see if we can offload that load from the active
> +			 * printer, and do some printing ourselves.
> +			 * Go into a spin only if there isn't already a waiter
> +			 * spinning, and there is an active printer, and
> +			 * that active printer isn't us (recursive printk?).
> +			 */
> +			if (spin) {
> +				/* We spin waiting for the owner to release us */
> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +				/* Owner will clear console_waiter on hand off */
> +				while (READ_ONCE(console_waiter))
> +					cpu_relax();
> +
> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you move this over "while (READ_ONCE(console_waiter))" and
right after acquire()?

As I said last time, only acquisitions between acquire() and release()
are meaningful. Are you taking care of acquisitions within cpu_relax()?
If so, leave it.

> +				printk_safe_exit_irqrestore(flags);
> +
> +				/*
> +				 * The owner passed the console lock to us.
> +				 * Since we did not spin on console lock, annotate
> +				 * this as a trylock. Otherwise lockdep will
> +				 * complain.
> +				 */
> +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> +				console_unlock();
> +				printk_safe_enter_irqsave(flags);
> +			}
> +			printk_safe_exit_irqrestore(flags);
> +
> +		}
>   	}
>   
>   	return printed_len;
> @@ -2141,6 +2196,7 @@ void console_unlock(void)
>   	static u64 seen_seq;
>   	unsigned long flags;
>   	bool wake_klogd = false;
> +	bool waiter = false;
>   	bool do_cond_resched, retry;
>   
>   	if (console_suspended) {
> @@ -2229,14 +2285,64 @@ void console_unlock(void)
>   		console_seq++;
>   		raw_spin_unlock(&logbuf_lock);
>   
> +		/*
> +		 * While actively printing out messages, if another printk()
> +		 * were to occur on another CPU, it may wait for this one to
> +		 * finish. This task can not be preempted if there is a
> +		 * waiter waiting to take over.
> +		 */
> +		raw_spin_lock(&console_owner_lock);
> +		console_owner = current;
> +		raw_spin_unlock(&console_owner_lock);
> +
> +		/* The waiter may spin on us after setting console_owner */
> +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +
>   		stop_critical_timings();	/* don't trace print latency */
>   		call_console_drivers(ext_text, ext_len, text, len);
>   		start_critical_timings();
> +
> +		raw_spin_lock(&console_owner_lock);
> +		waiter = READ_ONCE(console_waiter);
> +		console_owner = NULL;
> +		raw_spin_unlock(&console_owner_lock);
> +
> +		/*
> +		 * If there is a waiter waiting for us, then pass the
> +		 * rest of the work load over to that waiter.
> +		 */
> +		if (waiter)
> +			break;
> +
> +		/* There was no waiter, and nothing will spin on us here */
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you move this over "if (waiter)"?

> +
>   		printk_safe_exit_irqrestore(flags);
>   
>   		if (do_cond_resched)
>   			cond_resched();
>   	}
> +
> +	/*
> +	 * If there is an active waiter waiting on the console_lock.
> +	 * Pass off the printing to the waiter, and the waiter
> +	 * will continue printing on its CPU, and when all writing
> +	 * has finished, the last printer will wake up klogd.
> +	 */
> +	if (waiter) {
> +		WRITE_ONCE(console_waiter, false);
> +		/* The waiter is now free to continue */
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you remove this release() after relocating the upper one?

> +		/*
> +		 * Hand off console_lock to waiter. The waiter will perform
> +		 * the up(). After this, the waiter is the console_lock owner.
> +		 */
> +		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> +		printk_safe_exit_irqrestore(flags);
> +		/* Note, if waiter is set, logbuf_lock is not held */
> +		return;
> +	}
> +
>   	console_locked = 0;
>   
>   	/* Release the exclusive_console once it is used */
> 

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16 10:19                               ` Petr Mladek
@ 2018-01-17  2:24                                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17  2:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tetsuo Handa,
	Sergey Senozhatsky, Tejun Heo, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/16/18 11:19), Petr Mladek wrote:
[..]
> > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> IMHO, this is a step in the right direction.
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 
> I'll wait for Steven's review and push this into printk.git.
> I'll also add your Acks for the other patches.
> 
> Thanks for the patch and the various observations.

thanks!


a side note,

our console output is still largely preemptible. a typical system
acquires console_sem via console_lock() all the time, so we still
can have "where is my printk output?" cases.


for instance, my IDLE PREEMPT x86 box, has the following stats

uptime 15 min

# of console_lock() calls: 10981          // can sleep under console_sem
# of vprintk_emit() calls: 825            // cannot sleep under console_sem

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17  2:19   ` Byungchul Park
@ 2018-01-17  4:54     ` Byungchul Park
  2018-01-17  7:34     ` Byungchul Park
  2018-01-17 12:04     ` Petr Mladek
  2 siblings, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-17  4:54 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
	linux-kernel, kernel-team

On 1/17/2018 11:19 AM, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>
>> This patch implements what I discussed in Kernel Summit. I added
>> lockdep annotation (hopefully correctly), and it hasn't had any splats
>> (since I fixed some bugs in the first iterations). It did catch
>> problems when I had the owner covering too much. But now that the owner
>> is only set when actively calling the consoles, lockdep has stayed
>> quiet.
>>
>> Here's the design again:
>>
>> I added a "console_owner" which is set to a task that is actively
>> writing to the consoles. It is *not* the same as the owner of the
>> console_lock. It is only set when doing the calls to the console
>> functions. It is protected by a console_owner_lock which is a raw spin
>> lock.
>>
>> There is a console_waiter. This is set when there is an active console
>> owner that is not current, and waiter is not set. This too is protected
>> by console_owner_lock.
>>
>> In printk() when it tries to write to the consoles, we have:
>>
>>     if (console_trylock())
>>         console_unlock();
>>
>> Now I added an else, which will check if there is an active owner, and
>> no current waiter. If that is the case, then console_waiter is set, and
>> the task goes into a spin until it is no longer set.
>>
>> When the active console owner finishes writing the current message to
>> the consoles, it grabs the console_owner_lock and sees if there is a
>> waiter, and clears console_owner.
>>
>> If there is a waiter, then it breaks out of the loop, clears the waiter
>> flag (because that will release the waiter from its spin), and exits.
>> Note, it does *not* release the console semaphore. Because it is a
>> semaphore, there is no owner. Another task may release it. This means
>> that the waiter is guaranteed to be the new console owner! Which it
>> becomes.
>>
>> Then the waiter calls console_unlock() and continues to write to the
>> consoles.
>>
>> If another task comes along and does a printk() it too can become the
>> new waiter, and we wash rinse and repeat!
>>
>> By Petr Mladek about possible new deadlocks:
>>
>> The thing is that we move console_sem only to printk() call
>> that normally calls console_unlock() as well. It means that
>> the transferred owner should not bring new type of dependencies.
>> As Steven said somewhere: "If there is a deadlock, it was
>> there even before."
>>
>> We could look at it from this side. The possible deadlock would
>> look like:
>>
>> CPU0                            CPU1
>>
>> console_unlock()
>>
>>    console_owner = current;
>>
>>                 spin_lockA()
>>                   printk()
>>                     spin = true;
>>                     while (...)
>>
>>      call_console_drivers()
>>        spin_lockA()
>>
>> This would be a deadlock. CPU0 would wait for the lock A.
>> While CPU1 would own the lockA and would wait for CPU0
>> to finish calling the console drivers and pass the console_sem
>> owner.
>>
>> But if the above is true than the following scenario was
>> already possible before:
>>
>> CPU0
>>
>> spin_lockA()
>>    printk()
>>      console_unlock()
>>        call_console_drivers()
>>     spin_lockA()
>>
>> By other words, this deadlock was there even before. Such
>> deadlocks are prevented by using printk_deferred() in
>> the sections guarded by the lock A.
> 
> Hello,
> 
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.
> 
> But, what I mentioned last time is still valid. See below.

Of course, it's not an important thing but trivial one though.

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16 10:13                             ` Petr Mladek
@ 2018-01-17  6:29                               ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17  6:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky,
	Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/16/18 11:13), Petr Mladek wrote:
[..]
> IMHO, it would make sense if flushing the printk buffer behaves
> the same when called either from printk() or from any other path.
> I mean that it should be aggressive and allow an effective
> hand off.
> 
> It should be safe as long as foo_specific_work() does not take
> too much time.
> 
> From other side. The cond_resched() in console_unlock() should
> be obsoleted by the hand-shake code.

hm, let's not have too optimistic expectations. hand off works in very
specific conditions. console is not exclusively owned by printk, and
console_sem is not printk's own lock. even things like

	systemd -> n_tty_write -> do_output_char -> con_write

involves console_lock() and console_unlock(). IOW user space
logging/debugging can cause printk stalls, and vice versa.

by the way, do_con_write() explicitly calls console_conditional_schedule()
under console_sem, before it goes to console_unlock(). so the scope of
"situation normal, console_sem locked, the owner scheduled out" is much
bigger than just vprintk_emit() -> console_unlock(). IMHO.

and there are even more things there. personally, I don't think
that hand off is enough to obsolete anything in that area.

[...]
> They both were obvious solutions that helped to reduce the risk
> of soft-lockups. The first one handled evidently safe scenarios.
> The second one was even more aggressive. I would say that
> they both were more or less add-hoc solutions that did not
> take into account the other side effects (delaying output,
> even loosing messages).

agreed.

> I would not say that one is a diametric difference between them.
> Therefore if we remove one for a reason, we should think about
> reverting the other as well. But again. I am fine if we remove
> only one now.
> 
> Does this make any sense?

I see cond_resched() as a mirroring of console_lock()->console_unlock()
behaviour on PREEMPT systems, and as such it looks valid to me, so we
probably better keep it there. IMHO.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17  2:19   ` Byungchul Park
  2018-01-17  4:54     ` Byungchul Park
@ 2018-01-17  7:34     ` Byungchul Park
  2018-01-17 12:04     ` Petr Mladek
  2 siblings, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-17  7:34 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
	linux-kernel, kernel-team

On 1/17/2018 11:19 AM, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>
>> This patch implements what I discussed in Kernel Summit. I added
>> lockdep annotation (hopefully correctly), and it hasn't had any splats
>> (since I fixed some bugs in the first iterations). It did catch
>> problems when I had the owner covering too much. But now that the owner
>> is only set when actively calling the consoles, lockdep has stayed
>> quiet.
>>
>> Here's the design again:
>>
>> I added a "console_owner" which is set to a task that is actively
>> writing to the consoles. It is *not* the same as the owner of the
>> console_lock. It is only set when doing the calls to the console
>> functions. It is protected by a console_owner_lock which is a raw spin
>> lock.
>>
>> There is a console_waiter. This is set when there is an active console
>> owner that is not current, and waiter is not set. This too is protected
>> by console_owner_lock.
>>
>> In printk() when it tries to write to the consoles, we have:
>>
>>     if (console_trylock())
>>         console_unlock();
>>
>> Now I added an else, which will check if there is an active owner, and
>> no current waiter. If that is the case, then console_waiter is set, and
>> the task goes into a spin until it is no longer set.
>>
>> When the active console owner finishes writing the current message to
>> the consoles, it grabs the console_owner_lock and sees if there is a
>> waiter, and clears console_owner.
>>
>> If there is a waiter, then it breaks out of the loop, clears the waiter
>> flag (because that will release the waiter from its spin), and exits.
>> Note, it does *not* release the console semaphore. Because it is a
>> semaphore, there is no owner. Another task may release it. This means
>> that the waiter is guaranteed to be the new console owner! Which it
>> becomes.
>>
>> Then the waiter calls console_unlock() and continues to write to the
>> consoles.
>>
>> If another task comes along and does a printk() it too can become the
>> new waiter, and we wash rinse and repeat!
>>
>> By Petr Mladek about possible new deadlocks:
>>
>> The thing is that we move console_sem only to printk() call
>> that normally calls console_unlock() as well. It means that
>> the transferred owner should not bring new type of dependencies.
>> As Steven said somewhere: "If there is a deadlock, it was
>> there even before."
>>
>> We could look at it from this side. The possible deadlock would
>> look like:
>>
>> CPU0                            CPU1
>>
>> console_unlock()
>>
>>    console_owner = current;
>>
>>                 spin_lockA()
>>                   printk()
>>                     spin = true;
>>                     while (...)
>>
>>      call_console_drivers()
>>        spin_lockA()
>>
>> This would be a deadlock. CPU0 would wait for the lock A.
>> While CPU1 would own the lockA and would wait for CPU0
>> to finish calling the console drivers and pass the console_sem
>> owner.
>>
>> But if the above is true than the following scenario was
>> already possible before:
>>
>> CPU0
>>
>> spin_lockA()
>>    printk()
>>      console_unlock()
>>        call_console_drivers()
>>     spin_lockA()
>>
>> By other words, this deadlock was there even before. Such
>> deadlocks are prevented by using printk_deferred() in
>> the sections guarded by the lock A.
> 
> Hello,
> 
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.
> 
> But, what I mentioned last time is still valid. See below.
> 
>> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>> [pmladek@suse.com: Commit message about possible deadlocks]
>> ---
>>   kernel/printk/printk.c | 108 
>> ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 107 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index b9006617710f..7e6459abba43 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
>>   static struct lockdep_map console_lock_dep_map = {
>>       .name = "console_lock"
>>   };
>> +static struct lockdep_map console_owner_dep_map = {
>> +    .name = "console_owner"
>> +};
>>   #endif
>> +static DEFINE_RAW_SPINLOCK(console_owner_lock);
>> +static struct task_struct *console_owner;
>> +static bool console_waiter;
>> +
>>   enum devkmsg_log_bits {
>>       __DEVKMSG_LOG_BIT_ON = 0,
>>       __DEVKMSG_LOG_BIT_OFF,
>> @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int 
>> level,
>>            * semaphore.  The release will print out buffers and wake up
>>            * /dev/kmsg and syslog() users.
>>            */
>> -        if (console_trylock())
>> +        if (console_trylock()) {
>>               console_unlock();
>> +        } else {
>> +            struct task_struct *owner = NULL;
>> +            bool waiter;
>> +            bool spin = false;
>> +
>> +            printk_safe_enter_irqsave(flags);
>> +
>> +            raw_spin_lock(&console_owner_lock);
>> +            owner = READ_ONCE(console_owner);
>> +            waiter = READ_ONCE(console_waiter);
>> +            if (!waiter && owner && owner != current) {
>> +                WRITE_ONCE(console_waiter, true);
>> +                spin = true;
>> +            }
>> +            raw_spin_unlock(&console_owner_lock);
>> +
>> +            /*
>> +             * If there is an active printk() writing to the
>> +             * consoles, instead of having it write our data too,
>> +             * see if we can offload that load from the active
>> +             * printer, and do some printing ourselves.
>> +             * Go into a spin only if there isn't already a waiter
>> +             * spinning, and there is an active printer, and
>> +             * that active printer isn't us (recursive printk?).
>> +             */
>> +            if (spin) {
>> +                /* We spin waiting for the owner to release us */
>> +                spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>> +                /* Owner will clear console_waiter on hand off */
>> +                while (READ_ONCE(console_waiter))
>> +                    cpu_relax();
>> +
>> +                spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> 
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
> 
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

In addition, this way would be correct if you intended to use
cross-lock's map here, assuming cross-release alive..

But anyway this is just a typical acquire/release pair so we
don't usually use the pair in this way.

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16 19:44                 ` Tejun Heo
@ 2018-01-17  9:12                   ` Petr Mladek
  2018-01-17 15:15                     ` Tejun Heo
  0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-17  9:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue 2018-01-16 11:44:56, Tejun Heo wrote:
> Hello, Steven.
> 
> On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote:
> > All I did was start off a work queue on each CPU, and each CPU does one
> > printk() followed by a millisecond sleep. No 10,000 printks, nothing
> > in an interrupt handler. Preemption is disabled while the printk
> > happens, but that's normal.
> > 
> > This is much closer to an OOM happening all over the system, where OOMs
> > stack dumps are occurring on different CPUS.
> 
> OOMs can't happen all over the system.  It can only happen on a single
> CPU at a time.  If you're printing from multiple CPUs, your solution
> would work great.  That is the situation your patches are designed to
> address to begin with.  That isn't the problem that I reported tho.  I
> understand that your solution works for that class of problems and
> that is great.  I really wish that it could address the other class of
> problems too tho, and it doesn't seem like it would be that difficult
> to cover both cases, right?

IMHO, the bad scenario with OOM was that any printk() called in
the OOM report became console_lock owner and was responsible
for pushing all new messages to the console. There was a possible
livelock because OOM Killer was blocked in console_unlock() while
other CPUs repeatedly complained about failed allocations.

Even the current patch should help. It allows to hand off
the console_lock to another CPU and OOM killer could eventually
continue.

Of course, it is possible that it might not be enough. For example,
there might still be too many messages to print when the memory is
freed. Therefore there will be no more complains, no more
hand offs and the last console_lock owner might still
cause softlockup. But it still will be better than
the livelockup. Of course, we will need to address
the softlockup. But let's see how this works in practice.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17  2:19   ` Byungchul Park
  2018-01-17  4:54     ` Byungchul Park
  2018-01-17  7:34     ` Byungchul Park
@ 2018-01-17 12:04     ` Petr Mladek
  2018-01-18  1:53       ` Byungchul Park
  2 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-17 12:04 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On Wed 2018-01-17 11:19:53, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> > By Petr Mladek about possible new deadlocks:
> > 
> > The thing is that we move console_sem only to printk() call
> > that normally calls console_unlock() as well. It means that
> > the transferred owner should not bring new type of dependencies.
> > As Steven said somewhere: "If there is a deadlock, it was
> > there even before."
> > 
> > We could look at it from this side. The possible deadlock would
> > look like:
> > 
> > CPU0                            CPU1
> > 
> > console_unlock()
> > 
> >    console_owner = current;
> > 
> > 				spin_lockA()
> > 				  printk()
> > 				    spin = true;
> > 				    while (...)
> > 
> >      call_console_drivers()
> >        spin_lockA()
> > 
> > This would be a deadlock. CPU0 would wait for the lock A.
> > While CPU1 would own the lockA and would wait for CPU0
> > to finish calling the console drivers and pass the console_sem
> > owner.
> > 
> > But if the above is true than the following scenario was
> > already possible before:
> > 
> > CPU0
> > 
> > spin_lockA()
> >    printk()
> >      console_unlock()
> >        call_console_drivers()
> > 	spin_lockA()
> > 
> > By other words, this deadlock was there even before. Such
> > deadlocks are prevented by using printk_deferred() in
> > the sections guarded by the lock A.
> 
> Hello,
> 
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.

I realized that I did not understand lockdep and especially
the cross-release stuff enough to be sure about the annotations.
In addition, the cross-release feature was removed, ...

Instead, I made a proof by contradiction. A very simplified
summary is mentioned in the commit message above. I believe
that the new dependency actually does not bring any new risk
of a deadlock.

Anyway, the last version of the code can be found in printk.git,
for-4.16-console-waiter-logic branch, see
https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic

It is also merged into linux-next.

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index b9006617710f..7e6459abba43 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
> >   		 * semaphore.  The release will print out buffers and wake up
> >   		 * /dev/kmsg and syslog() users.
> >   		 */
> > -		if (console_trylock())
> > +		if (console_trylock()) {
> >   			console_unlock();
> > +		} else {
> > +			struct task_struct *owner = NULL;
> > +			bool waiter;
> > +			bool spin = false;
> > +
> > +			printk_safe_enter_irqsave(flags);
> > +
> > +			raw_spin_lock(&console_owner_lock);
> > +			owner = READ_ONCE(console_owner);
> > +			waiter = READ_ONCE(console_waiter);
> > +			if (!waiter && owner && owner != current) {
> > +				WRITE_ONCE(console_waiter, true);
> > +				spin = true;
> > +			}
> > +			raw_spin_unlock(&console_owner_lock);
> > +
> > +			/*
> > +			 * If there is an active printk() writing to the
> > +			 * consoles, instead of having it write our data too,
> > +			 * see if we can offload that load from the active
> > +			 * printer, and do some printing ourselves.
> > +			 * Go into a spin only if there isn't already a waiter
> > +			 * spinning, and there is an active printer, and
> > +			 * that active printer isn't us (recursive printk?).
> > +			 */
> > +			if (spin) {
> > +				/* We spin waiting for the owner to release us */
> > +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +				/* Owner will clear console_waiter on hand off */
> > +				while (READ_ONCE(console_waiter))
> > +					cpu_relax();
> > +
> > +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> 
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
>
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

We are simulating a spinlock here. The above code corresponds to

	    spin_lock(&console_owner_spin_lock);
	    spin_unlock(&console_owner_spin_lock);

I mean that spin_acquire() + while-cycle corresponds
to spin_lock(). And spin_release() corresponds to
spin_unlock().

> > +				printk_safe_exit_irqrestore(flags);
> > +
> > +				/*
> > +				 * The owner passed the console lock to us.
> > +				 * Since we did not spin on console lock, annotate
> > +				 * this as a trylock. Otherwise lockdep will
> > +				 * complain.
> > +				 */
> > +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> > +				console_unlock();
> > +				printk_safe_enter_irqsave(flags);
> > +			}
> > +			printk_safe_exit_irqrestore(flags);
> > +
> > +		}
> >   	}
> >   	return printed_len;
> > @@ -2141,6 +2196,7 @@ void console_unlock(void)
> >   	static u64 seen_seq;
> >   	unsigned long flags;
> >   	bool wake_klogd = false;
> > +	bool waiter = false;
> >   	bool do_cond_resched, retry;
> >   	if (console_suspended) {
> > @@ -2229,14 +2285,64 @@ void console_unlock(void)
> >   		console_seq++;
> >   		raw_spin_unlock(&logbuf_lock);
> > +		/*
> > +		 * While actively printing out messages, if another printk()
> > +		 * were to occur on another CPU, it may wait for this one to
> > +		 * finish. This task can not be preempted if there is a
> > +		 * waiter waiting to take over.
> > +		 */
> > +		raw_spin_lock(&console_owner_lock);
> > +		console_owner = current;
> > +		raw_spin_unlock(&console_owner_lock);
> > +
> > +		/* The waiter may spin on us after setting console_owner */
> > +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +
> >   		stop_critical_timings();	/* don't trace print latency */
> >   		call_console_drivers(ext_text, ext_len, text, len);
> >   		start_critical_timings();
> > +
> > +		raw_spin_lock(&console_owner_lock);
> > +		waiter = READ_ONCE(console_waiter);
> > +		console_owner = NULL;
> > +		raw_spin_unlock(&console_owner_lock);
> > +
> > +		/*
> > +		 * If there is a waiter waiting for us, then pass the
> > +		 * rest of the work load over to that waiter.
> > +		 */
> > +		if (waiter)
> > +			break;
> > +
> > +		/* There was no waiter, and nothing will spin on us here */
> > +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> 
> Why don't you move this over "if (waiter)"?

We want to actually release the lock before calling spin_release,
see below.


> > +
> >   		printk_safe_exit_irqrestore(flags);
> >   		if (do_cond_resched)
> >   			cond_resched();
> >   	}
> > +
> > +	/*
> > +	 * If there is an active waiter waiting on the console_lock.
> > +	 * Pass off the printing to the waiter, and the waiter
> > +	 * will continue printing on its CPU, and when all writing
> > +	 * has finished, the last printer will wake up klogd.
> > +	 */
> > +	if (waiter) {
> > +		WRITE_ONCE(console_waiter, false);
> > +		/* The waiter is now free to continue */
> > +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> 
> Why don't you remove this release() after relocating the upper one?

The manipulation of "console_waiter" implements the spin_lock that
we are trying to simulate. It is such easy because it is guaranteed
that there is always only one process that tries to get this
fake spin_lock. Also the other waiter releases the spin lock
immediately after it gets it.

I mean that WRITE_ONCE(console_waiter, false) causes that
the simulated spin lock is released here. Also the while-cycle
in vprintk_emit() succeeds. The while-cycle success means
that vprintk_emit() actually acquires the simulated spinlock.

This synchronization is need to make sure that the two processes
pass the console_lock ownership at the right place.

I think that at least this simulated spin lock is annotated the right
way by console_owner_dep_map manipulations. And I think that we
do not need the cross-release feature to simulate this spin lock.


> > +		/*
> > +		 * Hand off console_lock to waiter. The waiter will perform
> > +		 * the up(). After this, the waiter is the console_lock owner.
> > +		 */
> > +		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);

The cross-release feature might be needed here. The above annotation
says that the semaphore is release here. In reality, it is released
in the process that calls vprintk_emit(). We actually just passed the
ownership here.

Does this make any sense? Could we do better using the existing
lockdep annotations?

If you have a better solution, it might make sense to send a patch
on top of linux-next. There is a commit that moved these code
into three helper functions:

    console_lock_spinning_enable()
    console_lock_spinning_disable_and_check()
    console_trylock_spinning()

See
https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h=for-4.16-console-waiter-logic&id=c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab

Best Regards,
Petr

> > +		printk_safe_exit_irqrestore(flags);
> > +		/* Note, if waiter is set, logbuf_lock is not held */
> > +		return;
> > +	}
> > +
> >   	console_locked = 0;
> >   	/* Release the exclusive_console once it is used */
> > 
> 
> -- 
> Thanks,
> Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17  2:18                                 ` Sergey Senozhatsky
@ 2018-01-17 13:04                                   ` Petr Mladek
  2018-01-17 15:24                                     ` Steven Rostedt
  2018-01-18  4:31                                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-17 13:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> On (01/16/18 10:45), Steven Rostedt wrote:
> [..]
> > > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > 
> > Especially since Konstantin is working on pulling in all LKML archives,
> > the above should be denoted as:
> > 
> >  Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> 
> hm, may I ask why? is there a new rule now to percent-encode commit messages?

IMHO, the most important thing is that Steven's link is based
on the Message-ID and the stable redirector
https://lkml.kernel.org/. It has a better chance to work
even in the future.

I have been asked by other people to use this type
of links as well.

> > Although the above is for linux-mm and not LKML (it still works), I
> > should ask Konstantin if he will be pulling in any of the other
> > archives. Perhaps have both? (in case marc.info goes away).
> > 
> > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> > 
> > Should we Cc stable@vger.kernel.org?
> 
> that's a good question... maybe yes, maybe no... I'd say this
> change is "safer" when we have hand-off.

I would keep it as is in stable kernels unless there are
many bug reports.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17  9:12                   ` Petr Mladek
@ 2018-01-17 15:15                     ` Tejun Heo
  2018-01-17 17:12                       ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-17 15:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello,

On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote:
> IMHO, the bad scenario with OOM was that any printk() called in
> the OOM report became console_lock owner and was responsible
> for pushing all new messages to the console. There was a possible
> livelock because OOM Killer was blocked in console_unlock() while
> other CPUs repeatedly complained about failed allocations.

I don't know why we're constantly back into this same loop on this
topic but that's not the problem we've been seeing.  There are no
other CPUs involved.

It's great that Steven's patches solve a good number of problems.  It
is also true that there's a class of problems that it doesn't solve,
which other approaches do.  The productive thing to do here is trying
to solve the unsolved one too, especially given that it doesn't seem
too difficuilt to do so on top of what's proposed.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 13:04                                   ` Petr Mladek
@ 2018-01-17 15:24                                     ` Steven Rostedt
  2018-01-18  4:31                                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 15:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 14:04:07 +0100
Petr Mladek <pmladek@suse.com> wrote:

> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]  
> > > > [1] https://marc.info/?l=linux-mm&m=145692016122716  
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp  
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit messages?  
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

Exactly. There's an effort to avoid any outside link dependencies in
the Linux git history. No one expected gmane to end (although it
appears to be making a comeback), but we don't want to get stuck if
marc.info disappears one day.

> > > 
> > > Should we Cc stable@vger.kernel.org?  
> > 
> > that's a good question... maybe yes, maybe no... I'd say this
> > change is "safer" when we have hand-off.  
> 
> I would keep it as is in stable kernels unless there are
> many bug reports.

Agreed.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 15:15                     ` Tejun Heo
@ 2018-01-17 17:12                       ` Steven Rostedt
  2018-01-17 18:42                         ` Steven Rostedt
                                           ` (2 more replies)
  0 siblings, 3 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 17:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo <tj@kernel.org> wrote:

> It's great that Steven's patches solve a good number of problems.  It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do.  The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficuilt to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

>From your previous email:

> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

>From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compiled tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
 	return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+	unsigned long pc = preempt_count();
+	int val = this_cpu_read(recursion_bits);
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	if (unlikely(val & (1 << bit)))
+		return true;
+
+	val |= (1 << bit);
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (offload)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+	/*
+	 * Consoles are triggering printks, offload the printks
+	 * to another CPU to hopefully avoid a lockup.
+	 */
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
 	for (;;) {
 		struct printk_log *msg;
+		bool offload;
 		size_t ext_len = 0;
 		size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
 		 * waiter waiting to take over.
 		 */
 		console_lock_spinning_enable();
+		offload = recursion_check_start();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
+		recursion_check_finish(offload);
+
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
 			return;
 		}
+		if (offload)
+			kick_offload_thread();
 
 		printk_safe_exit_irqrestore(flags);
 

-- Steve

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 17:12                       ` Steven Rostedt
@ 2018-01-17 18:42                         ` Steven Rostedt
  2018-01-19 18:20                           ` Steven Rostedt
  2018-01-17 20:05                         ` Tejun Heo
  2018-01-18  5:42                         ` Sergey Senozhatsky
  2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 18:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 12:12:51 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();
>  

Ah, major flaw in this code. The recursion check needs to be in
printk() itself around the trylock.

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..31df145cc4d7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	/* Store it in the record log */
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+	unsigned long pc = preempt_count();
+	int val = this_cpu_read(recursion_bits);
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	if (unlikely(val & (1 << bit)))
+		return true;
+
+	val |= (1 << bit);
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (offload)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+	/*
+	 * Consoles are triggering printks, offload the printks
+	 * to another CPU to hopefully avoid a lockup.
+	 */
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
@@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		bool offload;
 		/*
 		 * Disable preemption to avoid being preempted while holding
 		 * console_sem which would prevent anyone from printing to
 		 * console
 		 */
 		preempt_disable();
+		offload = recursion_check_start();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore.  The release will print out buffers and wake up
@@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+
+		recursion_check_finish(offload);
 		preempt_enable();
+
+		if (offload)
+			kick_offload_thread();
 	}
 
 	return printed_len;

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-12 17:11     ` Steven Rostedt
@ 2018-01-17 19:13       ` Rasmus Villemoes
  2018-01-17 19:33         ` Steven Rostedt
  2018-01-19  9:51         ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Rasmus Villemoes @ 2018-01-17 19:13 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel

On 2018-01-12 18:11, Steven Rostedt wrote:
> On Fri, 12 Jan 2018 11:54:54 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> #include <linux/module.h>
>> #include <linux/delay.h>
>> #include <linux/sched.h>
>> #include <linux/mutex.h>
>> #include <linux/workqueue.h>
>> #include <linux/hrtimer.h>
>>
>>
> 
> 
>>
>> Hmm, how does one have git commit not remove the C preprocessor at the
>> start of the module?
> 
> Probably just add a space in front of the entire program.

If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
Something like

  git config commit.cleanup scissors

should do the trick. Instead of stripping all lines starting with #,
that will only strip stuff below a line containing

# ------------------------ >8 ------------------------

and git should be smart enough to insert that in the editor it fires up
for a commit message.


[1] https://github.com/git/git/blob/master/Documentation/RelNotes/2.0.0.txt

Rasmus

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17 19:13       ` Rasmus Villemoes
@ 2018-01-17 19:33         ` Steven Rostedt
  2018-01-19  9:51         ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 19:33 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 20:13:28 +0100
Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:

> If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
> Something like
> 
>   git config commit.cleanup scissors
> 
> should do the trick. Instead of stripping all lines starting with #,
> that will only strip stuff below a line containing
> 
> # ------------------------ >8 ------------------------
> 
> and git should be smart enough to insert that in the editor it fires up
> for a commit message.
> 
> 
> [1] https://github.com/git/git/blob/master/Documentation/RelNotes/2.0.0.txt
> 
> 

Thanks for the pointer.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 17:12                       ` Steven Rostedt
  2018-01-17 18:42                         ` Steven Rostedt
@ 2018-01-17 20:05                         ` Tejun Heo
  2018-01-18  5:43                           ` Sergey Senozhatsky
  2018-01-18 11:51                           ` Petr Mladek
  2018-01-18  5:42                         ` Sergey Senozhatsky
  2 siblings, 2 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-17 20:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello, Steven.

On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> From what I gathered, you said an OOM would trigger, and then the
> network console would not be able to allocate memory and it would
> trigger a printk too, and cause an infinite amount of printks.

Yeah, it falls into back-and-forth loop between the OOM code and
netconsole path.

> This could very well be a great place to force offloading. If a printk
> is called from within a printk, at the same context (normal, softirq,
> irq or NMI), then we should trigger the offloading.

I was thinking more of a timeout based approach (ie. if stuck for
longer than X or X messages, offload), but if local feedback loop is
the only thing we're missing after your improvements, detecting that
specific condition definitely works and is likely a better approach in
terms of message delivery guarantee.

> +static void kick_offload_thread(void)
> +{
> +	/*
> +	 * Consoles are triggering printks, offload the printks
> +	 * to another CPU to hopefully avoid a lockup.
> +	 */
> +}
...
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>  	for (;;) {
>  		struct printk_log *msg;
> +		bool offload;
>  		size_t ext_len = 0;
>  		size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();

Yeah, something like this would definitely work.

Thanks a lot.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17 12:04     ` Petr Mladek
@ 2018-01-18  1:53       ` Byungchul Park
  2018-01-18  1:57         ` Byungchul Park
  2018-01-18  2:19         ` Steven Rostedt
  0 siblings, 2 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-18  1:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/17/2018 9:04 PM, Petr Mladek wrote:
> On Wed 2018-01-17 11:19:53, Byungchul Park wrote:
>> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>>> From: Steven Rostedt <rostedt@goodmis.org>

[...]

>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>> index b9006617710f..7e6459abba43 100644
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
>>>    		 * semaphore.  The release will print out buffers and wake up
>>>    		 * /dev/kmsg and syslog() users.
>>>    		 */
>>> -		if (console_trylock())
>>> +		if (console_trylock()) {
>>>    			console_unlock();
>>> +		} else {
>>> +			struct task_struct *owner = NULL;
>>> +			bool waiter;
>>> +			bool spin = false;
>>> +
>>> +			printk_safe_enter_irqsave(flags);
>>> +
>>> +			raw_spin_lock(&console_owner_lock);
>>> +			owner = READ_ONCE(console_owner);
>>> +			waiter = READ_ONCE(console_waiter);
>>> +			if (!waiter && owner && owner != current) {
>>> +				WRITE_ONCE(console_waiter, true);
>>> +				spin = true;
>>> +			}
>>> +			raw_spin_unlock(&console_owner_lock);
>>> +
>>> +			/*
>>> +			 * If there is an active printk() writing to the
>>> +			 * consoles, instead of having it write our data too,
>>> +			 * see if we can offload that load from the active
>>> +			 * printer, and do some printing ourselves.
>>> +			 * Go into a spin only if there isn't already a waiter
>>> +			 * spinning, and there is an active printer, and
>>> +			 * that active printer isn't us (recursive printk?).
>>> +			 */
>>> +			if (spin) {
>>> +				/* We spin waiting for the owner to release us */
>>> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>> +				/* Owner will clear console_waiter on hand off */
>>> +				while (READ_ONCE(console_waiter))
>>> +					cpu_relax();
>>> +
>>> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you move this over "while (READ_ONCE(console_waiter))" and
>> right after acquire()?
>>
>> As I said last time, only acquisitions between acquire() and release()
>> are meaningful. Are you taking care of acquisitions within cpu_relax()?
>> If so, leave it.
> 
> We are simulating a spinlock here. The above code corresponds to
> 
> 	    spin_lock(&console_owner_spin_lock);
> 	    spin_unlock(&console_owner_spin_lock);
> 
> I mean that spin_acquire() + while-cycle corresponds
> to spin_lock(). And spin_release() corresponds to
> spin_unlock().

Hello,

This is a thing simulating a wait for an event e.g.
wait_for_completion() doing spinning instead of sleep, rather
than a spinlock. I mean:

    This context
    ------------
    while (READ_ONCE(console_waiter)) /* Wait for the event */
       cpu_relax();

    Another context
    ---------------
    WRITE_ONCE(console_waiter, false); /* Event */

That's why I said this's the exact case of cross-release. Anyway
without cross-release, we usually use typical acquire/release
pairs to cover a wait for an event in the following way:

    A context
    ---------
    lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
                            /* Read one is better though..    */

    /* A section, we suspect, a wait for an event might happen. */
    ...
    lock_map_release(wait);


    The place actually doing the wait
    ---------------------------------
    lock_map_acquire(wait);
    lock_map_acquire(wait);

    wait_for_event(wait); /* Actually do the wait */

You can see a simple example of how to use them by searching
kernel/cpu.c with "lock_acquire" and "wait_for_completion".

However, as I said, if you suspect that cpu_relax() includes
the wait, then it's ok to leave it. Otherwise, I think it
would be better to change it in the way I showed you above.

>>> +				printk_safe_exit_irqrestore(flags);
>>> +
>>> +				/*
>>> +				 * The owner passed the console lock to us.
>>> +				 * Since we did not spin on console lock, annotate
>>> +				 * this as a trylock. Otherwise lockdep will
>>> +				 * complain.
>>> +				 */
>>> +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
>>> +				console_unlock();
>>> +				printk_safe_enter_irqsave(flags);
>>> +			}
>>> +			printk_safe_exit_irqrestore(flags);
>>> +
>>> +		}
>>>    	}
>>>    	return printed_len;
>>> @@ -2141,6 +2196,7 @@ void console_unlock(void)
>>>    	static u64 seen_seq;
>>>    	unsigned long flags;
>>>    	bool wake_klogd = false;
>>> +	bool waiter = false;
>>>    	bool do_cond_resched, retry;
>>>    	if (console_suspended) {
>>> @@ -2229,14 +2285,64 @@ void console_unlock(void)
>>>    		console_seq++;
>>>    		raw_spin_unlock(&logbuf_lock);
>>> +		/*
>>> +		 * While actively printing out messages, if another printk()
>>> +		 * were to occur on another CPU, it may wait for this one to
>>> +		 * finish. This task can not be preempted if there is a
>>> +		 * waiter waiting to take over.
>>> +		 */
>>> +		raw_spin_lock(&console_owner_lock);
>>> +		console_owner = current;
>>> +		raw_spin_unlock(&console_owner_lock);
>>> +
>>> +		/* The waiter may spin on us after setting console_owner */
>>> +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>> +
>>>    		stop_critical_timings();	/* don't trace print latency */
>>>    		call_console_drivers(ext_text, ext_len, text, len);
>>>    		start_critical_timings();
>>> +
>>> +		raw_spin_lock(&console_owner_lock);
>>> +		waiter = READ_ONCE(console_waiter);
>>> +		console_owner = NULL;
>>> +		raw_spin_unlock(&console_owner_lock);
>>> +
>>> +		/*
>>> +		 * If there is a waiter waiting for us, then pass the
>>> +		 * rest of the work load over to that waiter.
>>> +		 */
>>> +		if (waiter)
>>> +			break;
>>> +
>>> +		/* There was no waiter, and nothing will spin on us here */
>>> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you move this over "if (waiter)"?
> 
> We want to actually release the lock before calling spin_release,
> see below.

Excuse me but, I don't see..

>>> +
>>>    		printk_safe_exit_irqrestore(flags);
>>>    		if (do_cond_resched)
>>>    			cond_resched();
>>>    	}
>>> +
>>> +	/*
>>> +	 * If there is an active waiter waiting on the console_lock.
>>> +	 * Pass off the printing to the waiter, and the waiter
>>> +	 * will continue printing on its CPU, and when all writing
>>> +	 * has finished, the last printer will wake up klogd.
>>> +	 */
>>> +	if (waiter) {
>>> +		WRITE_ONCE(console_waiter, false);
>>> +		/* The waiter is now free to continue */
>>> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you remove this release() after relocating the upper one?

You should use this acquire/release pair here to detect if the
following section involves the spinning again for console_waiter:

    stop_critical_timings();
    call_console_drivers(ext_text, ext_len, text, len);
    start_critical_timings();

    raw_spin_lock(&console_owner_lock);
    waiter = READ_ONCE(console_waiter);
    console_owner = NULL;
    raw_spin_unlock(&console_owner_lock);

There should be no more meaning than that.

> The manipulation of "console_waiter" implements the spin_lock that
> we are trying to simulate. It is such easy because it is guaranteed
> that there is always only one process that tries to get this
> fake spin_lock. Also the other waiter releases the spin lock
> immediately after it gets it.
> 
> I mean that WRITE_ONCE(console_waiter, false) causes that
> the simulated spin lock is released here. Also the while-cycle
> in vprintk_emit() succeeds. The while-cycle success means
> that vprintk_emit() actually acquires the simulated spinlock.

I understand what you want to explain. If cross-release was alive,
there might be several things to talk more but now, what I
explained above is all we can do with existing acquire/release.

> This synchronization is need to make sure that the two processes
> pass the console_lock ownership at the right place.
> 
> I think that at least this simulated spin lock is annotated the right
> way by console_owner_dep_map manipulations. And I think that we

I also think it would work logically. I just wanted to say the
code looks like as if it's doing something cross-release stuff,
despite not, and suggest a common way to use typical ones.
That's all. :) I would send a patch if you also think so, but
it's ok even if not.

> do not need the cross-release feature to simulate this spin lock.
> 
> 
>>> +		/*
>>> +		 * Hand off console_lock to waiter. The waiter will perform
>>> +		 * the up(). After this, the waiter is the console_lock owner.
>>> +		 */
>>> +		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> 
> The cross-release feature might be needed here. The above annotation
> says that the semaphore is release here. In reality, it is released

Yeah, cross-release might be needed here, but it won't be such
simple anyway.

> in the process that calls vprintk_emit(). We actually just passed the
> ownership here.
> 
> Does this make any sense? Could we do better using the existing
> lockdep annotations?

I wonder what you think about thinks I told you. Could you let me
know?

> If you have a better solution, it might make sense to send a patch
> on top of linux-next. There is a commit that moved these code
> into three helper functions:

I would after getting your feedback.

Thanks a lot.

>      console_lock_spinning_enable()
>      console_lock_spinning_disable_and_check()
>      console_trylock_spinning()
> 
> See
> https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h=for-4.16-console-waiter-logic&id=c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab
> 
> Best Regards,
> Petr
> 
>>> +		printk_safe_exit_irqrestore(flags);
>>> +		/* Note, if waiter is set, logbuf_lock is not held */
>>> +		return;
>>> +	}
>>> +
>>>    	console_locked = 0;
>>>    	/* Release the exclusive_console once it is used */
>>>
>>
>> -- 
>> Thanks,
>> Byungchul
> 

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18  1:53       ` Byungchul Park
@ 2018-01-18  1:57         ` Byungchul Park
  2018-01-18  2:19         ` Steven Rostedt
  1 sibling, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-18  1:57 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/18/2018 10:53 AM, Byungchul Park wrote:
> Hello,
> 
> This is a thing simulating a wait for an event e.g.
> wait_for_completion() doing spinning instead of sleep, rather
> than a spinlock. I mean:
> 
>     This context
>     ------------
>     while (READ_ONCE(console_waiter)) /* Wait for the event */
>        cpu_relax();
> 
>     Another context
>     ---------------
>     WRITE_ONCE(console_waiter, false); /* Event */
> 
> That's why I said this's the exact case of cross-release. Anyway
> without cross-release, we usually use typical acquire/release
> pairs to cover a wait for an event in the following way:
> 
>     A context
>     ---------
>     lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>                             /* Read one is better though..    */
> 
>     /* A section, we suspect, a wait for an event might happen. */
>     ...
>     lock_map_release(wait);
> 
> 
>     The place actually doing the wait
>     ---------------------------------
>     lock_map_acquire(wait);
>     lock_map_acquire(wait);
       ^
       lock_map_release(wait);

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18  1:53       ` Byungchul Park
  2018-01-18  1:57         ` Byungchul Park
@ 2018-01-18  2:19         ` Steven Rostedt
  2018-01-18  4:01           ` Byungchul Park
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-18  2:19 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On Thu, 18 Jan 2018 10:53:37 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> Hello,
> 
> This is a thing simulating a wait for an event e.g.
> wait_for_completion() doing spinning instead of sleep, rather
> than a spinlock. I mean:
> 
>     This context
>     ------------
>     while (READ_ONCE(console_waiter)) /* Wait for the event */
>        cpu_relax();
> 
>     Another context
>     ---------------
>     WRITE_ONCE(console_waiter, false); /* Event */

I disagree. It is like a spinlock. You can say a spinlock() that is
blocked is also waiting for an event. That event being the owner does a
spin_unlock().

> 
> That's why I said this's the exact case of cross-release. Anyway
> without cross-release, we usually use typical acquire/release
> pairs to cover a wait for an event in the following way:
> 
>     A context
>     ---------
>     lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>                             /* Read one is better though..    */
> 
>     /* A section, we suspect, a wait for an event might happen. */
>     ...
>     lock_map_release(wait);
> 
> 
>     The place actually doing the wait
>     ---------------------------------
>     lock_map_acquire(wait);
>     lock_map_acquire(wait);
> 
>     wait_for_event(wait); /* Actually do the wait */
> 
> You can see a simple example of how to use them by searching
> kernel/cpu.c with "lock_acquire" and "wait_for_completion".
> 
> However, as I said, if you suspect that cpu_relax() includes
> the wait, then it's ok to leave it. Otherwise, I think it
> would be better to change it in the way I showed you above.

I find your way confusing. I'm simulating a spinlock not a wait for
completion. A wait for completion usually initiates something then
waits for it to complete. This is trying to get into a critical area
but another task is currently in it. It's simulating a spinlock as far
as I can see.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18  2:19         ` Steven Rostedt
@ 2018-01-18  4:01           ` Byungchul Park
  2018-01-18 15:21             ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Byungchul Park @ 2018-01-18  4:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/18/2018 11:19 AM, Steven Rostedt wrote:
> On Thu, 18 Jan 2018 10:53:37 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
>> Hello,
>>
>> This is a thing simulating a wait for an event e.g.
>> wait_for_completion() doing spinning instead of sleep, rather
>> than a spinlock. I mean:
>>
>>      This context
>>      ------------
>>      while (READ_ONCE(console_waiter)) /* Wait for the event */
>>         cpu_relax();
>>
>>      Another context
>>      ---------------
>>      WRITE_ONCE(console_waiter, false); /* Event */
> 
> I disagree. It is like a spinlock. You can say a spinlock() that is
> blocked is also waiting for an event. That event being the owner does a
> spin_unlock().

That's exactly what I was saying. Excuse me but, I don't understand
what you want to say. Could you explain more? What do you disagree?

>>
>> That's why I said this's the exact case of cross-release. Anyway
>> without cross-release, we usually use typical acquire/release
>> pairs to cover a wait for an event in the following way:
>>
>>      A context
>>      ---------
>>      lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>>                              /* Read one is better though..    */
>>
>>      /* A section, we suspect, a wait for an event might happen. */
>>      ...
>>      lock_map_release(wait);
>>
>>
>>      The place actually doing the wait
>>      ---------------------------------
>>      lock_map_acquire(wait);
>>      lock_map_acquire(wait);
>>
>>      wait_for_event(wait); /* Actually do the wait */
>>
>> You can see a simple example of how to use them by searching
>> kernel/cpu.c with "lock_acquire" and "wait_for_completion".
>>
>> However, as I said, if you suspect that cpu_relax() includes
>> the wait, then it's ok to leave it. Otherwise, I think it
>> would be better to change it in the way I showed you above.
> 
> I find your way confusing. I'm simulating a spinlock not a wait for
> completion. A wait for completion usually initiates something then

I used the word, *event* instead of *completion*. wait_for_completion()
and complete() are just an example of a pair of waiter and event.
Lock and unlock can also be another example, too.

Important thing is that who waits and who triggers the event. Using the
pair, we can achieve various things, for examples:

    1. Synchronization like wait_for_completion() does.
    2. Control exclusively entering into a critical area.
    3. Whatever.

> waits for it to complete. This is trying to get into a critical area
> but another task is currently in it. It's simulating a spinlock as far
> as I can see.

Anyway it's an example of "waiter for an event, and the event".

JFYI, spinning or sleeping does not matter. Those are just methods to
achieve a wait. I know you're not talking about this though. It's JFYI.

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 13:04                                   ` Petr Mladek
  2018-01-17 15:24                                     ` Steven Rostedt
@ 2018-01-18  4:31                                     ` Sergey Senozhatsky
  2018-01-18 15:22                                       ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18  4:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Tetsuo Handa,
	Sergey Senozhatsky, Tejun Heo, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/17/18 14:04), Petr Mladek wrote:
> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]
> > > > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > > 
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > > 
> > >  Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> > 
> > hm, may I ask why? is there a new rule now to percent-encode commit messages?
> 
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

d'oh... indeed, I copy-pasted the wrong URL... it should
have been lkml.kernel.org/r/ [and it actually was].

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 17:12                       ` Steven Rostedt
  2018-01-17 18:42                         ` Steven Rostedt
  2018-01-17 20:05                         ` Tejun Heo
@ 2018-01-18  5:42                         ` Sergey Senozhatsky
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18  5:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/17/18 12:12), Steven Rostedt wrote:
[..]
>  /*
>   * Can we actually use the console at this time on this cpu?
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>  
>  	for (;;) {
>  		struct printk_log *msg;
> +		bool offload;
>  		size_t ext_len = 0;
>  		size_t len;
>  
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();
>  
>  		printk_safe_exit_irqrestore(flags);
		^^^^^^^^^^^^^^^^

but we call console drivers in printk_safe.
printk -> console_drivers -> printk will be
redirected to this-CPU printk_safe buffer.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 20:05                         ` Tejun Heo
@ 2018-01-18  5:43                           ` Sergey Senozhatsky
  2018-01-18 11:51                           ` Petr Mladek
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18  5:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Petr Mladek, Sergey Senozhatsky,
	Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/17/18 12:05), Tejun Heo wrote:
[..]
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload)

yep, that's what I want. for a whole bunch of different reasons.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 20:05                         ` Tejun Heo
  2018-01-18  5:43                           ` Sergey Senozhatsky
@ 2018-01-18 11:51                           ` Petr Mladek
  1 sibling, 0 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-18 11:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed 2018-01-17 12:05:51, Tejun Heo wrote:
> Hello, Steven.
> 
> On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> > From what I gathered, you said an OOM would trigger, and then the
> > network console would not be able to allocate memory and it would
> > trigger a printk too, and cause an infinite amount of printks.
> 
> Yeah, it falls into back-and-forth loop between the OOM code and
> netconsole path.
> 
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
> 
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload), but if local feedback loop is
> the only thing we're missing after your improvements, detecting that
> specific condition definitely works and is likely a better approach in
> terms of message delivery guarantee.

I think that we could combine both. The recursion can be detected
rather easily and immediately so there is no reason to wait.

Once we have the code for offloading from recursion then we could
kick_offload_thread() also from other reasons, e.g. when
console_unlock() takes too long.

I think that Sergey is already playing with this. It seems
that we all could be happy in the end.


Best Regards,
Petr

PS: I am sorry for the answer yesterday. Tejun's mail did not mention
any details about the problem. I evidently forgot them. I have OOM
and printk issues associated with Tetsuo. So I messed it. Believe
me. It is a big relief to realize that we are not in the cycle
again.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18  4:01           ` Byungchul Park
@ 2018-01-18 15:21             ` Steven Rostedt
  2018-01-19  2:37               ` Byungchul Park
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-18 15:21 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On Thu, 18 Jan 2018 13:01:46 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> > I disagree. It is like a spinlock. You can say a spinlock() that is
> > blocked is also waiting for an event. That event being the owner does a
> > spin_unlock().  
> 
> That's exactly what I was saying. Excuse me but, I don't understand
> what you want to say. Could you explain more? What do you disagree?

I guess I'm confused at what you are asking for then.


> > I find your way confusing. I'm simulating a spinlock not a wait for
> > completion. A wait for completion usually initiates something then  
> 
> I used the word, *event* instead of *completion*. wait_for_completion()
> and complete() are just an example of a pair of waiter and event.
> Lock and unlock can also be another example, too.
> 
> Important thing is that who waits and who triggers the event. Using the
> pair, we can achieve various things, for examples:
> 
>     1. Synchronization like wait_for_completion() does.
>     2. Control exclusively entering into a critical area.
>     3. Whatever.
> 
> > waits for it to complete. This is trying to get into a critical area
> > but another task is currently in it. It's simulating a spinlock as far
> > as I can see.  
> 
> Anyway it's an example of "waiter for an event, and the event".
> 
> JFYI, spinning or sleeping does not matter. Those are just methods to
> achieve a wait. I know you're not talking about this though. It's JFYI.

OK, if it is just FYI.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-18  4:31                                     ` Sergey Senozhatsky
@ 2018-01-18 15:22                                       ` Steven Rostedt
  0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-18 15:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Thu, 18 Jan 2018 13:31:16 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> d'oh... indeed, I copy-pasted the wrong URL... it should
> have been lkml.kernel.org/r/ [and it actually was].

I've learned to do a copy after entering the lkml.kernel.org link into
the browser url, and before hitting enter. The redirection kills you.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-12 16:54   ` Steven Rostedt
  2018-01-12 17:11     ` Steven Rostedt
@ 2018-01-18 22:03     ` Pavel Machek
  2018-01-19  0:20       ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Pavel Machek @ 2018-01-18 22:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Tejun Heo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1043 bytes --]

Hi!

> > By other words, this deadlock was there even before. Such
> > deadlocks are prevented by using printk_deferred() in
> > the sections guarded by the lock A.
> 
> Petr,
> 
> Please add this here:
> 
> ====
> 
> To demonstrate the issue, this module has been shown to lock up a
> system with 4 CPUs and a slow console (like a serial console). It is
> also able to lock up a 8 CPU system with only a fast (VGA) console, by
> passing in "loops=100". The changes in this commit prevent this module
> from locking up the system.
> 
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>

Programs in commit messages. Not preffered way to distribute code, I'd
say. What about putting it into kernel selftests directory or
something like that?
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18 22:03     ` Pavel Machek
@ 2018-01-19  0:20       ` Steven Rostedt
  0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-19  0:20 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Tejun Heo, linux-kernel

On Thu, 18 Jan 2018 23:03:24 +0100
Pavel Machek <pavel@ucw.cz> wrote:


> > To demonstrate the issue, this module has been shown to lock up a
> > system with 4 CPUs and a slow console (like a serial console). It is
> > also able to lock up a 8 CPU system with only a fast (VGA) console, by
> > passing in "loops=100". The changes in this commit prevent this module
> > from locking up the system.
> > 
> > #include <linux/module.h>
> > #include <linux/delay.h>
> > #include <linux/sched.h>
> > #include <linux/mutex.h>
> > #include <linux/workqueue.h>
> > #include <linux/hrtimer.h>  
> 
> Programs in commit messages. Not preffered way to distribute code, I'd
> say. What about putting it into kernel selftests directory or
> something like that?

It's not really a program, but a module. I could add a real module that
can test this, and people can modprobe it if they want to make sure
there's no regressions.

I can send a patch.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-18 15:21             ` Steven Rostedt
@ 2018-01-19  2:37               ` Byungchul Park
  2018-01-19  3:27                 ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Byungchul Park @ 2018-01-19  2:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/19/2018 12:21 AM, Steven Rostedt wrote:
> On Thu, 18 Jan 2018 13:01:46 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
>>> I disagree. It is like a spinlock. You can say a spinlock() that is
>>> blocked is also waiting for an event. That event being the owner does a
>>> spin_unlock().
>>
>> That's exactly what I was saying. Excuse me but, I don't understand
>> what you want to say. Could you explain more? What do you disagree?
> 
> I guess I'm confused at what you are asking for then.

Sorry for not enough explanation. What I asked you for is:

    1. Relocate acquire()s/release()s.
    2. So make it simpler and remove unnecessary one.
    3. So make it look like the following form,
       because it's a thing simulating "wait and event".

       A context
       ---------
       lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
                               /* "Read" one is better though..    */

       /* A section, we suspect a wait for an event might happen. */
       ...

       lock_map_release(wait);

       The place actually doing the wait
       ---------------------------------
       lock_map_acquire(wait);
       lock_map_release(wait);

       wait_for_event(wait); /* Actually do the wait */

Honestly, you used acquire()s/release()s as if they are cross-
release stuff which mainly handles general waits and events,
not only things doing "acquire -> critical area -> release".
But that's not in the mainline at the moment.

>>> I find your way confusing. I'm simulating a spinlock not a wait for
>>> completion. A wait for completion usually initiates something then
>>
>> I used the word, *event* instead of *completion*. wait_for_completion()
>> and complete() are just an example of a pair of waiter and event.
>> Lock and unlock can also be another example, too.
>>
>> Important thing is that who waits and who triggers the event. Using the
>> pair, we can achieve various things, for examples:
>>
>>      1. Synchronization like wait_for_completion() does.
>>      2. Control exclusively entering into a critical area.
>>      3. Whatever.
>>
>>> waits for it to complete. This is trying to get into a critical area
>>> but another task is currently in it. It's simulating a spinlock as far
>>> as I can see.
>>
>> Anyway it's an example of "waiter for an event, and the event".
>>
>> JFYI, spinning or sleeping does not matter. Those are just methods to
          ^
          whether spining or sleeping doesn't matter.

>> achieve a wait. I know you're not talking about this though. It's JFYI.
> 
> OK, if it is just FYI.

Actually, the last paragraph is JFYI tho.

> -- Steve
> 
> 
> 

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-19  2:37               ` Byungchul Park
@ 2018-01-19  3:27                 ` Steven Rostedt
  2018-01-22  2:31                   ` Byungchul Park
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-19  3:27 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On Fri, 19 Jan 2018 11:37:13 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> On 1/19/2018 12:21 AM, Steven Rostedt wrote:
> > On Thu, 18 Jan 2018 13:01:46 +0900
> > Byungchul Park <byungchul.park@lge.com> wrote:
> >   
> >>> I disagree. It is like a spinlock. You can say a spinlock() that is
> >>> blocked is also waiting for an event. That event being the owner does a
> >>> spin_unlock().  
> >>
> >> That's exactly what I was saying. Excuse me but, I don't understand
> >> what you want to say. Could you explain more? What do you disagree?  
> > 
> > I guess I'm confused at what you are asking for then.  
> 
> Sorry for not enough explanation. What I asked you for is:
> 
>     1. Relocate acquire()s/release()s.
>     2. So make it simpler and remove unnecessary one.
>     3. So make it look like the following form,
>        because it's a thing simulating "wait and event".
> 
>        A context
>        ---------
>        lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>                                /* "Read" one is better though..    */

why? I'm assuming you are talking about adding this to the current
owner off the console_owner? This is a mutually exclusive section, no
parallel access. Why the Read?

> 
>        /* A section, we suspect a wait for an event might happen. */
>        ...
> 
>        lock_map_release(wait);
> 
>        The place actually doing the wait
>        ---------------------------------
>        lock_map_acquire(wait);
>        lock_map_release(wait);
> 
>        wait_for_event(wait); /* Actually do the wait */
> 
> Honestly, you used acquire()s/release()s as if they are cross-
> release stuff which mainly handles general waits and events,
> not only things doing "acquire -> critical area -> release".
> But that's not in the mainline at the moment.

Maybe it is more like that. Because, the thing I'm doing is passing off
a semaphore ownership to the waiter.

>From a previous email:

> > +			if (spin) {
> > +				/* We spin waiting for the owner to release us */
> > +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +				/* Owner will clear console_waiter on hand off */
> > +				while (READ_ONCE(console_waiter))
> > +					cpu_relax();
> > +
> > +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);  
> 
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
> 
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

There is no acquisitions between acquire and release. To get to 
"if (spin)" the acquire had to already been done. If it was released,
this spinner is now the new "owner". There's no race with anyone else.
But it doesn't technically have it till console_waiter is set to NULL.
Why would we call release() before that? Or maybe I'm missing something.

Or are you just saying that it doesn't matter if it is before or after
the while() loop, to just put it before? Does it really matter?

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-17 19:13       ` Rasmus Villemoes
  2018-01-17 19:33         ` Steven Rostedt
@ 2018-01-19  9:51         ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-19  9:51 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Steven Rostedt, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
	linux-kernel

On (01/17/18 20:13), Rasmus Villemoes wrote:
[..]
> >> Hmm, how does one have git commit not remove the C preprocessor at the
> >> start of the module?
> > 
> > Probably just add a space in front of the entire program.
> 
> If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
> Something like
> 
>   git config commit.cleanup scissors
> 
> should do the trick. Instead of stripping all lines starting with #,
> that will only strip stuff below a line containing
> 
> # ------------------------ >8 ------------------------

one thing that it changes is that now when you squash commits


# This is the first patch

first patch commit messages

# This is the second patch

second patch commit message

# ------------------------ >8 ------------------------



those "# This is the first patch" and "# This is the second patch"
won't be removed automatically. takes some time to get used to it.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-17 18:42                         ` Steven Rostedt
@ 2018-01-19 18:20                           ` Steven Rostedt
  2018-01-20  7:14                             ` Sergey Senozhatsky
  2018-01-20 12:19                             ` Tejun Heo
  0 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-19 18:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Tejun,

I was thinking about this a bit more, and instead of offloading a
recursive printk, perhaps its best to simply throttle it. Because the
problem may not go away if a printk thread takes over, because the bug
is really the printk infrastructure filling the printk buffer keeping
printk from ever stopping.

This patch detects that printk is causing itself to print more and
throttles it after 3 messages have printed due to recursion. Could you
see if this helps your test cases?

I built this on top of linux-next (yesterday's branch).

It compiles and boots, but I didn't do any other tests on it.

Thanks!

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..2c7f18876224 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	/* Store it in the record log */
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+static DEFINE_PER_CPU(int, recursion_count);
+static atomic_t recursion_overflow;
+static const int recursion_max = 3;
+
+static int __recursion_check_test(int val)
+{
+	unsigned long pc = preempt_count();
+	int bit;
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	return val & (1 << bit);
+}
+
+static bool recursion_check_test(void)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	return __recursion_check_test(val);
+}
+
+static bool recursion_check_start(void)
+{
+	int val = this_cpu_read(recursion_bits);
+	int set;
+
+	set = __recursion_check_test(val);
+
+	if (unlikely(set))
+		return true;
+
+	val |= set;
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool recursion)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (recursion)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
@@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* This stops the holder of console_sem just where we want him */
 	logbuf_lock_irqsave(flags);
+
+	if (recursion_check_test()) {
+		/* A printk happened within a printk at the same context */
+		if (this_cpu_inc_return(recursion_count) > recursion_max) {
+			atomic_inc(&recursion_overflow);
+			logbuf_unlock_irqrestore(flags);
+			printed_len = 0;
+			goto out;
+		}
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		bool recursion;
 		/*
 		 * Disable preemption to avoid being preempted while holding
 		 * console_sem which would prevent anyone from printing to
 		 * console
 		 */
 		preempt_disable();
+		recursion = recursion_check_start();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore.  The release will print out buffers and wake up
@@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+
+		recursion_check_finish(recursion);
+		this_cpu_write(recursion_count, 0);
 		preempt_enable();
 	}
-
+out:
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2343,9 +2428,14 @@ void console_unlock(void)
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		if (console_seq < log_first_seq || atomic_read(&recursion_overflow)) {
+			size_t missed;
+
+			missed = atomic_xchg(&recursion_overflow, 0);
+			missed += log_first_seq - console_seq;
+
 			len = sprintf(text, "** %u printk messages dropped **\n",
-				      (unsigned)(log_first_seq - console_seq));
+				      (unsigned)missed);
 
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-19 18:20                           ` Steven Rostedt
@ 2018-01-20  7:14                             ` Sergey Senozhatsky
  2018-01-20 15:49                               ` Steven Rostedt
  2018-01-20 12:19                             ` Tejun Heo
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-20  7:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel


On (01/19/18 13:20), Steven Rostedt wrote:
[..]
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.

right. I didn't quite got it how that would help. if we would
kick_offload every time we add new printks after call_console_drivers(),
then we can just end up in a kick_offload loop traveling across all CPUs.

[..]
>  asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
> @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	/* This stops the holder of console_sem just where we want him */
>  	logbuf_lock_irqsave(flags);
> +
> +	if (recursion_check_test()) {
> +		/* A printk happened within a printk at the same context */
> +		if (this_cpu_inc_return(recursion_count) > recursion_max) {
> +			atomic_inc(&recursion_overflow);
> +			logbuf_unlock_irqrestore(flags);
> +			printed_len = 0;
> +			goto out;
> +		}
> +	}

didn't have time to look at this carefully, but is this possible?

printks from console_unlock()->call_console_drivers() are redirected
to printk_safe buffer. we need irq_work on that CPU to flush its
printk_safe buffer.

>  EXPORT_SYMBOL(vprintk_emit);
> @@ -2343,9 +2428,14 @@ void console_unlock(void)
>  			seen_seq = log_next_seq;
>  		}
>  
> -		if (console_seq < log_first_seq) {
> +		if (console_seq < log_first_seq || atomic_read(&recursion_overflow)) {
> +			size_t missed;
> +
> +			missed = atomic_xchg(&recursion_overflow, 0);
> +			missed += log_first_seq - console_seq;
> +
>  			len = sprintf(text, "** %u printk messages dropped **\n",
> -				      (unsigned)(log_first_seq - console_seq));
> +				      (unsigned)missed);
>  
>  			/* messages are gone, move to first one */
>  			console_seq = log_first_seq;

how are we going to distinguish between lockdep splats, for instance,
or WARNs from call_console_drivers() -> foo_write(), which are valuable,
and kmalloc() print outs, which might be less valuable? are we going to
lose all of them now? then we can do a much simpler thing - steal one
bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
will be set around call_console_drivers(). vprintk_func() would redirect
printks to vprintk_noop(fmt, args), which will do nothing.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-19 18:20                           ` Steven Rostedt
  2018-01-20  7:14                             ` Sergey Senozhatsky
@ 2018-01-20 12:19                             ` Tejun Heo
  2018-01-20 14:51                               ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-20 12:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello, Steven.

On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote:
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.
> 
> This patch detects that printk is causing itself to print more and
> throttles it after 3 messages have printed due to recursion. Could you
> see if this helps your test cases?

Sure, if this is the approach we're gonna take, I can try it with the
silly test code and also try to reproduce the original problem and see
whether this helps.

I'm a bit worried tho because this essentially seems like "detect
recursion, ignore messages" approach.  netcons can have a very large
surface for bugs.  Suppressing those messages would make them
difficult to debug.  For example, all our machines have both serial
console (thus the slowness) and netconsole hooked up and netcons code
has had its fair share of issues.  This would likely make tracking
down those problems more challenging.

Can we discuss pros and cons of this approach against offloading
before committing to this?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-20 12:19                             ` Tejun Heo
@ 2018-01-20 14:51                               ` Steven Rostedt
  0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-20 14:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Sat, 20 Jan 2018 04:19:53 -0800
Tejun Heo <tj@kernel.org> wrote:

> I'm a bit worried tho because this essentially seems like "detect
> recursion, ignore messages" approach.  netcons can have a very large
> surface for bugs.  Suppressing those messages would make them
> difficult to debug.  For example, all our machines have both serial
> console (thus the slowness) and netconsole hooked up and netcons code
> has had its fair share of issues.  This would likely make tracking
> down those problems more challenging.

Well, it's not totally ignoring them. There's a variable that tells
printk how many to print before it starts ignoring them. I picked 3,
but that could very well be 5 or 10. Probably 10 is the best, because
then it would give us enough idea why printk is recursing on itself
without overloading the buffer. And I made it a variable to easily make
it a knob for userspace to tweak if need be.

> 
> Can we discuss pros and cons of this approach against offloading
> before committing to this?

I'm open. I was just thinking about the scenario that you mentioned and
how what the best way to solve it would be.

We need to define the exact problem(s) we are dealing with before we
offer a solution. The one thing I don't want is a solution looking for
a problem. I want a full understanding of what the problem exactly is
and then we can discuss various solutions, and how they solve the
problem(s). Otherwise we are just doing (to quote Linus) code masturbation.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-20  7:14                             ` Sergey Senozhatsky
@ 2018-01-20 15:49                               ` Steven Rostedt
  2018-01-21 14:15                                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-20 15:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Sat, 20 Jan 2018 16:14:02 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> [..]
> >  asmlinkage int vprintk_emit(int facility, int level,
> >  			    const char *dict, size_t dictlen,
> > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> >  	/* This stops the holder of console_sem just where we want him */
> >  	logbuf_lock_irqsave(flags);
> > +
> > +	if (recursion_check_test()) {
> > +		/* A printk happened within a printk at the same context */
> > +		if (this_cpu_inc_return(recursion_count) > recursion_max) {
> > +			atomic_inc(&recursion_overflow);
> > +			logbuf_unlock_irqrestore(flags);
> > +			printed_len = 0;
> > +			goto out;
> > +		}
> > +	}  
> 
> didn't have time to look at this carefully, but is this possible?
> 
> printks from console_unlock()->call_console_drivers() are redirected
> to printk_safe buffer. we need irq_work on that CPU to flush its
> printk_safe buffer.

So is the issue that we keep triggering this irq work then? Then this
solution does seem to be one that would work. Because after x amount of
recursive printks (printk called by printk) it would just stop printing
them, and end the irq work.

Perhaps what Tejun is seeing is:

 printk()
   net_console()
     printk() --> redirected to irq work

 <irq work>
  printk
    net_console()
      printk() --> redirected to another irq work

and so on and so on.

This solution would need to be tweaked to add a timer to allow only so
many nested printks in a given time. Otherwise it too would be an issue:

 printk()
   net_console()
     printk() -> redirected
     printk() -> throttled

But the first x printk()s would still be redirected. and that x gets
reset in this current patch at he end of the outermost printk. Perhaps
it shouldn't reset x, or it can flush the printk safe buffer first. Is
there a reason that console_unlock() doesn't flush the
printk_safe_buffer? With a throttle number and flushing the
printk_safe_buffer, that should solve the issue Tejun explained.


> 
> how are we going to distinguish between lockdep splats, for instance,
> or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> and kmalloc() print outs, which might be less valuable? are we going to

The problem is that printk causing more printks is extremely dangerous,
and ANY printk that is caused by a printk is of equal value, whether it
is a console driver running out of memory or a lockdep splat. And
the chances of having two hit at the same time is extremely low.

> lose all of them now? then we can do a much simpler thing - steal one
> bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
> will be set around call_console_drivers(). vprintk_func() would redirect
> printks to vprintk_noop(fmt, args), which will do nothing.

Not sure what you mean here. Have some pseudo code to demonstrate with?

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-20 15:49                               ` Steven Rostedt
@ 2018-01-21 14:15                                 ` Sergey Senozhatsky
  2018-01-21 21:04                                   ` Steven Rostedt
  2018-01-23  6:40                                   ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-21 14:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Tejun Heo, Petr Mladek, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/20/18 10:49), Steven Rostedt wrote:
[..]
> > printks from console_unlock()->call_console_drivers() are redirected
> > to printk_safe buffer. we need irq_work on that CPU to flush its
> > printk_safe buffer.
> 
> So is the issue that we keep triggering this irq work then? Then this
> solution does seem to be one that would work. Because after x amount of
> recursive printks (printk called by printk) it would just stop printing
> them, and end the irq work.
> 
> Perhaps what Tejun is seeing is:
> 
>  printk()
>    net_console()
>      printk() --> redirected to irq work
> 
>  <irq work>
>   printk
>     net_console()
>       printk() --> redirected to another irq work
> 
> and so on and so on.

it's a bit trickier than that, I think.

we have printk recursion from console drivers. it's redirected to
printk_safe and we queue an IRQ work to flush the buffer

 printk
  console_unlock
   call_console_drivers
    net_console
     printk
      printk_save -> irq_work queue

now console_unlock() enables local IRQs, we have the printk_safe
flush. but printk_safe flush does not call into the console_unlock(),
it uses printk_deferred() version of printk

IRQ work

 prink_safe_flush
  printk_deferred -> irq_work queue


so we schedule another IRQ work (deferred printk work), which eventually
tries to lock console_sem

IRQ work
 wake_up_klogd_work_func()
  if (console_trylock())
   console_unlock()

if it succeeds then it goes to console_unlock(), where console driver
can cause another printk recursion. but, once again, it will be
redirected to printk_safe buffer first. if it fails then we have either
the original CPU to print out those irq_work messages, which is sort
of bad, or another CPU which already acquired the console_sem and will
print out.

> This solution would need to be tweaked to add a timer to allow only so
> many nested printks in a given time. Otherwise it too would be an issue:
[..]
> > how are we going to distinguish between lockdep splats, for instance,
> > or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> > and kmalloc() print outs, which might be less valuable? are we going to
> 
> The problem is that printk causing more printks is extremely dangerous,
> and ANY printk that is caused by a printk is of equal value, whether it
> is a console driver running out of memory or a lockdep splat. And
> the chances of having two hit at the same time is extremely low.

so.... fix the console drivers ;)




just kidding. ok...
the problem is that we flush printk_safe right when console_unlock() printing
loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
were enabled in the first place when the CPU went to console_unlock()].
this forces that CPU to loop in console_unlock() as long as we have
printk-s coming from call_console_drivers(). but we probably can postpone
printk_safe flush. basically, we can declare a new rule - we don't flush
printk_safe buffer as long as console_sem is locked. because this is how
that printing CPU stuck in the console_unlock() printing loop. printk_safe
buffer is very important when it comes to storing a non-repetitive stuff, like
a lockdep splat, which is a single shot event. but the more repetitive the
message is, like millions of similar kmalloc() dump_stack()-s over and over
again, the less value in it. we should have printk_safe buffer big enough for
important info, like a lockdep splat, but millions of similar kmalloc()
messages are pretty invaluable - one is already enough, we can drop the rest.
and we should not flush new messages while there is a CPU looping in
console_unlock(), because it already has messages to print, which were
log_store()-ed the normal way.

this is where the "postpone thing" jumps in. so how do we postpone printk_safe
flush.

we can't console_trylock()/console_unlock() in printk_safe flush code.
but there is a `console_locked' flag and is_console_locked() function which
tell us if the console_sem is locked. as long as we are in console_unlock()
printing loop that flag is set, even if we enabled local IRQs and printk_safe
flush work arrived. so now printk_safe flush does extra check and does
not flush printk_safe buffer content as long as someone is currently
printing or soon will start printing. but we need to take extra step and
to re-queue flush on CPUs that did postpone it [console_unlock() can
reschedule]. so now we flush only when printing CPU printed all pending
logbuf messages, hit the "console_seq == log_next_seq" and up()
console_sem. this sets a boundary -- no matter how many times during the
current printing loop we called console drivers and how many times those
drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.


IOW, what we have now, looks like this:

a) printk_safe is for important stuff, we don't guarantee that a flood
   of messages will be preserved.

b) we extend the previously existing "will flush messages later on from
   a safer context" and now we also consider console_unlock() printing loop
   as unsafe context. so the unsafe context it's not only the one that can
   deadlock, but also the one that can lockup CPU in a printing loop because
   of recursive printk messages.


so this

 printk
  console_unlock
  {
   for (;;) {
     call_console_drivers
      net_console
       printk
        printk_save -> irq_work queue

	   IRQ work
	     prink_safe_flush
	       printk_deferred -> log_store()
           iret
    }
    up();
  }


   // which can never break out, because we can always append new messages
   // from prink_safe_flush.

becomes this

printk
  console_unlock
  {
   for (;;) {
     call_console_drivers
      net_console
       printk
        printk_save -> irq_work queue

    }
    up();

  IRQ work
   prink_safe_flush
    printk_deferred -> log_store()
  iret
}



something completely untested, sketchy and ugly.

---

 kernel/printk/internal.h    |  2 ++
 kernel/printk/printk.c      |  1 +
 kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..e85517818a49 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
+void printk_safe_requeue_flushing(void);
+
 #define printk_safe_enter_irqsave(flags)	\
 	do {					\
 		local_irq_save(flags);		\
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..7aca23e8d7b2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2428,6 +2428,7 @@ void console_unlock(void)
 	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
+	printk_safe_requeue_flushing();
 	printk_safe_exit_irqrestore(flags);
 
 	if (retry && console_trylock())
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..45d5b292d7e1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/console.h>
 
 #include "internal.h"
 
@@ -51,6 +52,7 @@ struct printk_safe_seq_buf {
 	atomic_t		message_lost;
 	struct irq_work		work;	/* IRQ work that flushes the buffer */
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
+	bool			need_requeue;
 };
 
 static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
@@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work)
 	size_t len;
 	int i;
 
+	s->need_requeue = false;
 	/*
 	 * The lock has two functions. First, one reader has to flush all
 	 * available message to make the lockless synchronization with
@@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+/* NMI buffers are always flushed */
+static void flush_nmi_buffer(struct irq_work *work)
+{
+	__printk_safe_flush(work);
+}
+
+/* printk_safe buffers flushing, on the contrary, can be postponed */
+static void flush_printk_safe_buffer(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, work);
+
+	if (is_console_locked()) {
+		s->need_requeue = true;
+		return;
+	}
+
+	__printk_safe_flush(work);
+}
+
+void printk_safe_requeue_flushing(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(safe_print_seq, cpu).need_requeue)
+			queue_flush_work(&per_cpu(safe_print_seq, cpu));
+	}
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -387,11 +420,11 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_printk_safe_buffer);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_nmi_buffer);
 #endif
 	}
 
---



> > lose all of them now? then we can do a much simpler thing - steal one
> > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
> > will be set around call_console_drivers(). vprintk_func() would redirect
> > printks to vprintk_noop(fmt, args), which will do nothing.
> 
> Not sure what you mean here. Have some pseudo code to demonstrate with?

sure, I meant that if we want to disable printk recursion from
call_console_drivers(), then we can add another printk_safe section, say
printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP
bit of `printk_context', so when we have printk() under PRINTK_NOOP
then vprintk_func() goes to a special vprintk_noop(fmt, args), which
simply drops the message [does not store any in the per-cpu printk
safe buffer, so we don't flush it and don't add new messages to the
logbuf]. and we annotate call_console_drivers() as a pintk_noop
function. but that a no-brainer and I'd prefer to have another solution.

	-ss

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-21 14:15                                 ` Sergey Senozhatsky
@ 2018-01-21 21:04                                   ` Steven Rostedt
  2018-01-22  8:56                                     ` Sergey Senozhatsky
  2018-01-23  6:40                                   ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-21 21:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Sun, 21 Jan 2018 23:15:21 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> so.... fix the console drivers ;)

Totally agree!

> 
> 
> 
> 
> just kidding. ok...

Darn it! ;-)

> the problem is that we flush printk_safe right when console_unlock() printing
> loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> were enabled in the first place when the CPU went to console_unlock()].
> this forces that CPU to loop in console_unlock() as long as we have
> printk-s coming from call_console_drivers(). but we probably can postpone
> printk_safe flush. basically, we can declare a new rule - we don't flush
> printk_safe buffer as long as console_sem is locked. because this is how
> that printing CPU stuck in the console_unlock() printing loop. printk_safe
> buffer is very important when it comes to storing a non-repetitive stuff, like
> a lockdep splat, which is a single shot event. but the more repetitive the
> message is, like millions of similar kmalloc() dump_stack()-s over and over
> again, the less value in it. we should have printk_safe buffer big enough for
> important info, like a lockdep splat, but millions of similar kmalloc()
> messages are pretty invaluable - one is already enough, we can drop the rest.
> and we should not flush new messages while there is a CPU looping in
> console_unlock(), because it already has messages to print, which were
> log_store()-ed the normal way.

The above is really hard to read without any capitalization. Everything
seems to be a run-on sentence and gives me a head ache. So you lost me
there.

> 
> this is where the "postpone thing" jumps in. so how do we postpone printk_safe
> flush.
> 
> we can't console_trylock()/console_unlock() in printk_safe flush code.
> but there is a `console_locked' flag and is_console_locked() function which
> tell us if the console_sem is locked. as long as we are in console_unlock()
> printing loop that flag is set, even if we enabled local IRQs and printk_safe
> flush work arrived. so now printk_safe flush does extra check and does
> not flush printk_safe buffer content as long as someone is currently
> printing or soon will start printing. but we need to take extra step and
> to re-queue flush on CPUs that did postpone it [console_unlock() can
> reschedule]. so now we flush only when printing CPU printed all pending
> logbuf messages, hit the "console_seq == log_next_seq" and up()
> console_sem. this sets a boundary -- no matter how many times during the
> current printing loop we called console drivers and how many times those
> drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.

Another big paragraph with no capitals (besides macros and CPU ;-)

I guess this is what it is like when people listen to me talk too fast.


> 
> 
> IOW, what we have now, looks like this:
> 
> a) printk_safe is for important stuff, we don't guarantee that a flood
>    of messages will be preserved.
> 
> b) we extend the previously existing "will flush messages later on from
>    a safer context" and now we also consider console_unlock() printing loop
>    as unsafe context. so the unsafe context it's not only the one that can
>    deadlock, but also the one that can lockup CPU in a printing loop because
>    of recursive printk messages.

Sure.

> 
> 
> so this
> 
>  printk
>   console_unlock
>   {
>    for (;;) {
>      call_console_drivers
>       net_console
>        printk
>         printk_save -> irq_work queue
> 
> 	   IRQ work
> 	     prink_safe_flush
> 	       printk_deferred -> log_store()
>            iret
>     }
>     up();
>   }
> 
> 
>    // which can never break out, because we can always append new messages
>    // from prink_safe_flush.
> 
> becomes this
> 
> printk
>   console_unlock
>   {
>    for (;;) {
>      call_console_drivers
>       net_console
>        printk
>         printk_save -> irq_work queue
> 
>     }
>     up();
> 
>   IRQ work
>    prink_safe_flush
>     printk_deferred -> log_store()
>   iret
> }

But we do eventually send this data out to the consoles, and if the
consoles cause more printks, wouldn't this still never end?

> 
> 
> 
> something completely untested, sketchy and ugly.
> 
> ---
> 
>  kernel/printk/internal.h    |  2 ++
>  kernel/printk/printk.c      |  1 +
>  kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 2a7d04049af4..e85517818a49 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
>  void __printk_safe_enter(void);
>  void __printk_safe_exit(void);
>  
> +void printk_safe_requeue_flushing(void);
> +
>  #define printk_safe_enter_irqsave(flags)	\
>  	do {					\
>  		local_irq_save(flags);		\
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9cb943c90d98..7aca23e8d7b2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2428,6 +2428,7 @@ void console_unlock(void)
>  	raw_spin_lock(&logbuf_lock);
>  	retry = console_seq != log_next_seq;
>  	raw_spin_unlock(&logbuf_lock);
> +	printk_safe_requeue_flushing();
>  	printk_safe_exit_irqrestore(flags);
>  
>  	if (retry && console_trylock())
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 3e3c2004bb23..45d5b292d7e1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -22,6 +22,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/irq_work.h>
>  #include <linux/printk.h>
> +#include <linux/console.h>
>  
>  #include "internal.h"
>  
> @@ -51,6 +52,7 @@ struct printk_safe_seq_buf {
>  	atomic_t		message_lost;
>  	struct irq_work		work;	/* IRQ work that flushes the buffer */
>  	unsigned char		buffer[SAFE_LOG_BUF_LEN];
> +	bool			need_requeue;
>  };
>  
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
> @@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work)
>  	size_t len;
>  	int i;
>  
> +	s->need_requeue = false;
>  	/*
>  	 * The lock has two functions. First, one reader has to flush all
>  	 * available message to make the lockless synchronization with
> @@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work)
>  	raw_spin_unlock_irqrestore(&read_lock, flags);
>  }
>  
> +/* NMI buffers are always flushed */
> +static void flush_nmi_buffer(struct irq_work *work)
> +{
> +	__printk_safe_flush(work);
> +}
> +
> +/* printk_safe buffers flushing, on the contrary, can be postponed */
> +static void flush_printk_safe_buffer(struct irq_work *work)
> +{
> +	struct printk_safe_seq_buf *s =
> +		container_of(work, struct printk_safe_seq_buf, work);
> +
> +	if (is_console_locked()) {
> +		s->need_requeue = true;
> +		return;
> +	}
> +
> +	__printk_safe_flush(work);
> +}
> +
> +void printk_safe_requeue_flushing(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (per_cpu(safe_print_seq, cpu).need_requeue)
> +			queue_flush_work(&per_cpu(safe_print_seq, cpu));
> +	}
> +}
> +
>  /**
>   * printk_safe_flush - flush all per-cpu nmi buffers.
>   *
> @@ -387,11 +420,11 @@ void __init printk_safe_init(void)
>  		struct printk_safe_seq_buf *s;
>  
>  		s = &per_cpu(safe_print_seq, cpu);
> -		init_irq_work(&s->work, __printk_safe_flush);
> +		init_irq_work(&s->work, flush_printk_safe_buffer);
>  
>  #ifdef CONFIG_PRINTK_NMI
>  		s = &per_cpu(nmi_print_seq, cpu);
> -		init_irq_work(&s->work, __printk_safe_flush);
> +		init_irq_work(&s->work, flush_nmi_buffer);
>  #endif
>  	}
>  
> ---
> 
> 
> 
> > > lose all of them now? then we can do a much simpler thing - steal one
> > > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which
> > > will be set around call_console_drivers(). vprintk_func() would redirect
> > > printks to vprintk_noop(fmt, args), which will do nothing.  
> > 
> > Not sure what you mean here. Have some pseudo code to demonstrate with?  
> 
> sure, I meant that if we want to disable printk recursion from
> call_console_drivers(), then we can add another printk_safe section, say
> printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP
> bit of `printk_context', so when we have printk() under PRINTK_NOOP
> then vprintk_func() goes to a special vprintk_noop(fmt, args), which
> simply drops the message [does not store any in the per-cpu printk
> safe buffer, so we don't flush it and don't add new messages to the
> logbuf]. and we annotate call_console_drivers() as a pintk_noop
> function. but that a no-brainer and I'd prefer to have another solution.
> 

Another big paragraph without caps, but I figured it out.

I say we try that solution and see if it fixes the current issues.
Because right now, the bug I see Tejun presented was if something in
printk causes printks, it will start a printk bomb and lock up the
system. The only reasonable answer I see to that is to throttle printk
in such a case.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
  2018-01-19  3:27                 ` Steven Rostedt
@ 2018-01-22  2:31                   ` Byungchul Park
  0 siblings, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-22  2:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
	Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/19/2018 12:27 PM, Steven Rostedt wrote:
> On Fri, 19 Jan 2018 11:37:13 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
>> On 1/19/2018 12:21 AM, Steven Rostedt wrote:
>>> On Thu, 18 Jan 2018 13:01:46 +0900
>>> Byungchul Park <byungchul.park@lge.com> wrote:
>>>    
>>>>> I disagree. It is like a spinlock. You can say a spinlock() that is
>>>>> blocked is also waiting for an event. That event being the owner does a
>>>>> spin_unlock().
>>>>
>>>> That's exactly what I was saying. Excuse me but, I don't understand
>>>> what you want to say. Could you explain more? What do you disagree?
>>>
>>> I guess I'm confused at what you are asking for then.
>>
>> Sorry for not enough explanation. What I asked you for is:
>>
>>      1. Relocate acquire()s/release()s.
>>      2. So make it simpler and remove unnecessary one.
>>      3. So make it look like the following form,
>>         because it's a thing simulating "wait and event".
>>
>>         A context
>>         ---------
>>         lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
>>                                 /* "Read" one is better though..    */
> 
> why? I'm assuming you are talking about adding this to the current

It was about console_unlock()'s body that is:

+        /* The waiter may spin on us after setting console_owner */
+        spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
          ^^^^^^^^^^^^
+
          stop_critical_timings();    /* don't trace print latency */
          call_console_drivers(ext_text, ext_len, text, len);
          start_critical_timings();
+
+        raw_spin_lock(&console_owner_lock);
+        waiter = READ_ONCE(console_waiter);
+        console_owner = NULL;
+        raw_spin_unlock(&console_owner_lock);
+
+        /*
+         * If there is a waiter waiting for us, then pass the
+         * rest of the work load over to that waiter.
+         */
+        if (waiter)
+            break;
+
+        /* There was no waiter, and nothing will spin on us here */
+        spin_release(&console_owner_dep_map, 1, _THIS_IP_);
          ^^^^^^^^^^^^ I recommand to move this over the "if" statament.
+
          printk_safe_exit_irqrestore(flags);
          if (do_cond_resched)
              cond_resched();
      }
+
+    /*
+     * If there is an active waiter waiting on the console_lock.
+     * Pass off the printing to the waiter, and the waiter
+     * will continue printing on its CPU, and when all writing
+     * has finished, the last printer will wake up klogd.
+     */
+    if (waiter) {
+        WRITE_ONCE(console_waiter, false);
+        /* The waiter is now free to continue */
+        spin_release(&console_owner_dep_map, 1, _THIS_IP_);
          ^^^^^^^^^^^^ I recommand to remove this.

> owner off the console_owner? This is a mutually exclusive section, no
> parallel access. Why the Read?

Not much matter whether to use the read version or not.

Let me explain it more since you asked. (I don't stongly insist to use
the read version tho.) For example:

       A context (context A)
       ---------------------
       lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
                               /* "Read" one is better though..    */

       /* A section, we suspect a wait for the event might happen. */
       ...

       lock_map_release(wait);
       trigger the event;

       The place actually doing the wait (context B)
       ---------------------------------------------
       lock_map_acquire(wait);
       lock_map_release(wait);

       wait_for_event(wait); /* Actually do the wait */

The acquire() in context A is not a real acquisition but only for
detecting if a wait is in the section, which means that should not
interact with another pseudo acqusition but only with real waits.

lock_map_acquire_read() makes it done as we expect. That's why I
said 'read' one is better. But it's ok to use normal(write) one.
(I'm not sure if Peterz finished making the 'read' work well, tho.)

>>
>>         /* A section, we suspect a wait for an event might happen. */
>>         ...
>>
>>         lock_map_release(wait);
>>
>>         The place actually doing the wait
>>         ---------------------------------
>>         lock_map_acquire(wait);
>>         lock_map_release(wait);
>>
>>         wait_for_event(wait); /* Actually do the wait */
>>
>> Honestly, you used acquire()s/release()s as if they are cross-
>> release stuff which mainly handles general waits and events,
>> not only things doing "acquire -> critical area -> release".
>> But that's not in the mainline at the moment.
> 
> Maybe it is more like that. Because, the thing I'm doing is passing off
> a semaphore ownership to the waiter.
> 
>  From a previous email:
> 
>>> +			if (spin) {
>>> +				/* We spin waiting for the owner to release us */
>>> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>> +				/* Owner will clear console_waiter on hand off */
>>> +				while (READ_ONCE(console_waiter))
>>> +					cpu_relax();
>>> +
>>> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you move this over "while (READ_ONCE(console_waiter))" and
>> right after acquire()?
>>
>> As I said last time, only acquisitions between acquire() and release()
>> are meaningful. Are you taking care of acquisitions within cpu_relax()?
>> If so, leave it.
> 
> There is no acquisitions between acquire and release. To get to
> "if (spin)" the acquire had to already been done. If it was released,
> this spinner is now the new "owner". There's no race with anyone else.
> But it doesn't technically have it till console_waiter is set to NULL.
> Why would we call release() before that? Or maybe I'm missing something.
> 
> Or are you just saying that it doesn't matter if it is before or after
> the while() loop, to just put it before? Does it really matter?

It doesn't matter. As I said, there's logically no problem on it.
Leave the code if you want to locate those that way. I just started
to mention it becasue some lines can be removed with the code a bit
fixed.

> 
> -- Steve
> 

-- 
Thanks,
Byungchul

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-21 21:04                                   ` Steven Rostedt
@ 2018-01-22  8:56                                     ` Sergey Senozhatsky
  2018-01-22 10:28                                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-22  8:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, Petr Mladek,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/21/18 16:04), Steven Rostedt wrote:
[..]
> > The problem is that we flush printk_safe right when console_unlock() printing
> > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> > were enabled in the first place when the CPU went to console_unlock()].
> > This forces that CPU to loop in console_unlock() as long as we have
> > printk-s coming from call_console_drivers(). But we probably can postpone
> > printk_safe flush. Basically, we can declare a new rule - we don't flush
> > printk_safe buffer as long as console_sem is locked. Because this is how
> > that printing CPU stuck in the console_unlock() printing loop. printk_safe
> > buffer is very important when it comes to storing a non-repetitive stuff, like
> > a lockdep splat, which is a single shot event. But the more repetitive the
> > message is, like millions of similar kmalloc() dump_stack()-s over and over
> > again, the less value in it. We should have printk_safe buffer big enough for
> > important info, like a lockdep splat, but millions of similar kmalloc()
> > messages are pretty invaluable - one is already enough, we can drop the rest.
> > And we should not flush new messages while there is a CPU looping in
> > console_unlock(), because it already has messages to print, which were
> > log_store()-ed the normal way.
> 
> The above is really hard to read without any capitalization. Everything
> seems to be a run-on sentence and gives me a head ache. So you lost me
> there.

Apologies. Will improve.

> > This is where the "postpone thing" jumps in. so how do we postpone printk_safe
> > flush.
> > 
> > We can't console_trylock()/console_unlock() in printk_safe flush code.
> > But there is a `console_locked' flag and is_console_locked() function which
> > tell us if the console_sem is locked. As long as we are in console_unlock()
> > printing loop that flag is set, even if we enabled local IRQs and printk_safe
> > flush work arrived. So now printk_safe flush does extra check and does
> > not flush printk_safe buffer content as long as someone is currently
> > printing or soon will start printing. But we need to take extra step and
> > to re-queue flush on CPUs that did postpone it [console_unlock() can
> > reschedule]. So now we flush only when printing CPU printed all pending
> > logbuf messages, hit the "console_seq == log_next_seq" and up()
> > console_sem. This sets a boundary -- no matter how many times during the
> > current printing loop we called console drivers and how many times those
> > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.
> 
> Another big paragraph with no capitals (besides macros and CPU ;-)

I walked through it and mostly "fixed" your head ache :)

> I guess this is what it is like when people listen to me talk too fast.

Absolutely!!!

> > IOW, what we have now, looks like this:
> > 
> > a) printk_safe is for important stuff, we don't guarantee that a flood
> >    of messages will be preserved.
> > 
> > b) we extend the previously existing "will flush messages later on from
> >    a safer context" and now we also consider console_unlock() printing loop
> >    as unsafe context. so the unsafe context it's not only the one that can
> >    deadlock, but also the one that can lockup CPU in a printing loop because
> >    of recursive printk messages.
> 
> Sure.
> 
> > 
> > 
> > so this
> > 
> >  printk
> >   console_unlock
> >   {
> >    for (;;) {
> >      call_console_drivers
> >       net_console
> >        printk
> >         printk_save -> irq_work queue
> > 
> > 	   IRQ work
> > 	     prink_safe_flush
> > 	       printk_deferred -> log_store()
> >            iret
> >     }
> >     up();
> >   }
> > 
> > 
> >    // which can never break out, because we can always append new messages
> >    // from prink_safe_flush.
> > 
> > becomes this
> > 
> > printk
> >   console_unlock
> >   {
> >    for (;;) {
> >      call_console_drivers
> >       net_console
> >        printk
> >         printk_save -> irq_work queue
> > 
> >     }
> >     up();
> > 
> >   IRQ work
> >    prink_safe_flush
> >     printk_deferred -> log_store()
> >   iret
> > }
> 
> But we do eventually send this data out to the consoles, and if the
> consoles cause more printks, wouldn't this still never end?

Right. But not immediately. We wait for all pending messages to be evicted
first (and up()) and we limit the amount of data that we flush. So at least
it's not exponential anymore: every line that we print does not log_store()
a whole new dump_stack() of lines. Which is still miles away from "a perfect
solution", tho. But limiting the number of lines we print recursive is not
much better.

First, we don't know how many lines we want to flush from printk_safe.
And having a knob indicates that no one ever will do it right.

Second, hand off can play games with it.

Assume the following,

- I set `recursion_max' to 200. Which looks reasonable to me.
  Then I have the following ping-pong:

	CPU0						CPU1
	printk()
	recursion_check_start()
	 call_console_drivers()         		printk()
							recursion_check_start()
	  dump_stack()					console_trylock_spinning()
	 flush_printk_safe()
	 spinning_disable_and_check() //handoff
        recursion_check_finish() // reset		 call_console_drivers()
							  dump_stack()
							 flush_printk_safe()
	printk()
	recursion_check_start()
	console_trylock_spinning()			 spinning_disable_and_check() // handoff
							recursion_check_finish() // reset

	 call_console_drivers()				printk
	  dump_stack()					recursion_check_start()
	 flush_printk_safe()				console_trylock_spinning()
	 spinning_disable_and_check()
	recursion_check_finish() // reset		 call_console_drivers()
							 ...

And so on. So it's - take the lock, call console drivers, fill up the
printk_safe buffer, flush it completely, hand off printing to another
CPU, reset this CPU's recursion counter, repeat everything again. Every
line of dump_stack() which we print adds another dump_stack() lines.


	Sergey "no-time-for-capitals" Senozhatsky

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-22  8:56                                     ` Sergey Senozhatsky
@ 2018-01-22 10:28                                       ` Sergey Senozhatsky
  2018-01-22 10:36                                         ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-22 10:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On (01/22/18 17:56), Sergey Senozhatsky wrote:
[..]
> Assume the following,

But more importantly we are missing another huge thing - console_unlock().

Suppose:

	console_lock();
	<< preemption >>
						printk
						printk
						..
						printk
	console_unlock()
	 for (;;) {
		call_console_drivers()
		   dump_stack
		   queue IRQ work

		IRQ work >>
		   flush_printk_safe
		   printk_deferred()
		   ...
		   printk_deferred()
		<< iret
	 }

This should explode: sleepable console_unlock() may reschedule,
printk_safe flush bypasses recursion checks.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-22 10:28                                       ` Sergey Senozhatsky
@ 2018-01-22 10:36                                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-22 10:36 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On (01/22/18 19:28), Sergey Senozhatsky wrote:
> On (01/22/18 17:56), Sergey Senozhatsky wrote:
> [..]
> > Assume the following,
> 
> But more importantly we are missing another huge thing - console_unlock().

IOW, not every console_unlock() is from vprintk_emit(). We can have
console_trylock() -> console_unlock() being from non-preemptible context,
etc. And then irq work to flush printk_safe -> printk_deferred all the time.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-21 14:15                                 ` Sergey Senozhatsky
  2018-01-21 21:04                                   ` Steven Rostedt
@ 2018-01-23  6:40                                   ` Sergey Senozhatsky
  2018-01-23  7:05                                     ` Sergey Senozhatsky
                                                       ` (2 more replies)
  1 sibling, 3 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23  6:40 UTC (permalink / raw)
  To: Petr Mladek, Tejun Heo, Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel, Sergey Senozhatsky

Hello,

On (01/21/18 23:15), Sergey Senozhatsky wrote:
[..]
> we have printk recursion from console drivers. it's redirected to
> printk_safe and we queue an IRQ work to flush the buffer
> 
>  printk
>   console_unlock
>    call_console_drivers
>     net_console
>      printk
>       printk_save -> irq_work queue
> 
> now console_unlock() enables local IRQs, we have the printk_safe
> flush. but printk_safe flush does not call into the console_unlock(),
> it uses printk_deferred() version of printk
> 
> IRQ work
> 
>  prink_safe_flush
>   printk_deferred -> irq_work queue
> 
> 
> so we schedule another IRQ work (deferred printk work), which eventually
> tries to lock console_sem
> 
> IRQ work
>  wake_up_klogd_work_func()
>   if (console_trylock())
>    console_unlock()

Why do we even use irq_work for printk_safe?

Okay... So, how about this. For printk_safe we use system_wq for flushing.
IOW, we flush from a task running exactly on the same CPU which hit printk
recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
*that* CPU. Which gives us the following thing: if CPU stuck in
console_unlock() loop with preemption disabled, then system_wq does not
schedule on that CPU and we, thus, don't flush printk_safe buffer from that
CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
buffer, printing extra messages from that CPU will not lock it up, because
it's in preemptible context.

Thoughts?


Something like this:

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk/safe: use slowpath flush for printk_safe

---
 kernel/printk/printk_safe.c | 53 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..c641853a5fa9 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,8 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/console.h>
+#include <linux/workqueue.h>
 
 #include "internal.h"
 
@@ -50,6 +52,7 @@ struct printk_safe_seq_buf {
 	atomic_t		len;	/* length of written data */
 	atomic_t		message_lost;
 	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	struct work_struct	slowpath_flush_work;
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
 	if (printk_safe_irq_ready)
 		irq_work_queue(&s->work);
 }
 
+static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s)
+{
+	if (printk_safe_irq_ready)
+		queue_work_on(smp_processor_id(),
+				system_wq,
+				&s->slowpath_flush_work);
+}
+
 /*
  * Add a message to per-CPU context-dependent buffer. NMI and printk-safe
  * have dedicated buffers, because otherwise printk-safe preempted by
@@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	/* The trailing '\0' is not counted into len. */
 	if (len >= sizeof(s->buffer) - 1) {
 		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
+		queue_irq_flush_work(s);
 		return 0;
 	}
 
@@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	if (atomic_cmpxchg(&s->len, len, len + add) != len)
 		goto again;
 
-	queue_flush_work(s);
 	return add;
 }
 
@@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+/* NMI buffers are always flushed */
+static void flush_nmi_buffer(struct irq_work *work)
+{
+	__printk_safe_flush(work);
+}
+
+/* printk_safe buffers flushing, on the contrary, can be postponed */
+static void flush_printk_safe_buffer(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, work);
+
+	if (is_console_locked()) {
+		queue_slowpath_flush_work(s);
+		return;
+	}
+
+	__printk_safe_flush(work);
+}
+
+static void slowpath_flush_work_fn(struct work_struct *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf,
+				slowpath_flush_work);
+
+	__printk_safe_flush(&s->work);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
+	queue_irq_flush_work(s);
 	return printk_safe_log_store(s, fmt, args);
 }
 
@@ -343,6 +383,7 @@ static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq);
 
+	queue_slowpath_flush_work(s);
 	return printk_safe_log_store(s, fmt, args);
 }
 
@@ -387,11 +428,13 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_printk_safe_buffer);
+		INIT_WORK(&s->slowpath_flush_work, slowpath_flush_work_fn);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_nmi_buffer);
+		/* we don't use slowpath flush for NMI */
 #endif
 	}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23  6:40                                   ` Sergey Senozhatsky
@ 2018-01-23  7:05                                     ` Sergey Senozhatsky
  2018-01-23  7:31                                     ` Sergey Senozhatsky
  2018-01-23 14:56                                     ` Steven Rostedt
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23  7:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Steven Rostedt, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky

On (01/23/18 15:40), Sergey Senozhatsky wrote:
[..]
> Why do we even use irq_work for printk_safe?
> 
> Okay... So, how about this. For printk_safe we use system_wq for flushing.
> IOW, we flush from a task running exactly on the same CPU which hit printk
> recursion, not from IRQ. From vprintk_safe() recursion, we queue work on
> *that* CPU. Which gives us the following thing: if CPU stuck in
> console_unlock() loop with preemption disabled, then system_wq does not
> schedule on that CPU and we, thus, don't flush printk_safe buffer from that
> CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe
> buffer, printing extra messages from that CPU will not lock it up, because
> it's in preemptible context.
> 
> Thoughts?

A slightly reworked version:
a) Do not check console_locked
b) Do not have irq_work fast path for printk_safe buffer
 c) Which lets to union WQ/IRQ work structs - we use only IRQ work for
    NMI buffers, and only WQ work for SAFE buffers
 d) And also to refactor the code

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers

---
 kernel/printk/printk_safe.c | 52 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..6c8c82cedccb 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/workqueue.h>
 
 #include "internal.h"
 
@@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
 	atomic_t		len;	/* length of written data */
 	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	union {
+		/* IRQ work that flushes NMI buffer */
+		struct irq_work		irq_flush_work;
+		/* WQ work that flushes SAFE buffer */
+		struct work_struct	wq_flush_work;
+	};
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
 	if (printk_safe_irq_ready)
-		irq_work_queue(&s->work);
+		irq_work_queue(&s->irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+	if (printk_safe_irq_ready)
+		queue_work_on(smp_processor_id(),
+				system_wq,
+				&s->wq_flush_work);
 }
 
 /*
@@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	/* The trailing '\0' is not counted into len. */
 	if (len >= sizeof(s->buffer) - 1) {
 		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
 		return 0;
 	}
 
@@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	if (atomic_cmpxchg(&s->len, len, len + add) != len)
 		goto again;
 
-	queue_flush_work(s);
 	return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
 	static raw_spinlock_t read_lock =
 		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
 	unsigned long flags;
 	size_t len;
 	int i;
@@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
+static void wq_flush_work_fn(struct work_struct *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, wq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -256,9 +282,9 @@ void printk_safe_flush(void)
 
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
+		__printk_safe_flush(this_cpu_ptr(&nmi_print_seq));
 #endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
+		__printk_safe_flush(this_cpu_ptr(&safe_print_seq));
 	}
 }
 
@@ -300,6 +326,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
+	queue_irq_flush_work(s);
 	return printk_safe_log_store(s, fmt, args);
 }
 
@@ -343,6 +370,7 @@ static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq);
 
+	queue_wq_flush_work(s);
 	return printk_safe_log_store(s, fmt, args);
 }
 
@@ -387,11 +415,11 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		INIT_WORK(&s->wq_flush_work, wq_flush_work_fn);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->irq_flush_work, irq_flush_work_fn);
 #endif
 	}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23  6:40                                   ` Sergey Senozhatsky
  2018-01-23  7:05                                     ` Sergey Senozhatsky
@ 2018-01-23  7:31                                     ` Sergey Senozhatsky
  2018-01-23 14:56                                     ` Steven Rostedt
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23  7:31 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, Steven Rostedt, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky

On (01/23/18 15:40), Sergey Senozhatsky wrote:
> 
> Why do we even use irq_work for printk_safe?
> 

... perhaps because of

wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> wq: pool->lock

Which is a "many things have gone wrong" type of scenario. Maybe we
can workaround it somehow, hm. Tejun, can we have lockless WQ? ;)

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23  6:40                                   ` Sergey Senozhatsky
  2018-01-23  7:05                                     ` Sergey Senozhatsky
  2018-01-23  7:31                                     ` Sergey Senozhatsky
@ 2018-01-23 14:56                                     ` Steven Rostedt
  2018-01-23 15:21                                       ` Sergey Senozhatsky
  2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 14:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel, Sergey Senozhatsky

On Tue, 23 Jan 2018 15:40:23 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Why do we even use irq_work for printk_safe?

Why not?

Really, I think you are trying to solve a symptom and not the problem.
If we are having issues with irq_work, we are going to have issues with
a work queue. It's just spreading out the problem instead of fixing it.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 14:56                                     ` Steven Rostedt
@ 2018-01-23 15:21                                       ` Sergey Senozhatsky
  2018-01-23 15:41                                         ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23 15:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky

On (01/23/18 09:56), Steven Rostedt wrote:
[..]
> > Why do we even use irq_work for printk_safe?
> 
> Why not?
> 
> Really, I think you are trying to solve a symptom and not the problem.
> If we are having issues with irq_work, we are going to have issues with
> a work queue. It's just spreading out the problem instead of fixing it.

I don't want to have heuristics in print_safe, I don't want to have a magic
number controlled by a user-space visible knob, I don't want to have the
first 3 lines of a lockdep splat.


The problem is - we flush printk_safe too soon and printing CPU ends up
in a lockup - it log_store()-s new messages while it's printing the pending
ones. It's fine to do so when CPU is in preemptible context. Really, we
should not care in printk_safe as long as we don't lockup the kernel. The
misbehaving console must be fixed. If CPU is not in preemptible context then
we do lockup the kernel. Because we flush printk_safe regardless of the
current CPU context. If we will flush printk_safe via WQ then we automatically
add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
and appreciated or unneeded messages. But we should not care in printk_safe.
We don't lockup the kernel... And the misbehaving console must be fixed.

I disagree with "If we are having issues with irq_work, we are going to have
issues with a work queue". There is a tremendous difference between irq_work
on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
context, the other one does.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:21                                       ` Sergey Senozhatsky
@ 2018-01-23 15:41                                         ` Steven Rostedt
  2018-01-23 15:43                                           ` Tejun Heo
  2018-01-23 16:01                                           ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 15:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 24 Jan 2018 00:21:30 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> On (01/23/18 09:56), Steven Rostedt wrote:
> [..]
> > > Why do we even use irq_work for printk_safe?  
> > 
> > Why not?
> > 
> > Really, I think you are trying to solve a symptom and not the problem.
> > If we are having issues with irq_work, we are going to have issues with
> > a work queue. It's just spreading out the problem instead of fixing it.  
> 
> I don't want to have heuristics in print_safe, I don't want to have a magic
> number controlled by a user-space visible knob, I don't want to have the
> first 3 lines of a lockdep splat.

We can have more. But if printk is causing printks, that's a major bug.
And work queues are not going to fix it, it will just spread out the
pain. Have it be 100 printks, it needs to be fixed if it is happening.
And having all printks just generate more printks is not helpful. Even
if we slow them down. They will still never end.

A printk causing a printk is a special case, and we need to just show
enough to let the user know that its happening, and why printks are
being throttled. Yes, we may lose data, but if every printk that goes
out causes another printk, then there's going to be so much noise that
we wont know what other things went wrong. Honestly, if someone showed
me a report where the logs were filled with printks that caused
printks, I'd stop right there and tell them that needs to be fixed
before we do anything else. And if that recursion is happening because
of another problem, I don't want to see the recursion printks. I want
to see the printks that show what is causing the recursions.



> The problem is - we flush printk_safe too soon and printing CPU ends up
> in a lockup - it log_store()-s new messages while it's printing the pending

No, the problem is that printks are causing more printks. Yes that will
make flushing them soon more likely to lock up the system. But that's
not the problem. The problem is printks causing printks.

> ones. It's fine to do so when CPU is in preemptible context. Really, we
> should not care in printk_safe as long as we don't lockup the kernel. The
> misbehaving console must be fixed. If CPU is not in preemptible context then
> we do lockup the kernel. Because we flush printk_safe regardless of the
> current CPU context. If we will flush printk_safe via WQ then we automatically

And if we can throttle recursive printks, then we should be able to
stop that from happening.

> add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
> and appreciated or unneeded messages. But we should not care in printk_safe.
> We don't lockup the kernel... And the misbehaving console must be fixed.

I agree.

> 
> I disagree with "If we are having issues with irq_work, we are going to have
> issues with a work queue". There is a tremendous difference between irq_work
> on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
> context, the other one does.

But switching to work queue does not address the underlining problem
that printks are causing more printks.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:41                                         ` Steven Rostedt
@ 2018-01-23 15:43                                           ` Tejun Heo
  2018-01-23 16:12                                             ` Sergey Senozhatsky
                                                               ` (2 more replies)
  2018-01-23 16:01                                           ` Sergey Senozhatsky
  1 sibling, 3 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-23 15:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello, Steven.

On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > I don't want to have heuristics in print_safe, I don't want to have a magic
> > number controlled by a user-space visible knob, I don't want to have the
> > first 3 lines of a lockdep splat.
> 
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

So, at least in the case that we were seeing, it isn't that black and
white.  printk keeps causing printks but only because printk buffer
flushing is preventing the printk'ing context from making forward
progress.  The key problem there is that a flushing context may get
pinned flushing indefinitely and using a separate context does solve
the problem.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:41                                         ` Steven Rostedt
  2018-01-23 15:43                                           ` Tejun Heo
@ 2018-01-23 16:01                                           ` Sergey Senozhatsky
  2018-01-23 16:24                                             ` Steven Rostedt
  2018-01-23 17:22                                             ` Tejun Heo
  1 sibling, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/23/18 10:41), Steven Rostedt wrote:
[..]
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

Dropping the messages is not the solution either. The original bug report
report was - this "locks up my kernel". That's it. That's all people asked
us to solve.

With WQ we don't lockup the kernel, because we flush printk_safe in
preemptible context. And people are very much expected to fix the
misbehaving consoles. But that should not be printk_safe problem.

> A printk causing a printk is a special case, and we need to just show
> enough to let the user know that its happening, and why printks are
> being throttled. Yes, we may lose data, but if every printk that goes
> out causes another printk, then there's going to be so much noise that
> we wont know what other things went wrong. Honestly, if someone showed
> me a report where the logs were filled with printks that caused
> printks, I'd stop right there and tell them that needs to be fixed
> before we do anything else. And if that recursion is happening because
> of another problem, I don't want to see the recursion printks. I want
> to see the printks that show what is causing the recursions.

I'll re-read this one tomorrow. Not quite following it.

> > The problem is - we flush printk_safe too soon and printing CPU ends up
> > in a lockup - it log_store()-s new messages while it's printing the pending
> 
> No, the problem is that printks are causing more printks. Yes that will
> make flushing them soon more likely to lock up the system. But that's
> not the problem. The problem is printks causing printks.

Yes. And ignoring those printk()-s by simply dropping them does not fix
the problem by any means.

> > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > should not care in printk_safe as long as we don't lockup the kernel. The
> > misbehaving console must be fixed. If CPU is not in preemptible context then
> > we do lockup the kernel. Because we flush printk_safe regardless of the
> > current CPU context. If we will flush printk_safe via WQ then we automatically
> 
> And if we can throttle recursive printks, then we should be able to
> stop that from happening.

pintk_safe was designed to be recursive. It was never designed to be
used to troubleshoot or debug consoles. But it was designed to be
recursive - because that's the sort of the problems it was meant to
handle: recursive printks that would otherwise deadlock us. That's why
we have it in the first place.

> > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
> > and appreciated or unneeded messages. But we should not care in printk_safe.
> > We don't lockup the kernel... And the misbehaving console must be fixed.
> 
> I agree.

Good.

> > I disagree with "If we are having issues with irq_work, we are going to have
> > issues with a work queue". There is a tremendous difference between irq_work
> > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
> > context, the other one does.
> 
> But switching to work queue does not address the underlining problem
> that printks are causing more printks.

The only way to address those problems is to fix the console. That's the only.

But that's not what I'm doing with my proposal. I fix the lockup scenario, the
only reported problem so far. Whilst also keeping printk_safe around.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:43                                           ` Tejun Heo
@ 2018-01-23 16:12                                             ` Sergey Senozhatsky
  2018-01-23 16:13                                             ` Steven Rostedt
  2018-04-23  5:35                                             ` Sergey Senozhatsky
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23 16:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky,
	Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

Hello, Tejun

On (01/23/18 07:43), Tejun Heo wrote:
> Hello, Steven.
> 
> On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > > I don't want to have heuristics in print_safe, I don't want to have a magic
> > > number controlled by a user-space visible knob, I don't want to have the
> > > first 3 lines of a lockdep splat.
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Would you, as the original bug reporter, be OK if we flush printk_safe (only
printk_safe, not printk_nmi for the time being) via WQ? This should move that
"uncontrolled" flush to a safe context. I don't think we can easily add
kthread offloading to printk at the moment (this will result in a massive gun
fight).

Just in case, below is something like a patch. I think I worked around the
possible wq deadlock scenario. But I haven't tested the patch yet. It's
a bit late here and I guess I need some rest. Will try to look more at
it tomorrow.

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk/safe: split flush works

---
 kernel/printk/printk_safe.c | 75 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 12 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..54bc40ce3c34 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/workqueue.h>
 
 #include "internal.h"
 
@@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly;
 struct printk_safe_seq_buf {
 	atomic_t		len;	/* length of written data */
 	atomic_t		message_lost;
-	struct irq_work		work;	/* IRQ work that flushes the buffer */
+	/* IRQ work that flushes NMI buffer */
+	struct irq_work		irq_flush_work;
+	/* WQ work that flushes SAFE buffer */
+	struct work_struct	wq_flush_work;
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
 };
 
@@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
 
 /* Get flushed in a more safe context. */
-static void queue_flush_work(struct printk_safe_seq_buf *s)
+static void queue_irq_flush_work(struct printk_safe_seq_buf *s)
 {
 	if (printk_safe_irq_ready)
-		irq_work_queue(&s->work);
+		irq_work_queue(&s->irq_flush_work);
+}
+
+static void queue_wq_flush_work(struct printk_safe_seq_buf *s)
+{
+	if (printk_safe_irq_ready)
+		queue_work_on(smp_processor_id(),
+				system_wq,
+				&s->wq_flush_work);
 }
 
 /*
@@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	/* The trailing '\0' is not counted into len. */
 	if (len >= sizeof(s->buffer) - 1) {
 		atomic_inc(&s->message_lost);
-		queue_flush_work(s);
+		queue_irq_flush_work(s);
 		return 0;
 	}
 
@@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	if (atomic_cmpxchg(&s->len, len, len + add) != len)
 		goto again;
 
-	queue_flush_work(s);
+	queue_irq_flush_work(s);
 	return add;
 }
 
@@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s)
  * Flush data from the associated per-CPU buffer. The function
  * can be called either via IRQ work or independently.
  */
-static void __printk_safe_flush(struct irq_work *work)
+static void __printk_safe_flush(struct printk_safe_seq_buf *s)
 {
 	static raw_spinlock_t read_lock =
 		__RAW_SPIN_LOCK_INITIALIZER(read_lock);
-	struct printk_safe_seq_buf *s =
-		container_of(work, struct printk_safe_seq_buf, work);
 	unsigned long flags;
 	size_t len;
 	int i;
@@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+static void irq_flush_work_fn(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
+/*
+ * We can't queue wq work directly from vprintk_safe(), because we can
+ * deadlock. For instance:
+ *
+ * queue_work()
+ *  spin_lock(pool->lock)
+ *   printk()
+ *    call_console_drivers()
+ *     vprintk_safe()
+ *      queue_work()
+ *       spin_lock(pool->lock)
+ *
+ * So we use irq_work, from which we queue wq work. WQ disables local IRQs
+ * while it works with pool, so if we have irq_work on that CPU then we can
+ * expect that pool->lock is not locked.
+ */
+static void irq_to_wq_flush_work_fn(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, irq_flush_work);
+
+	queue_wq_flush_work(s);
+}
+
+static void wq_flush_work_fn(struct work_struct *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, wq_flush_work);
+
+	__printk_safe_flush(s);
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -256,9 +306,9 @@ void printk_safe_flush(void)
 
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
-		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
+		__printk_safe_flush(this_cpu_ptr(&nmi_print_seq));
 #endif
-		__printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
+		__printk_safe_flush(this_cpu_ptr(&safe_print_seq));
 	}
 }
 
@@ -387,11 +437,12 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->irq_flush_work, irq_to_wq_flush_work_fn);
+		INIT_WORK(&s->wq_flush_work, wq_flush_work_fn);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->irq_flush_work, irq_flush_work_fn);
 #endif
 	}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:43                                           ` Tejun Heo
  2018-01-23 16:12                                             ` Sergey Senozhatsky
@ 2018-01-23 16:13                                             ` Steven Rostedt
  2018-01-23 17:21                                               ` Tejun Heo
  2018-04-23  5:35                                             ` Sergey Senozhatsky
  2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 16:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo <tj@kernel.org> wrote:

> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>

Does it?

>From what I understand is that there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
within a printk recursively. It gets put into the safe buffer and an
irq is sent to printk this printk.

The issue you are saying is that when the printk enables interrupts,
the irq work triggers and loads the log buffer with the safe buffer, and
then the printk sees the new data added and continues to print, and
hence never leaves this printk.

Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP mantra". Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.

Again, this is treating the symptom and not solving the problem.

I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.

I much rather have throttling when recursive printks are detected.
Make it a 100 lines to print if you want, but then throttle. Because
once you have 100 lines or so, you will know that printks are causing
printks, and you don't give a crap about the repeated process. Allow
one flushing of the printk safe buffers, and then if it happens again,
throttle it.

Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose important data that caused
a lockup.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 16:01                                           ` Sergey Senozhatsky
@ 2018-01-23 16:24                                             ` Steven Rostedt
  2018-01-24  2:11                                               ` Sergey Senozhatsky
  2018-01-23 17:22                                             ` Tejun Heo
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 16:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 24 Jan 2018 01:01:53 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.  
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.

And throttling the printks would stop the lock up too.

> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

Right, but now you just made printk safe unreliable to get information
out, because you need to wait for a schedule to occur, and if there's
issues, like a deadlock, that thread will never run. And you just lost
you lockdep splat.

> 
> > A printk causing a printk is a special case, and we need to just show
> > enough to let the user know that its happening, and why printks are
> > being throttled. Yes, we may lose data, but if every printk that goes
> > out causes another printk, then there's going to be so much noise that
> > we wont know what other things went wrong. Honestly, if someone showed
> > me a report where the logs were filled with printks that caused
> > printks, I'd stop right there and tell them that needs to be fixed
> > before we do anything else. And if that recursion is happening because
> > of another problem, I don't want to see the recursion printks. I want
> > to see the printks that show what is causing the recursions.  
> 
> I'll re-read this one tomorrow. Not quite following it.

I'll add more capitals next time ;-)

> 
> > > The problem is - we flush printk_safe too soon and printing CPU ends up
> > > in a lockup - it log_store()-s new messages while it's printing the pending  
> > 
> > No, the problem is that printks are causing more printks. Yes that will
> > make flushing them soon more likely to lock up the system. But that's
> > not the problem. The problem is printks causing printks.  
> 
> Yes. And ignoring those printk()-s by simply dropping them does not fix
> the problem by any means.

How so? If we drop them, then the stuck printk has nothing to print and
will move forward.

I say once you start dropping printks due to recursion, keep dropping
them. For at least a second, to allow them to stop killing the machine.

> 
> > > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > > should not care in printk_safe as long as we don't lockup the kernel. The
> > > misbehaving console must be fixed. If CPU is not in preemptible context then
> > > we do lockup the kernel. Because we flush printk_safe regardless of the
> > > current CPU context. If we will flush printk_safe via WQ then we automatically  
> > 
> > And if we can throttle recursive printks, then we should be able to
> > stop that from happening.  
> 
> pintk_safe was designed to be recursive. It was never designed to be
> used to troubleshoot or debug consoles. But it was designed to be
> recursive - because that's the sort of the problems it was meant to
> handle: recursive printks that would otherwise deadlock us. That's why
> we have it in the first place.

So printk safe is only triggered when at the same context? If we can
guarantee that printk safe is triggered only when its because a printk
is happening at the same context (not because of an interrupt, but
really at the same context, using my context check), then I'm fine with
delaying them to a work queue.

That is, if we have this:

	printk()
		console_lock()
			<interrupt>
				printk()
					add to log buffer
		<print irq printk too>
		console_unlock();


	printk()
		console_lock()
			<console does a printk>
				put in printk safe buffer
				trigger work queue
		console_unlock()
	<work queue>
		flush safe buffer
		printk()

Then I'm fine with that.

I have to look at the latest code. If this is indeed what we have, then
I admit I misunderstood the problem you want to solve.

I only want recursive printks (those that are actually triggered by
doing a printk) to be allowed to be delayed.

Make sense?

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 16:13                                             ` Steven Rostedt
@ 2018-01-23 17:21                                               ` Tejun Heo
  0 siblings, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-23 17:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm,
	linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hey,

On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote:
> From what I understand is that there's an issue with one of the printk
> consoles, due to memory pressure or whatnot. Then a printk happens
> within a printk recursively. It gets put into the safe buffer and an
> irq is sent to printk this printk.
> 
> The issue you are saying is that when the printk enables interrupts,
> the irq work triggers and loads the log buffer with the safe buffer, and
> then the printk sees the new data added and continues to print, and
> hence never leaves this printk.

I'm not sure it's irq or the same calling context, but yeah whatever
it may be, it keeps adding new data.

> Your solution is to delay the flushing of the safe buffer to another
> thread (work queue), which I also have issues with, because you break
> the "get printks out ASAP mantra". Then the work queue comes in and
> flushes the printks. And since the printks cause printks, we continue
> to spam the machine, but hey, we are making forward progress.

I'm not sure "get printks out ASAP mantra" is the overriding concern
after spending 20s flushing in an unknown context.  I'm honestly
curious.  Would that still matter that much at that point?  I went
through the recent common crashes in the fleet earlier today and a
good number of them are printk taking too long unnecessarily
escalating the situation (most commonly triggering NMI watchdog).  I'm
not saying that this should override other concerns but it seems clear
to me that we're pretty badly exposed on this front.

> Again, this is treating the symptom and not solving the problem.

Or adding a safety net when things go south, but this isn't what I was
trying to argue.  I mostly thought your understanding of what I
reported wasn't accurate and wanted to clear that up.

> I really hate delaying printks to another thread, unless we can
> guarantee that that thread is ready to go immediately (basically
> spinning on a run queue waiting to print). Because if the system is
> having issues (which is the main reason for printks to happen), there's
> no guarantee that a work queue or another thread will ever schedule,
> and the safe printk buffer never gets out to the consoles.
>
> I much rather have throttling when recursive printks are detected.
> Make it a 100 lines to print if you want, but then throttle. Because
> once you have 100 lines or so, you will know that printks are causing
> printks, and you don't give a crap about the repeated process. Allow
> one flushing of the printk safe buffers, and then if it happens again,
> throttle it.
> 
> Both methods can lose important data. I believe the throttling of
> recursive printks, after 100 prints or whatever, will be the least
> likely to lose important data, because printks caused by printks will
> just keep repeating the same data, and we don't care about repeats. But
> delaying the flushing could very well lose important data that caused
> a lockup.

Hmmm... what you're suggesting still seems more fragile - ie. when
does that 100 count get reset?  OOM prints quite a few lines and if
we're resetting on each line, that two order explosion of messages can
still be really really bad.  And issues like that seem to suggest that
the root problem to handle here is avoiding locking up a context in
flushing for too long.  Your approach is trying to avoid causing that
but it's a symptom which can be reached in many different ways.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 16:01                                           ` Sergey Senozhatsky
  2018-01-23 16:24                                             ` Steven Rostedt
@ 2018-01-23 17:22                                             ` Tejun Heo
  1 sibling, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-23 17:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

Hello, Sergey.

On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote:
> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.
> 
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

I really don't care as long as it's robust enough.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 16:24                                             ` Steven Rostedt
@ 2018-01-24  2:11                                               ` Sergey Senozhatsky
  2018-01-24  2:52                                                 ` Steven Rostedt
  0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-24  2:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

Hello,

On (01/23/18 11:24), Steven Rostedt wrote:
[..]
> > With WQ we don't lockup the kernel, because we flush printk_safe in
> > preemptible context. And people are very much expected to fix the
> > misbehaving consoles. But that should not be printk_safe problem.
> 
> Right, but now you just made printk safe unreliable to get information
> out, because you need to wait for a schedule to occur, and if there's
> issues, like a deadlock, that thread will never run. And you just lost
> you lockdep splat.

Yes and No.

printk_safe and printk_nmi are unreliable - both need irq_work. That's
why we forcibly flush those buffers in panic(). At least for printk_safe
case, and I'm pretty sure the same stands for printk_nmi, we never said
that we will store all the messages that were printed from unsafe context
(recursion or NMI). The only thing we said - we will try not to deadlock
the system.

Now we are adding one more thing to printk_safe - we will also try not to
lockup the system.

Default printk_safe buffer size might not be enough to store a very large
lockdep splat. And we will report that the buffer is too small and that we
lost some of the lines: "here is what we have, we lost N lines, but at least
we didn't deadlock the system". See f975237b76827956fe13ecfe993a319158e2c303
for more details, it contains a list of recursive-printk deadlock scenarios
that printk_safe was meant to handle.

It is possible and OK to lose messages in printk_safe/printk_nmi

printk_safe_enter_irqsave()
  printk
  printk
  ...
  ...
  printk
  printk
printk_safe_exit_irqrestore()

No flush will take place as long as there is no IRQ on that CPU.
But printk_safe and printk_nmi are solving different problem in
the first place.

> > I'll re-read this one tomorrow. Not quite following it.
> 
> I'll add more capitals next time ;-)

Ha-ha-ha ;)

[..]
> > pintk_safe was designed to be recursive. It was never designed to be
> > used to troubleshoot or debug consoles. But it was designed to be
> > recursive - because that's the sort of the problems it was meant to
> > handle: recursive printks that would otherwise deadlock us. That's why
> > we have it in the first place.
> 
> So printk safe is only triggered when at the same context? If we can
> guarantee that printk safe is triggered only when its because a printk
> is happening at the same context (not because of an interrupt, but
> really at the same context, using my context check), then I'm fine with
> delaying them to a work queue.

printk_safe is for printk recursion only. It happens in the same context
only. When we switch to printk_safe we disable local IRQs, NMIs have their
own printk_nmi thing. And the way we flush printk_safe is mostly recursive.
Because we flush when we know that we will not deadlock [as much as we can;
we can't control any 3rd party locks which might be involved; thus
printk_deferred() usage].

Usually it's something like

   printk
    spin_lock_irqsave(logbuf_lock)
     printk
      spin_lock_irqsave(logbuf_lock) << deadlock

What we have with printk_safe is

  printk
   local_irq_save
   printk_safe_enter
   spin_lock(logbuf_lock)
    printk
     vprintk_safe
      queue irq work
   spin_unlock(logbuf_lock)
   printk_safe_exit
   local_irq_restore
   >>> IRQ work
       printk_safe_flush
        printk
	 spin_lock_irqsave(logbuf_lock)
	 log_store()
	 spin_unlock_irqrestore(logbuf_lock)

So we flush printk_safe ASAP, which usually (unless originally we were
not in IRQ) means that the flush is recursive, but safe - we don't
deadlock.

> That is, if we have this:
> 
> 	printk()
> 		console_lock()
> 			<interrupt>
> 				printk()
> 					add to log buffer
> 		<print irq printk too>
> 		console_unlock();

Right. This is what we have right now. Every time we enable local IRQs in
the console_unlock() printing loop - we flush printk_safe. And that's the
problem.

> 	printk()
> 		console_lock()
> 			<console does a printk>
> 				put in printk safe buffer
> 				trigger work queue
> 		console_unlock()
> 	<work queue>
> 		flush safe buffer
> 		printk()

Right. This is what we will have with WQ. We don't flush printk_safe until
we return from console_unlock(). Because printk() disables preemption for
the duration of console_unlock(), we can't schedule WQ on that CPU. And we
schedule flushing work only on the CPU that has triggered the recursion.

Another thing:

console_lock()
 blah blah
console_unlock()

In this case we will flush printk_safe withing the printing loop.
Immediately. But we don't care - the CPU is preemptible, we don't
lock up the kernel.


> Then I'm fine with that.
> 
> I have to look at the latest code. If this is indeed what we have, then
> I admit I misunderstood the problem you want to solve.
> 
> I only want recursive printks (those that are actually triggered by
> doing a printk) to be allowed to be delayed.
> 
> Make sense?

Please take a look.

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-24  2:11                                               ` Sergey Senozhatsky
@ 2018-01-24  2:52                                                 ` Steven Rostedt
  2018-01-24  4:44                                                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-24  2:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
	Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Pavel Machek, linux-kernel

On Wed, 24 Jan 2018 11:11:33 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Please take a look.

Was there something specific to look at?

I'm doing a hundred different things at once, and my memory cache keeps
getting flushed.

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-24  2:52                                                 ` Steven Rostedt
@ 2018-01-24  4:44                                                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-24  4:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/23/18 21:52), Steven Rostedt wrote:
> On Wed, 24 Jan 2018 11:11:33 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> 
> > Please take a look.
> 
> Was there something specific to look at?

Not really. Just my previous email, basically.
You said "I have to look at the latest code." so I replied.

Well, if the proposed direction does make sense then I'll send
out a patch.


> I'm doing a hundred different things at once, and my memory cache...

Meltdown vulnerable? Suddenly it all makes sense - you talk too fast
because of speculative execution... ;)

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 17:02     ` Tejun Heo
                         ` (2 preceding siblings ...)
  2018-01-10 18:40       ` Mathieu Desnoyers
@ 2018-01-24  9:36       ` Peter Zijlstra
  2018-01-24 18:46         ` Tejun Heo
  2018-05-09  8:58       ` Sergey Senozhatsky
  4 siblings, 1 reply; 140+ messages in thread
From: Peter Zijlstra @ 2018-01-24  9:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.

So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
trouble like that (granted I don't typically trigger OOM storms, but
they do occasionally happen).

Is your IPMI much slower and not fixable to be faster?

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-24  9:36       ` Peter Zijlstra
@ 2018-01-24 18:46         ` Tejun Heo
  0 siblings, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-24 18:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt,
	Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt,
	Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel

Hello, Peter.

On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >    netconsole is in use.
> 
> So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had
> trouble like that (granted I don't typically trigger OOM storms, but
> they do occasionally happen).
> 
> Is your IPMI much slower and not fixable to be faster?

It looks like the latest machines have the baud rate at 57600 and I'm
pretty sure we have a lot of slower ones.  57600 isn't 9600 but is
still slow enough to get into trouble often enough.  There are a huge
number of machines running all sorts of things under heavy load and
trying to rapidly deploy new kernels / features contributes to
encountering bugs and weird corner cases.

UART can run a lot faster and I have no idea why IPMI consoles behave
as if they were connected over mile-long DB9 cables.  Maybe we can
convince hardware people to improve it but, even if that happened
today, we'd still be looking at years of dealing with slower ones, and
IPMI situation here is likely better than what many others are facing.

idk, it's not a particularly difficult problem to solve from kernel
side.  Just need to figure out a better / more robust trade-off.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-23 15:43                                           ` Tejun Heo
  2018-01-23 16:12                                             ` Sergey Senozhatsky
  2018-01-23 16:13                                             ` Steven Rostedt
@ 2018-04-23  5:35                                             ` Sergey Senozhatsky
  2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-04-23  5:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky,
	Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen,
	Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
	Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
	Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek,
	linux-kernel

On (01/23/18 07:43), Tejun Heo wrote:
> > 
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
> 
> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Hello Tejun,

I'm willing to take a look at those printk()-s from console drivers.
Any chance you can send me some of the backtraces you see [the most
common/disturbing]?

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-10 17:02     ` Tejun Heo
                         ` (3 preceding siblings ...)
  2018-01-24  9:36       ` Peter Zijlstra
@ 2018-05-09  8:58       ` Sergey Senozhatsky
  4 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-05-09  8:58 UTC (permalink / raw)
  To: Tejun Heo, Petr Mladek, Andrew Morton, Steven Rostedt,
	Johannes Weiner, Michal Hocko, Vlastimil Babka
  Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky, linux-mm,
	Cong Wang, Dave Hansen, Mel Gorman, Peter Zijlstra, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Sergey Senozhatsky, Pavel Machek, linux-kernel

Hi,

Move printk and (some of) MM people to the recipients list.

On (01/10/18 09:02), Tejun Heo wrote:
[..]
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
> 
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

Tejun, we have a theory [since there are no logs available] that what
you are looking at is something as follows:

console_unlock()
{
  for (;;) {
   call_console_drivers()
     kmalloc()/etc        /* netconsole, skb kmalloc(), for instance */
      __alloc_pages_slowpath()
        warn_alloc()      /* a bunch of printk() -> log_store() */
  }
}

Now, warn_alloc() is rate limited to
	DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST

so net console driver can add 10 warn_alloc() reports every 5 seconds to
the logbuf.

You have a "super slow" IPMI console and net console. So for every
logbuf entry we do:

console_unlock()
{
  for (;;) {
    call_console_drivers(msg) -> IPMI_write()
    call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> warn_alloc() -> ratelimit
  }
}

IPMI_write() is very slow, as you have noted, so it consumes time
printing messages, simultaneously warn_alloc() rate limit depends on
time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc()
reports under 5 seconds, which gives net console a chance to add another
10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been
flushed yet.

It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
warn_alloc() rate limit is too permissive for your setup.

Can you confirm that the theory is actually correct?

If it is correct, then can we simply tweak warn_alloc() rate limit?
Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s
every 10 seconds"? What do MM folks think?

	-ss

^ permalink raw reply	[flat|nested] 140+ messages in thread

end of thread, other threads:[~2018-05-09  8:58 UTC | newest]

Thread overview: 140+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek
2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek
2018-01-10 16:50   ` Steven Rostedt
2018-01-12 16:54   ` Steven Rostedt
2018-01-12 17:11     ` Steven Rostedt
2018-01-17 19:13       ` Rasmus Villemoes
2018-01-17 19:33         ` Steven Rostedt
2018-01-19  9:51         ` Sergey Senozhatsky
2018-01-18 22:03     ` Pavel Machek
2018-01-19  0:20       ` Steven Rostedt
2018-01-17  2:19   ` Byungchul Park
2018-01-17  4:54     ` Byungchul Park
2018-01-17  7:34     ` Byungchul Park
2018-01-17 12:04     ` Petr Mladek
2018-01-18  1:53       ` Byungchul Park
2018-01-18  1:57         ` Byungchul Park
2018-01-18  2:19         ` Steven Rostedt
2018-01-18  4:01           ` Byungchul Park
2018-01-18 15:21             ` Steven Rostedt
2018-01-19  2:37               ` Byungchul Park
2018-01-19  3:27                 ` Steven Rostedt
2018-01-22  2:31                   ` Byungchul Park
2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek
2018-01-10 17:52   ` Steven Rostedt
2018-01-11 12:03     ` Petr Mladek
2018-01-12 15:37       ` Steven Rostedt
2018-01-12 16:08         ` Petr Mladek
2018-01-12 16:36           ` Steven Rostedt
2018-01-15 16:08             ` Petr Mladek
2018-01-16  5:05               ` Sergey Senozhatsky
2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo
2018-01-10 16:29   ` Petr Mladek
2018-01-10 17:02     ` Tejun Heo
2018-01-10 18:21       ` Peter Zijlstra
2018-01-10 18:30         ` Tejun Heo
2018-01-10 18:41           ` Peter Zijlstra
2018-01-10 19:05             ` Tejun Heo
2018-01-11  5:15         ` Sergey Senozhatsky
2018-01-10 18:22       ` Steven Rostedt
2018-01-10 18:36         ` Tejun Heo
2018-01-10 18:40       ` Mathieu Desnoyers
2018-01-11  7:36         ` Sergey Senozhatsky
2018-01-11 11:24           ` Petr Mladek
2018-01-11 13:19             ` Sergey Senozhatsky
2018-01-24  9:36       ` Peter Zijlstra
2018-01-24 18:46         ` Tejun Heo
2018-05-09  8:58       ` Sergey Senozhatsky
2018-01-10 18:54     ` Steven Rostedt
2018-01-11  5:10     ` Sergey Senozhatsky
2018-01-10 18:05   ` Steven Rostedt
2018-01-10 18:12     ` Tejun Heo
2018-01-10 18:14       ` Tejun Heo
2018-01-10 18:45         ` Steven Rostedt
2018-01-10 18:41       ` Steven Rostedt
2018-01-10 18:57         ` Tejun Heo
2018-01-10 19:17           ` Steven Rostedt
2018-01-10 19:34             ` Tejun Heo
2018-01-10 19:44               ` Steven Rostedt
2018-01-10 22:44                 ` Tejun Heo
2018-01-11  5:35             ` Sergey Senozhatsky
2018-01-11  4:58     ` Sergey Senozhatsky
2018-01-11  9:34       ` Petr Mladek
2018-01-11 10:38         ` Sergey Senozhatsky
2018-01-11 11:50           ` Petr Mladek
2018-01-11 16:29           ` Steven Rostedt
2018-01-12  1:30             ` Steven Rostedt
2018-01-12  2:55               ` Steven Rostedt
2018-01-12  4:20                 ` Steven Rostedt
2018-01-16 19:44                 ` Tejun Heo
2018-01-17  9:12                   ` Petr Mladek
2018-01-17 15:15                     ` Tejun Heo
2018-01-17 17:12                       ` Steven Rostedt
2018-01-17 18:42                         ` Steven Rostedt
2018-01-19 18:20                           ` Steven Rostedt
2018-01-20  7:14                             ` Sergey Senozhatsky
2018-01-20 15:49                               ` Steven Rostedt
2018-01-21 14:15                                 ` Sergey Senozhatsky
2018-01-21 21:04                                   ` Steven Rostedt
2018-01-22  8:56                                     ` Sergey Senozhatsky
2018-01-22 10:28                                       ` Sergey Senozhatsky
2018-01-22 10:36                                         ` Sergey Senozhatsky
2018-01-23  6:40                                   ` Sergey Senozhatsky
2018-01-23  7:05                                     ` Sergey Senozhatsky
2018-01-23  7:31                                     ` Sergey Senozhatsky
2018-01-23 14:56                                     ` Steven Rostedt
2018-01-23 15:21                                       ` Sergey Senozhatsky
2018-01-23 15:41                                         ` Steven Rostedt
2018-01-23 15:43                                           ` Tejun Heo
2018-01-23 16:12                                             ` Sergey Senozhatsky
2018-01-23 16:13                                             ` Steven Rostedt
2018-01-23 17:21                                               ` Tejun Heo
2018-04-23  5:35                                             ` Sergey Senozhatsky
2018-01-23 16:01                                           ` Sergey Senozhatsky
2018-01-23 16:24                                             ` Steven Rostedt
2018-01-24  2:11                                               ` Sergey Senozhatsky
2018-01-24  2:52                                                 ` Steven Rostedt
2018-01-24  4:44                                                   ` Sergey Senozhatsky
2018-01-23 17:22                                             ` Tejun Heo
2018-01-20 12:19                             ` Tejun Heo
2018-01-20 14:51                               ` Steven Rostedt
2018-01-17 20:05                         ` Tejun Heo
2018-01-18  5:43                           ` Sergey Senozhatsky
2018-01-18 11:51                           ` Petr Mladek
2018-01-18  5:42                         ` Sergey Senozhatsky
2018-01-12  3:12               ` Sergey Senozhatsky
2018-01-12  2:56             ` Sergey Senozhatsky
2018-01-12  3:21               ` Steven Rostedt
2018-01-12 10:05                 ` Sergey Senozhatsky
2018-01-12 12:21                   ` Steven Rostedt
2018-01-12 12:55                     ` Petr Mladek
2018-01-13  7:31                       ` Sergey Senozhatsky
2018-01-15  8:51                         ` Petr Mladek
2018-01-15  9:48                           ` Sergey Senozhatsky
2018-01-16  5:16                           ` Sergey Senozhatsky
2018-01-16  9:08                             ` Petr Mladek
2018-01-15 12:08                       ` Steven Rostedt
2018-01-16  4:51                         ` Sergey Senozhatsky
2018-01-13  7:28                     ` Sergey Senozhatsky
2018-01-15 10:17                       ` Petr Mladek
2018-01-15 11:50                         ` Petr Mladek
2018-01-16  6:10                           ` Sergey Senozhatsky
2018-01-16  9:36                             ` Petr Mladek
2018-01-16 10:10                               ` Sergey Senozhatsky
2018-01-16 16:06                             ` Steven Rostedt
2018-01-16  5:23                         ` Sergey Senozhatsky
2018-01-15 12:06                       ` Steven Rostedt
2018-01-15 14:45                         ` Petr Mladek
2018-01-16  2:23                           ` Sergey Senozhatsky
2018-01-16  4:47                             ` Sergey Senozhatsky
2018-01-16 10:19                               ` Petr Mladek
2018-01-17  2:24                                 ` Sergey Senozhatsky
2018-01-16 15:45                               ` Steven Rostedt
2018-01-17  2:18                                 ` Sergey Senozhatsky
2018-01-17 13:04                                   ` Petr Mladek
2018-01-17 15:24                                     ` Steven Rostedt
2018-01-18  4:31                                     ` Sergey Senozhatsky
2018-01-18 15:22                                       ` Steven Rostedt
2018-01-16 10:13                             ` Petr Mladek
2018-01-17  6:29                               ` Sergey Senozhatsky
2018-01-16  1:46                         ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).