All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-07-17 23:48 ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

Hello,

Here is the second series to rework the printk subsystem. This series
removes LOG_CONT handling from printk() callers, storing all LOG_CONT
parts individually in the ringbuffer. With this series, LOG_CONT
handling is moved to the ringbuffer readers that provide the record
contents to users (console printing, syslog, /dev/kmsg).

This change is necessary in order to support the upcoming move to a
fully lockless printk() implementation.

This series is in line with the agreements [0] made at the meeting
during LPC2019 in Lisbon, with 1 exception: For the /dev/kmsg
interface, empty line placeholder records are reported for the
LOG_CONT parts.

Using placeholders avoids tools such as systemd-journald from
erroneously reporting missed messages. However, it also means that
empty placeholder records are visible in systemd-journald logs and
displayed in tools such as dmesg.

The effect can be easily observed with the sysrq help:

$ echo h | sudo tee /proc/sysrq-trigger
$ sudo dmesg | tail -n 30
$ sudo journalctl -k -n 30

Providing the placeholder entries allows a userspace tool to identify
if records were actually lost. IMHO this an important feature. Its
side effect can be addressed by userspace tools if they change to
silently consume empty records.

For dump tools that process the ringbuffer directly (such as crash,
makedumpfile, kexec-tools), they will need to implement LOG_CONT
handling if they want to present clean continuous line messages.

Finally, by moving LOG_CONT handling from writers to readers, some
incorrect pr_cont() usage is revealed. Patch 4 of this series
addresses one such example.

This series is based on the printk git tree [1] printk-rework branch.

[0] https://lkml.kernel.org/r/87k1acz5rx.fsf@linutronix.de
[1] https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git (printk-rework branch)

John Ogness (4):
  printk: ringbuffer: support dataless records
  printk: store instead of processing cont parts
  printk: process cont records during reading
  ipconfig: cleanup printk usage

 kernel/printk/printk.c            | 569 ++++++++++++++++++++----------
 kernel/printk/printk_ringbuffer.c |  58 ++-
 kernel/printk/printk_ringbuffer.h |  15 +-
 net/ipv4/ipconfig.c               |  25 +-
 4 files changed, 434 insertions(+), 233 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-07-17 23:48 ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

Hello,

Here is the second series to rework the printk subsystem. This series
removes LOG_CONT handling from printk() callers, storing all LOG_CONT
parts individually in the ringbuffer. With this series, LOG_CONT
handling is moved to the ringbuffer readers that provide the record
contents to users (console printing, syslog, /dev/kmsg).

This change is necessary in order to support the upcoming move to a
fully lockless printk() implementation.

This series is in line with the agreements [0] made at the meeting
during LPC2019 in Lisbon, with 1 exception: For the /dev/kmsg
interface, empty line placeholder records are reported for the
LOG_CONT parts.

Using placeholders avoids tools such as systemd-journald from
erroneously reporting missed messages. However, it also means that
empty placeholder records are visible in systemd-journald logs and
displayed in tools such as dmesg.

The effect can be easily observed with the sysrq help:

$ echo h | sudo tee /proc/sysrq-trigger
$ sudo dmesg | tail -n 30
$ sudo journalctl -k -n 30

Providing the placeholder entries allows a userspace tool to identify
if records were actually lost. IMHO this an important feature. Its
side effect can be addressed by userspace tools if they change to
silently consume empty records.

For dump tools that process the ringbuffer directly (such as crash,
makedumpfile, kexec-tools), they will need to implement LOG_CONT
handling if they want to present clean continuous line messages.

Finally, by moving LOG_CONT handling from writers to readers, some
incorrect pr_cont() usage is revealed. Patch 4 of this series
addresses one such example.

This series is based on the printk git tree [1] printk-rework branch.

[0] https://lkml.kernel.org/r/87k1acz5rx.fsf@linutronix.de
[1] https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git (printk-rework branch)

John Ogness (4):
  printk: ringbuffer: support dataless records
  printk: store instead of processing cont parts
  printk: process cont records during reading
  ipconfig: cleanup printk usage

 kernel/printk/printk.c            | 569 ++++++++++++++++++++----------
 kernel/printk/printk_ringbuffer.c |  58 ++-
 kernel/printk/printk_ringbuffer.h |  15 +-
 net/ipv4/ipconfig.c               |  25 +-
 4 files changed, 434 insertions(+), 233 deletions(-)

-- 
2.20.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 1/4] printk: ringbuffer: support dataless records
  2020-07-17 23:48 ` John Ogness
@ 2020-07-17 23:48   ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

In order to support storage of continuous lines, dataless records must
be allowed. For example, these are generated with the legal calls:

    pr_info("");
    pr_cont("\n");

Currently dataless records are denoted by INVALID_LPOS in order to
recognize failed prb_reserve() calls. Change the code to use two
different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
between failed prb_reserve() records and successful dataless records.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk_ringbuffer.c | 58 ++++++++++++++-----------------
 kernel/printk/printk_ringbuffer.h | 15 ++++----
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..54b0a6324dbf 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)	((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)		((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
 	struct prb_data_block *db = NULL;
 
-	/*
-	 * Writers are not allowed to write data-less records. Such records
-	 * are used only internally by the ringbuffer to denote records where
-	 * their data failed to allocate or have been lost.
-	 */
 	if (size == 0)
-		return false;
+		return true;
 
 	/*
 	 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
 	unsigned long tail_lpos;
 	unsigned long next_lpos;
 
-	/* If @lpos is not valid, there is nothing to do. */
-	if (lpos == INVALID_LPOS)
+	/* If @lpos is from a data-less block, there is nothing to do. */
+	if (LPOS_DATALESS(lpos))
 		return true;
 
 	/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
 	if (size == 0) {
 		/* Specify a data-less block. */
-		blk_lpos->begin = INVALID_LPOS;
-		blk_lpos->next = INVALID_LPOS;
+		blk_lpos->begin = NO_LPOS;
+		blk_lpos->next = NO_LPOS;
 		return NULL;
 	}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
 		if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring))) {
 			/* Failed to allocate, specify a data-less block. */
-			blk_lpos->begin = INVALID_LPOS;
-			blk_lpos->next = INVALID_LPOS;
+			blk_lpos->begin = FAILED_LPOS;
+			blk_lpos->next = FAILED_LPOS;
 			return NULL;
 		}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
 			       struct prb_data_blk_lpos *blk_lpos)
 {
+	/* Data-less blocks take no space. */
+	if (LPOS_DATALESS(blk_lpos->begin))
+		return 0;
+
 	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
 		/* Data block does not wrap. */
 		return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
 		goto fail;
 
-	/* Records are allowed to not have dictionaries. */
-	if (r->dict_buf_size) {
-		if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-			goto fail;
-	}
+	if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
+		goto fail;
 
 	/*
 	 * Descriptors in the reserved state act as blockers to all further
@@ -1212,10 +1208,8 @@ static char *get_data(struct prb_data_ring *data_ring,
 	struct prb_data_block *db;
 
 	/* Data-less data block description. */
-	if (blk_lpos->begin == INVALID_LPOS &&
-	    blk_lpos->next == INVALID_LPOS) {
+	if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next))
 		return NULL;
-	}
 
 	/* Regular data block: @begin less than @next and in same wrap. */
 	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
@@ -1355,11 +1349,11 @@ static int desc_read_committed_seq(struct prb_desc_ring *desc_ring,
 
 	/*
 	 * A descriptor in the reusable state may no longer have its data
-	 * available; report it as a data-less record. Or the record may
-	 * actually be a data-less record.
+	 * available; report it as existing but with lost data. Or the record
+	 * may actually be a record with lost data.
 	 */
 	if (d_state == desc_reusable ||
-	    (blk_lpos->begin == INVALID_LPOS && blk_lpos->next == INVALID_LPOS)) {
+	    (blk_lpos->begin == FAILED_LPOS && blk_lpos->next == FAILED_LPOS)) {
 		return -ENOENT;
 	}
 
@@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
 	/* Copy text data. If it fails, this is a data-less record. */
 	if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, desc.info.text_len,
 		       r->text_buf, r->text_buf_size, line_count)) {
-		return -ENOENT;
+		/* Report an error if there should have been data. */
+		if (desc.info.text_len != 0)
+			return -ENOENT;
 	}
 
 	/*
@@ -1659,10 +1655,10 @@ void prb_init(struct printk_ringbuffer *rb,
 
 	descs[_DESCS_COUNT(descbits) - 1].info.seq = 0;
 	atomic_long_set(&(descs[_DESCS_COUNT(descbits) - 1].state_var), DESC0_SV(descbits));
-	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.begin = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.next = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.begin = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.next = INVALID_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.begin = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.next = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.begin = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.next = FAILED_LPOS;
 }
 
 /**
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 3e46a7423c13..e6302da041f9 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -120,12 +120,13 @@ struct prb_reserved_entry {
 #define DESC_FLAGS_MASK			(DESC_COMMITTED_MASK | DESC_REUSE_MASK)
 #define DESC_ID_MASK			(~DESC_FLAGS_MASK)
 #define DESC_ID(sv)			((sv) & DESC_ID_MASK)
-#define INVALID_LPOS			1
+#define FAILED_LPOS			0x1
+#define NO_LPOS				0x3
 
-#define INVALID_BLK_LPOS	\
+#define FAILED_BLK_LPOS	\
 {				\
-	.begin	= INVALID_LPOS,	\
-	.next	= INVALID_LPOS,	\
+	.begin	= FAILED_LPOS,	\
+	.next	= FAILED_LPOS,	\
 }
 
 /*
@@ -147,7 +148,7 @@ struct prb_reserved_entry {
  *
  * To satisfy Req1, the tail initially points to a descriptor that is
  * minimally initialized (having no data block, i.e. data-less with the
- * data block's lpos @begin and @next values set to INVALID_LPOS).
+ * data block's lpos @begin and @next values set to FAILED_LPOS).
  *
  * To satisfy Req2, the initial tail descriptor is initialized to the
  * reusable state. Readers recognize reusable descriptors as existing
@@ -242,8 +243,8 @@ static struct prb_desc _##name##_descs[_DESCS_COUNT(descbits)] = {				\
 		/* reusable */									\
 		.state_var	= ATOMIC_INIT(DESC0_SV(descbits)),				\
 		/* no associated data block */							\
-		.text_blk_lpos	= INVALID_BLK_LPOS,						\
-		.dict_blk_lpos	= INVALID_BLK_LPOS,						\
+		.text_blk_lpos	= FAILED_BLK_LPOS,						\
+		.dict_blk_lpos	= FAILED_BLK_LPOS,						\
 	},											\
 };												\
 static struct printk_ringbuffer name = {							\
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 1/4] printk: ringbuffer: support dataless records
@ 2020-07-17 23:48   ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

In order to support storage of continuous lines, dataless records must
be allowed. For example, these are generated with the legal calls:

    pr_info("");
    pr_cont("\n");

Currently dataless records are denoted by INVALID_LPOS in order to
recognize failed prb_reserve() calls. Change the code to use two
different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
between failed prb_reserve() records and successful dataless records.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk_ringbuffer.c | 58 ++++++++++++++-----------------
 kernel/printk/printk_ringbuffer.h | 15 ++++----
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 7355ca99e852..54b0a6324dbf 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -264,6 +264,9 @@
 /* Determine how many times the data array has wrapped. */
 #define DATA_WRAPS(data_ring, lpos)	((lpos) >> (data_ring)->size_bits)
 
+/* Determine if a logical position refers to a data-less block. */
+#define LPOS_DATALESS(lpos)		((lpos) & 1UL)
+
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
 ((lpos) & ~DATA_SIZE_MASK(data_ring))
@@ -320,21 +323,13 @@ static unsigned int to_blk_size(unsigned int size)
  * block does not exceed the maximum possible size that could fit within the
  * ringbuffer. This function provides that basic size check so that the
  * assumption is safe.
- *
- * Writers are also not allowed to write 0-sized (data-less) records. Such
- * records are used only internally by the ringbuffer.
  */
 static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 {
 	struct prb_data_block *db = NULL;
 
-	/*
-	 * Writers are not allowed to write data-less records. Such records
-	 * are used only internally by the ringbuffer to denote records where
-	 * their data failed to allocate or have been lost.
-	 */
 	if (size == 0)
-		return false;
+		return true;
 
 	/*
 	 * Ensure the alignment padded size could possibly fit in the data
@@ -568,8 +563,8 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
 	unsigned long tail_lpos;
 	unsigned long next_lpos;
 
-	/* If @lpos is not valid, there is nothing to do. */
-	if (lpos == INVALID_LPOS)
+	/* If @lpos is from a data-less block, there is nothing to do. */
+	if (LPOS_DATALESS(lpos))
 		return true;
 
 	/*
@@ -962,8 +957,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
 	if (size == 0) {
 		/* Specify a data-less block. */
-		blk_lpos->begin = INVALID_LPOS;
-		blk_lpos->next = INVALID_LPOS;
+		blk_lpos->begin = NO_LPOS;
+		blk_lpos->next = NO_LPOS;
 		return NULL;
 	}
 
@@ -976,8 +971,8 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 
 		if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring))) {
 			/* Failed to allocate, specify a data-less block. */
-			blk_lpos->begin = INVALID_LPOS;
-			blk_lpos->next = INVALID_LPOS;
+			blk_lpos->begin = FAILED_LPOS;
+			blk_lpos->next = FAILED_LPOS;
 			return NULL;
 		}
 
@@ -1025,6 +1020,10 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 static unsigned int space_used(struct prb_data_ring *data_ring,
 			       struct prb_data_blk_lpos *blk_lpos)
 {
+	/* Data-less blocks take no space. */
+	if (LPOS_DATALESS(blk_lpos->begin))
+		return 0;
+
 	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
 		/* Data block does not wrap. */
 		return (DATA_INDEX(data_ring, blk_lpos->next) -
@@ -1080,11 +1079,8 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
 		goto fail;
 
-	/* Records are allowed to not have dictionaries. */
-	if (r->dict_buf_size) {
-		if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
-			goto fail;
-	}
+	if (!data_check_size(&rb->dict_data_ring, r->dict_buf_size))
+		goto fail;
 
 	/*
 	 * Descriptors in the reserved state act as blockers to all further
@@ -1212,10 +1208,8 @@ static char *get_data(struct prb_data_ring *data_ring,
 	struct prb_data_block *db;
 
 	/* Data-less data block description. */
-	if (blk_lpos->begin == INVALID_LPOS &&
-	    blk_lpos->next == INVALID_LPOS) {
+	if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next))
 		return NULL;
-	}
 
 	/* Regular data block: @begin less than @next and in same wrap. */
 	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
@@ -1355,11 +1349,11 @@ static int desc_read_committed_seq(struct prb_desc_ring *desc_ring,
 
 	/*
 	 * A descriptor in the reusable state may no longer have its data
-	 * available; report it as a data-less record. Or the record may
-	 * actually be a data-less record.
+	 * available; report it as existing but with lost data. Or the record
+	 * may actually be a record with lost data.
 	 */
 	if (d_state == desc_reusable ||
-	    (blk_lpos->begin == INVALID_LPOS && blk_lpos->next == INVALID_LPOS)) {
+	    (blk_lpos->begin == FAILED_LPOS && blk_lpos->next == FAILED_LPOS)) {
 		return -ENOENT;
 	}
 
@@ -1402,7 +1396,9 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
 	/* Copy text data. If it fails, this is a data-less record. */
 	if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, desc.info.text_len,
 		       r->text_buf, r->text_buf_size, line_count)) {
-		return -ENOENT;
+		/* Report an error if there should have been data. */
+		if (desc.info.text_len != 0)
+			return -ENOENT;
 	}
 
 	/*
@@ -1659,10 +1655,10 @@ void prb_init(struct printk_ringbuffer *rb,
 
 	descs[_DESCS_COUNT(descbits) - 1].info.seq = 0;
 	atomic_long_set(&(descs[_DESCS_COUNT(descbits) - 1].state_var), DESC0_SV(descbits));
-	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.begin = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.next = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.begin = INVALID_LPOS;
-	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.next = INVALID_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.begin = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].text_blk_lpos.next = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.begin = FAILED_LPOS;
+	descs[_DESCS_COUNT(descbits) - 1].dict_blk_lpos.next = FAILED_LPOS;
 }
 
 /**
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 3e46a7423c13..e6302da041f9 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -120,12 +120,13 @@ struct prb_reserved_entry {
 #define DESC_FLAGS_MASK			(DESC_COMMITTED_MASK | DESC_REUSE_MASK)
 #define DESC_ID_MASK			(~DESC_FLAGS_MASK)
 #define DESC_ID(sv)			((sv) & DESC_ID_MASK)
-#define INVALID_LPOS			1
+#define FAILED_LPOS			0x1
+#define NO_LPOS				0x3
 
-#define INVALID_BLK_LPOS	\
+#define FAILED_BLK_LPOS	\
 {				\
-	.begin	= INVALID_LPOS,	\
-	.next	= INVALID_LPOS,	\
+	.begin	= FAILED_LPOS,	\
+	.next	= FAILED_LPOS,	\
 }
 
 /*
@@ -147,7 +148,7 @@ struct prb_reserved_entry {
  *
  * To satisfy Req1, the tail initially points to a descriptor that is
  * minimally initialized (having no data block, i.e. data-less with the
- * data block's lpos @begin and @next values set to INVALID_LPOS).
+ * data block's lpos @begin and @next values set to FAILED_LPOS).
  *
  * To satisfy Req2, the initial tail descriptor is initialized to the
  * reusable state. Readers recognize reusable descriptors as existing
@@ -242,8 +243,8 @@ static struct prb_desc _##name##_descs[_DESCS_COUNT(descbits)] = {				\
 		/* reusable */									\
 		.state_var	= ATOMIC_INIT(DESC0_SV(descbits)),				\
 		/* no associated data block */							\
-		.text_blk_lpos	= INVALID_BLK_LPOS,						\
-		.dict_blk_lpos	= INVALID_BLK_LPOS,						\
+		.text_blk_lpos	= FAILED_BLK_LPOS,						\
+		.dict_blk_lpos	= FAILED_BLK_LPOS,						\
 	},											\
 };												\
 static struct printk_ringbuffer name = {							\
-- 
2.20.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-17 23:48 ` John Ogness
@ 2020-07-17 23:48   ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

Instead of buffering continuous line parts before storing the full
line into the ringbuffer, store each part as its own record.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 114 ++++-------------------------------------
 1 file changed, 11 insertions(+), 103 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..c4274c867771 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
 		*trunc_msg_len = 0;
 }
 
+static inline u32 printk_caller_id(void)
+{
+	return in_task() ? task_pid_nr(current) :
+		0x80000000 + raw_smp_processor_id();
+}
+
 /* insert record into the buffer, discard old ones, update heads */
-static int log_store(u32 caller_id, int facility, int level,
-		     enum log_flags flags, u64 ts_nsec,
+static int log_store(int facility, int level, enum log_flags flags,
 		     const char *dict, u16 dict_len,
 		     const char *text, u16 text_len)
 {
@@ -525,11 +530,8 @@ static int log_store(u32 caller_id, int facility, int level,
 	r.info->facility = facility;
 	r.info->level = level & 7;
 	r.info->flags = flags & 0x1f;
-	if (ts_nsec > 0)
-		r.info->ts_nsec = ts_nsec;
-	else
-		r.info->ts_nsec = local_clock();
-	r.info->caller_id = caller_id;
+	r.info->ts_nsec = local_clock();
+	r.info->caller_id = printk_caller_id();
 
 	/* insert message */
 	prb_commit(&e);
@@ -1874,100 +1876,6 @@ static inline void printk_delay(void)
 	}
 }
 
-static inline u32 printk_caller_id(void)
-{
-	return in_task() ? task_pid_nr(current) :
-		0x80000000 + raw_smp_processor_id();
-}
-
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-	char buf[LOG_LINE_MAX];
-	size_t len;			/* length == 0 means unused buffer */
-	u32 caller_id;			/* printk_caller_id() of first print */
-	u64 ts_nsec;			/* time of first print */
-	u8 level;			/* log level of first message */
-	u8 facility;			/* log facility of first message */
-	enum log_flags flags;		/* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-	if (cont.len == 0)
-		return;
-
-	log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
-		  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-	cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-		     enum log_flags flags, const char *text, size_t len)
-{
-	/* If the line gets too long, split it up in separate records. */
-	if (cont.len + len > sizeof(cont.buf)) {
-		cont_flush();
-		return false;
-	}
-
-	if (!cont.len) {
-		cont.facility = facility;
-		cont.level = level;
-		cont.caller_id = caller_id;
-		cont.ts_nsec = local_clock();
-		cont.flags = flags;
-	}
-
-	memcpy(cont.buf + cont.len, text, len);
-	cont.len += len;
-
-	// The original flags come from the first line,
-	// but later continuations can add a newline.
-	if (flags & LOG_NEWLINE) {
-		cont.flags |= LOG_NEWLINE;
-		cont_flush();
-	}
-
-	return true;
-}
-
-static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
-{
-	const u32 caller_id = printk_caller_id();
-
-	/*
-	 * If an earlier line was buffered, and we're a continuation
-	 * write from the same context, try to add it to the buffer.
-	 */
-	if (cont.len) {
-		if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-			if (cont_add(caller_id, facility, level, lflags, text, text_len))
-				return text_len;
-		}
-		/* Otherwise, make sure it's flushed */
-		cont_flush();
-	}
-
-	/* Skip empty continuation lines that couldn't be added - they just flush */
-	if (!text_len && (lflags & LOG_CONT))
-		return 0;
-
-	/* If it doesn't end in a newline, try to buffer the current line */
-	if (!(lflags & LOG_NEWLINE)) {
-		if (cont_add(caller_id, facility, level, lflags, text, text_len))
-			return text_len;
-	}
-
-	/* Store it in the record log */
-	return log_store(caller_id, facility, level, lflags, 0,
-			 dict, dictlen, text, text_len);
-}
-
 /* Must be called under logbuf_lock. */
 int vprintk_store(int facility, int level,
 		  const char *dict, size_t dictlen,
@@ -2015,8 +1923,8 @@ int vprintk_store(int facility, int level,
 	if (dict)
 		lflags |= LOG_NEWLINE;
 
-	return log_output(facility, level, lflags,
-			  dict, dictlen, text, text_len);
+	return log_store(facility, level, lflags, dict, dictlen,
+			 text, text_len);
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-17 23:48   ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

Instead of buffering continuous line parts before storing the full
line into the ringbuffer, store each part as its own record.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 114 ++++-------------------------------------
 1 file changed, 11 insertions(+), 103 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..c4274c867771 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
 		*trunc_msg_len = 0;
 }
 
+static inline u32 printk_caller_id(void)
+{
+	return in_task() ? task_pid_nr(current) :
+		0x80000000 + raw_smp_processor_id();
+}
+
 /* insert record into the buffer, discard old ones, update heads */
-static int log_store(u32 caller_id, int facility, int level,
-		     enum log_flags flags, u64 ts_nsec,
+static int log_store(int facility, int level, enum log_flags flags,
 		     const char *dict, u16 dict_len,
 		     const char *text, u16 text_len)
 {
@@ -525,11 +530,8 @@ static int log_store(u32 caller_id, int facility, int level,
 	r.info->facility = facility;
 	r.info->level = level & 7;
 	r.info->flags = flags & 0x1f;
-	if (ts_nsec > 0)
-		r.info->ts_nsec = ts_nsec;
-	else
-		r.info->ts_nsec = local_clock();
-	r.info->caller_id = caller_id;
+	r.info->ts_nsec = local_clock();
+	r.info->caller_id = printk_caller_id();
 
 	/* insert message */
 	prb_commit(&e);
@@ -1874,100 +1876,6 @@ static inline void printk_delay(void)
 	}
 }
 
-static inline u32 printk_caller_id(void)
-{
-	return in_task() ? task_pid_nr(current) :
-		0x80000000 + raw_smp_processor_id();
-}
-
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-	char buf[LOG_LINE_MAX];
-	size_t len;			/* length == 0 means unused buffer */
-	u32 caller_id;			/* printk_caller_id() of first print */
-	u64 ts_nsec;			/* time of first print */
-	u8 level;			/* log level of first message */
-	u8 facility;			/* log facility of first message */
-	enum log_flags flags;		/* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-	if (cont.len == 0)
-		return;
-
-	log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
-		  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-	cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-		     enum log_flags flags, const char *text, size_t len)
-{
-	/* If the line gets too long, split it up in separate records. */
-	if (cont.len + len > sizeof(cont.buf)) {
-		cont_flush();
-		return false;
-	}
-
-	if (!cont.len) {
-		cont.facility = facility;
-		cont.level = level;
-		cont.caller_id = caller_id;
-		cont.ts_nsec = local_clock();
-		cont.flags = flags;
-	}
-
-	memcpy(cont.buf + cont.len, text, len);
-	cont.len += len;
-
-	// The original flags come from the first line,
-	// but later continuations can add a newline.
-	if (flags & LOG_NEWLINE) {
-		cont.flags |= LOG_NEWLINE;
-		cont_flush();
-	}
-
-	return true;
-}
-
-static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
-{
-	const u32 caller_id = printk_caller_id();
-
-	/*
-	 * If an earlier line was buffered, and we're a continuation
-	 * write from the same context, try to add it to the buffer.
-	 */
-	if (cont.len) {
-		if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-			if (cont_add(caller_id, facility, level, lflags, text, text_len))
-				return text_len;
-		}
-		/* Otherwise, make sure it's flushed */
-		cont_flush();
-	}
-
-	/* Skip empty continuation lines that couldn't be added - they just flush */
-	if (!text_len && (lflags & LOG_CONT))
-		return 0;
-
-	/* If it doesn't end in a newline, try to buffer the current line */
-	if (!(lflags & LOG_NEWLINE)) {
-		if (cont_add(caller_id, facility, level, lflags, text, text_len))
-			return text_len;
-	}
-
-	/* Store it in the record log */
-	return log_store(caller_id, facility, level, lflags, 0,
-			 dict, dictlen, text, text_len);
-}
-
 /* Must be called under logbuf_lock. */
 int vprintk_store(int facility, int level,
 		  const char *dict, size_t dictlen,
@@ -2015,8 +1923,8 @@ int vprintk_store(int facility, int level,
 	if (dict)
 		lflags |= LOG_NEWLINE;
 
-	return log_output(facility, level, lflags,
-			  dict, dictlen, text, text_len);
+	return log_store(facility, level, lflags, dict, dictlen,
+			 text, text_len);
 }
 
 asmlinkage int vprintk_emit(int facility, int level,
-- 
2.20.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 3/4] printk: process cont records during reading
  2020-07-17 23:48 ` John Ogness
@ 2020-07-17 23:48   ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

Readers of the printk ringbuffer can use the continuous line interface
to read full lines. The interface buffers continuous line parts until
the full line is available or that line was interrupted by a writer
from another context.

The continuous line interface automatically throws out partial lines if
a reader jumps to older sequence numbers. If a reader jumps to higher
sequence numbers, any cached partial lines are flushed.

The continuous line interface is used by:

  - console printing
  - syslog
  - devkmsg

devkmsg has the additional requirement that it must show a line for
every sequence number if the corresponding continuous line record was
not dropped. The continuous line interface supports this by allowing
the reader to provide a printk_record struct that will be filled in
with placeholder information (but no text) in case a full line is not
yet available.

Note that kmsg_dump does not use the continuous line interface.

The continuous line interface discards dictionaries of continuous lines.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 455 +++++++++++++++++++++++++++++++++--------
 1 file changed, 371 insertions(+), 84 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c4274c867771..363ef290f313 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -657,6 +657,287 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 	return p - buf;
 }
 
+/*
+ * Readers of the printk ringbuffer can use the continuous line interface
+ * to read full lines. The interface buffers continuous line parts until
+ * the full line is available or that line was interrupted by a writer
+ * from another context.
+ *
+ * The continuous line interface automatically throws out partial lines if a
+ * reader jumps to older sequence numbers. If a reader jumps to higher
+ * sequence numbers, any cached partial lines are flushed.
+ *
+ * The continuous line interface is used by:
+ *
+ *   - console printing
+ *   - syslog
+ *   - devkmsg
+ *
+ * devkmsg has the additional requirement that it must show a line for every
+ * sequence number if the corresponding continuous line record was not dropped.
+ * The continuous line interface supports this by allowing the reader to
+ * provide a printk_record struct that will be filled in with placeholder
+ * information (but no text) in case a full line is not yet available.
+ *
+ * Note that kmsg_dump does not use the continuous line interface.
+ *
+ * The continuous line interface discards dictionaries of continuous lines.
+ */
+
+struct cont_record {
+	struct printk_record	r;
+	struct printk_info	info;
+	char			text[LOG_LINE_MAX + PREFIX_MAX];
+	bool			set;
+};
+
+/*
+ * The continuous line buffer manager.
+ *
+ * @cr:        record buffers for reading and caching continuous lines
+ * @dict:      the dictionary used when reading a record
+ * @cache_ind: index of the cache record in @cr
+ * @begin_seq: the minimal sequence number of the current continuous line
+ * @end_seq:   the maximal sequence number of the current continuous line
+ * @dropped:   count of dropped records during the current continuous line
+ */
+struct cont {
+	struct cont_record	cr[2];
+	char			dict[LOG_LINE_MAX];
+	int			cache_ind;
+	u64			begin_seq;
+	u64			end_seq;
+	unsigned long		dropped;
+};
+
+/*
+ * Initialize the continuous line manager. As an alternative, it is also
+ * acceptable if the structure is set to all zeros.
+ */
+static void cont_init(struct cont *c, u64 seq)
+{
+	c->cr[0].set = false;
+	c->cr[1].set = false;
+	c->cache_ind = 0;
+	c->begin_seq = seq;
+	c->end_seq = seq;
+	c->dropped = 0;
+}
+
+/* Get the continuous line cache, if one exists. */
+static struct printk_record *cont_cache(struct cont *c)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind];
+
+	if (!cr->set)
+		return NULL;
+	return &cr->r;
+}
+
+/*
+ * Like cont_cache(), but also flushes the dropped count, clears the
+ * dictionary, and switches to the other record buffer for future caching.
+ */
+static struct printk_record *cont_flush(struct cont *c, unsigned long *dropped)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind];
+
+	c->cache_ind ^= 1;
+
+	if (!cr->set)
+		return NULL;
+
+	if (dropped)
+		*dropped = c->dropped;
+	c->dropped = 0;
+
+	c->begin_seq = cr->info.seq;
+	cr->info.dict_len = 0;
+	cr->set = false;
+
+	return &cr->r;
+}
+
+/*
+ * Wrapper for prb_read_valid() that reads a new record into the
+ * non-caching record buffer.
+ */
+static struct printk_record *cont_read(struct cont *c, u64 seq)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind ^ 1];
+	struct printk_record *r = &cr->r;
+
+	prb_rec_init_rd(r, &cr->info, cr->text, sizeof(cr->text),
+			c->dict, sizeof(c->dict));
+
+	if (!prb_read_valid(prb, seq, r))
+		return NULL;
+
+	cr->set = true;
+
+	/*
+	 * After a successful read, the reader is allowed to request the
+	 * next sequential record.
+	 */
+	c->end_seq = r->info->seq + 1;
+
+	return r;
+}
+
+/*
+ * Add a record to the cache. If there is no cache, @new becomes the cache.
+ * Note that @new must be a record returned by cont_read().
+ */
+static bool cont_add(struct cont *c, struct printk_record *new)
+{
+	struct printk_record *cache;
+
+	cache = cont_cache(c);
+	if (!cache) {
+		/* No cache? This record is now the cache. */
+		c->cache_ind ^= 1;
+		return true;
+	}
+
+	/*
+	 * If an earlier line was buffered, and this is a continuation
+	 * write from the same context, try to add it to the buffer.
+	 */
+
+	if (new->info->caller_id != cache->info->caller_id)
+		return false;
+
+	if (!(new->info->flags & LOG_CONT))
+		return false;
+
+	/* If the line gets too long, split it up in separate records. */
+	if (cache->info->text_len + new->info->text_len > cache->text_buf_size)
+		return false;
+
+	memcpy(cache->text_buf + cache->info->text_len,
+	       new->text_buf, new->info->text_len);
+
+	cache->info->text_len += new->info->text_len;
+
+	/*
+	 * The original flags come from the first line,
+	 * but later continuations can add a newline.
+	 */
+	if (new->info->flags & LOG_NEWLINE)
+		cache->info->flags |= LOG_NEWLINE;
+
+	/* The cache always has the sequence number of its newest part. */
+	cache->info->seq = new->info->seq;
+
+	/* Data copied, the non-cache buffer now available for reuse. */
+	container_of(new, struct cont_record, r)->set = false;
+
+	return true;
+}
+
+/*
+ * Get the next continuous full line.
+ *
+ * Optionally the reader can provide @r to be used to store placeholder
+ * information in case all read sequence numbers need to be shown. Only the
+ * meta-data of @r will be set and the fields @text_len and @dict_len will
+ * be set to 0. So @r does not require text and dictionary buffers.
+ *
+ * If @r is provided and no continuous full line is available, @r is filled
+ * and @r is returned (rather than returning an internal buffer of @c).
+ */
+static struct printk_record *next_record(struct cont *c, u64 seq,
+					 unsigned long *dropped,
+					 struct printk_record *r)
+{
+	struct printk_record *cache;
+	struct printk_record *new;
+
+	if (dropped)
+		*dropped = 0;
+
+	cache = cont_cache(c);
+
+	/*
+	 * Re-initialize @c if the reader is jumping back or forward.
+	 * For the forward case, flush any existing cache first.
+	 */
+	if (seq < c->begin_seq) {
+		cont_init(c, seq);
+	} else if (seq > c->end_seq) {
+		if (cache)
+			goto flush;
+		else
+			cont_init(c, seq);
+	}
+
+	for (;;) {
+		/* If a full line is already cached, no need to read. */
+		if (cache && (cache->info->flags & LOG_NEWLINE))
+			goto flush;
+
+		/*
+		 * The reader is not aware that some records may be cached
+		 * already. Always read the next sequential record beyond
+		 * the cache contents.
+		 */
+		seq = c->end_seq;
+
+		new = cont_read(c, seq);
+		if (!new)
+			return NULL;
+
+		if (new->info->seq > seq) {
+			/* Some records were lost. */
+			c->dropped += new->info->seq - seq;
+			if (cache)
+				goto flush;
+		}
+
+		if (r && cache) {
+			/*
+			 * The caller wants a placeholder in case there is no
+			 * full line ready. Copy the cache meta-data now in
+			 * case the current record is added to the cache.
+			 */
+			memcpy(r->info, cache->info, sizeof(*(r->info)));
+			r->info->text_len = 0;
+			r->info->dict_len = 0;
+			r->info->level = 7;
+			r->info->flags = LOG_CONT | LOG_NEWLINE;
+		}
+
+		if (!cont_add(c, new))
+			goto flush;
+
+		if (r && cache) {
+			/*
+			 * The record was added to an existing cache and the
+			 * caller wants a place holder.
+			 */
+			return r;
+		}
+
+		cache = cont_cache(c);
+	}
+flush:
+	return cont_flush(c, dropped);
+}
+
+/*
+ * A continuous line variant of prb_for_each_record().
+ *
+ * @from: The sequence number to begin with.
+ * @c:    A continuous line manager to use.
+ * @s:    A u64 to store the sequence number on each iteration.
+ * @r:    A printk_record to store the record on each iteration.
+ * @d:    The @dropped argument for next_record().
+ */
+#define cont_for_each_record(from, c, s, r, d) \
+for ((s) = from, cont_init(c, s); \
+	((r) = next_record(c, s, d, NULL)); \
+	(s) = (r)->info->seq + 1)
+
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
 	u64 seq;
@@ -664,10 +945,9 @@ struct devkmsg_user {
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
 
+	struct cont cont;
+	struct printk_record r;
 	struct printk_info info;
-	char text_buf[CONSOLE_EXT_LOG_MAX];
-	char dict_buf[CONSOLE_EXT_LOG_MAX];
-	struct printk_record record;
 };
 
 static __printf(3, 4) __cold
@@ -750,7 +1030,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
 	struct devkmsg_user *user = file->private_data;
-	struct printk_record *r = &user->record;
+	struct printk_record *r;
 	size_t len;
 	ssize_t ret;
 
@@ -762,7 +1042,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 		return ret;
 
 	logbuf_lock_irq();
-	if (!prb_read_valid(prb, user->seq, r)) {
+	r = next_record(&user->cont, user->seq, NULL, &user->r);
+	if (!r) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
 			logbuf_unlock_irq();
@@ -771,7 +1052,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 		logbuf_unlock_irq();
 		ret = wait_event_interruptible(log_wait,
-					prb_read_valid(prb, user->seq, r));
+			(r = next_record(&user->cont, user->seq, NULL, &user->r)));
 		if (ret)
 			goto out;
 		logbuf_lock_irq();
@@ -780,6 +1061,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (user->seq < prb_first_valid_seq(prb)) {
 		/* our last seen message is gone, return error and reset */
 		user->seq = prb_first_valid_seq(prb);
+		cont_init(&user->cont, user->seq);
 		ret = -EPIPE;
 		logbuf_unlock_irq();
 		goto out;
@@ -787,8 +1069,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 	len = info_print_ext_header(user->buf, sizeof(user->buf), r->info);
 	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
-				  &r->dict_buf[0], r->info->dict_len,
-				  &r->text_buf[0], r->info->text_len);
+				  r->dict_buf, r->info->dict_len,
+				  r->text_buf, r->info->text_len);
 
 	user->seq = r->info->seq + 1;
 	logbuf_unlock_irq();
@@ -823,6 +1105,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	case SEEK_SET:
 		/* the first record */
 		user->seq = prb_first_valid_seq(prb);
+		cont_init(&user->cont, user->seq);
 		break;
 	case SEEK_DATA:
 		/*
@@ -831,10 +1114,12 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		 * changes no global state, and does not clear anything.
 		 */
 		user->seq = clear_seq;
+		cont_init(&user->cont, user->seq);
 		break;
 	case SEEK_END:
 		/* after the last record */
 		user->seq = prb_next_seq(prb);
+		cont_init(&user->cont, user->seq);
 		break;
 	default:
 		ret = -EINVAL;
@@ -891,9 +1176,8 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 
 	mutex_init(&user->lock);
 
-	prb_rec_init_rd(&user->record, &user->info,
-			&user->text_buf[0], sizeof(user->text_buf),
-			&user->dict_buf[0], sizeof(user->dict_buf));
+	cont_init(&user->cont, 0);
+	prb_rec_init_rd(&user->r, &user->info, NULL, 0, NULL, 0);
 
 	logbuf_lock_irq();
 	user->seq = prb_first_valid_seq(prb);
@@ -1426,29 +1710,30 @@ static size_t get_record_print_text_size(struct printk_info *info,
 
 static int syslog_print(char __user *buf, int size)
 {
-	struct printk_info info;
-	struct printk_record r;
-	char *text;
+	struct printk_record *r;
+	struct cont *cont;
 	int len = 0;
 
-	text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
-	if (!text)
+	cont = kmalloc(sizeof(*cont), GFP_KERNEL);
+	if (!cont)
 		return -ENOMEM;
 
-	prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX, NULL, 0);
+	cont_init(cont, 0);
 
 	while (size > 0) {
+		unsigned long dropped;
 		size_t n;
 		size_t skip;
 
 		logbuf_lock_irq();
-		if (!prb_read_valid(prb, syslog_seq, &r)) {
+		r = next_record(cont, syslog_seq, &dropped, NULL);
+		if (!r) {
 			logbuf_unlock_irq();
 			break;
 		}
-		if (r.info->seq != syslog_seq) {
+
+		if (dropped) {
 			/* message is gone, move to next valid one */
-			syslog_seq = r.info->seq;
 			syslog_partial = 0;
 		}
 
@@ -1460,10 +1745,10 @@ static int syslog_print(char __user *buf, int size)
 			syslog_time = printk_time;
 
 		skip = syslog_partial;
-		n = record_print_text(&r, true, syslog_time);
+		n = record_print_text(r, true, syslog_time);
 		if (n - syslog_partial <= size) {
 			/* message fits into buffer, move forward */
-			syslog_seq = r.info->seq + 1;
+			syslog_seq = r->info->seq + 1;
 			n -= syslog_partial;
 			syslog_partial = 0;
 		} else if (!len){
@@ -1477,7 +1762,7 @@ static int syslog_print(char __user *buf, int size)
 		if (!n)
 			break;
 
-		if (copy_to_user(buf, text + skip, n)) {
+		if (copy_to_user(buf, r->text_buf + skip, n)) {
 			if (!len)
 				len = -EFAULT;
 			break;
@@ -1488,22 +1773,21 @@ static int syslog_print(char __user *buf, int size)
 		buf += n;
 	}
 
-	kfree(text);
+	kfree(cont);
 	return len;
 }
 
 static int syslog_print_all(char __user *buf, int size, bool clear)
 {
-	struct printk_info info;
-	unsigned int line_count;
-	struct printk_record r;
-	char *text;
+	struct printk_record *r;
+	struct cont *c;
 	int len = 0;
+	u64 lastseq;
 	u64 seq;
 	bool time;
 
-	text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
-	if (!text)
+	c = kmalloc(sizeof(*c), GFP_KERNEL);
+	if (!c)
 		return -ENOMEM;
 
 	time = printk_time;
@@ -1512,31 +1796,31 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 	 * Find first record that fits, including all following records,
 	 * into the user-provided buffer for this dump.
 	 */
-	prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
-		len += get_record_print_text_size(&info, line_count, true, time);
+	cont_for_each_record(clear_seq, c, seq, r, NULL)
+		len += record_print_text(r, true, time);
 
 	/* move first record forward until length fits into the buffer */
-	prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
+	cont_for_each_record(clear_seq, c, seq, r, NULL) {
 		if (len <= size)
 			break;
-		len -= get_record_print_text_size(&info, line_count, true, time);
+		len -= record_print_text(r, true, time);
 	}
 
-	prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX, NULL, 0);
-
+	lastseq = clear_seq;
 	len = 0;
-	prb_for_each_record(seq, prb, seq, &r) {
+	cont_for_each_record(seq, c, seq, r, NULL) {
 		int textlen;
 
-		textlen = record_print_text(&r, true, time);
+		textlen = record_print_text(r, true, time);
 
 		if (len + textlen > size) {
-			seq--;
+			seq = lastseq;
 			break;
 		}
+		lastseq = seq;
 
 		logbuf_unlock_irq();
-		if (copy_to_user(buf + len, text, textlen))
+		if (copy_to_user(buf + len, r->text_buf, textlen))
 			len = -EFAULT;
 		else
 			len += textlen;
@@ -1550,7 +1834,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		clear_seq = seq;
 	logbuf_unlock_irq();
 
-	kfree(text);
+	kfree(c);
 	return len;
 }
 
@@ -1563,6 +1847,9 @@ static void syslog_clear(void)
 
 int do_syslog(int type, char __user *buf, int len, int source)
 {
+	struct printk_record *r;
+	struct cont *c;
+	u64 seq;
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
@@ -1632,6 +1919,11 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
+		c = kmalloc(sizeof(*c), GFP_KERNEL);
+		if (!c) {
+			error = -ENOMEM;
+			break;
+		}
 		logbuf_lock_irq();
 		if (syslog_seq < prb_first_valid_seq(prb)) {
 			/* messages are gone, move to first one */
@@ -1640,26 +1932,27 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		}
 		if (source == SYSLOG_FROM_PROC) {
 			/*
-			 * Short-cut for poll(/"proc/kmsg") which simply checks
-			 * for pending data, not the size; return the count of
-			 * records, not the length.
+			 * Short-cut for poll("/proc/kmsg") which simply checks
+			 * for pending data, not the size.
 			 */
-			error = prb_next_seq(prb) - syslog_seq;
+			cont_init(c, syslog_seq);
+			error = !!next_record(c, syslog_seq, NULL, NULL);
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
-			struct printk_info info;
-			unsigned int line_count;
-			u64 seq;
-
-			prb_for_each_info(syslog_seq, prb, seq, &info,
-					  &line_count) {
-				error += get_record_print_text_size(&info, line_count,
-								    true, time);
+			bool use_partial = true;
+			unsigned long dropped;
+
+			cont_for_each_record(syslog_seq, c, seq, r, &dropped) {
+				error += record_print_text(r, true, time);
 				time = printk_time;
+				if (dropped)
+					use_partial = false;
 			}
-			error -= syslog_partial;
+			if (use_partial)
+				error -= syslog_partial;
 		}
 		logbuf_unlock_irq();
+		kfree(c);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -2335,20 +2628,17 @@ static inline int can_use_console(void)
 void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
-	static char text[LOG_LINE_MAX + PREFIX_MAX];
-	static char dict[LOG_LINE_MAX];
-	unsigned long flags;
 	bool do_cond_resched, retry;
-	struct printk_info info;
-	struct printk_record r;
+	struct printk_record *r;
+	static struct cont cont;
+	unsigned long dropped;
+	unsigned long flags;
 
 	if (console_suspended) {
 		up_console_sem();
 		return;
 	}
 
-	prb_rec_init_rd(&r, &info, text, sizeof(text), dict, sizeof(dict));
-
 	/*
 	 * Console drivers are called with interrupts disabled, so
 	 * @console_may_schedule should be cleared before; however, we may
@@ -2385,30 +2675,28 @@ void console_unlock(void)
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 skip:
-		if (!prb_read_valid(prb, console_seq, &r))
+		r = next_record(&cont, console_seq, &dropped, NULL);
+		if (!r)
 			break;
 
-		if (console_seq != r.info->seq) {
-			console_dropped += r.info->seq - console_seq;
-			console_seq = r.info->seq;
+		/* Output to all consoles once old messages replayed. */
+		if (unlikely(exclusive_console &&
+			     r->info->seq >= exclusive_console_stop_seq)) {
+			exclusive_console = NULL;
 		}
 
-		if (suppress_message_printing(r.info->level)) {
+		console_dropped += dropped;
+		console_seq = r->info->seq + 1;
+
+		if (suppress_message_printing(r->info->level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
 			 * record that has level above the console loglevel.
 			 */
-			console_seq++;
 			goto skip;
 		}
 
-		/* Output to all consoles once old messages replayed. */
-		if (unlikely(exclusive_console &&
-			     console_seq >= exclusive_console_stop_seq)) {
-			exclusive_console = NULL;
-		}
-
 		/*
 		 * Handle extended console text first because later
 		 * record_print_text() will modify the record buffer in-place.
@@ -2416,18 +2704,17 @@ void console_unlock(void)
 		if (nr_ext_console_drivers) {
 			ext_len = info_print_ext_header(ext_text,
 						sizeof(ext_text),
-						r.info);
+						r->info);
 			ext_len += msg_print_ext_body(ext_text + ext_len,
 						sizeof(ext_text) - ext_len,
-						&r.dict_buf[0],
-						r.info->dict_len,
-						&r.text_buf[0],
-						r.info->text_len);
+						r->dict_buf,
+						r->info->dict_len,
+						r->text_buf,
+						r->info->text_len);
 		}
-		len = record_print_text(&r,
-				console_msg_format & MSG_FORMAT_SYSLOG,
-				printk_time);
-		console_seq++;
+		len = record_print_text(r,
+					console_msg_format & MSG_FORMAT_SYSLOG,
+					printk_time);
 		raw_spin_unlock(&logbuf_lock);
 
 		/*
@@ -2439,7 +2726,7 @@ void console_unlock(void)
 		console_lock_spinning_enable();
 
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(ext_text, ext_len, text, len);
+		call_console_drivers(ext_text, ext_len, r->text_buf, len);
 		start_critical_timings();
 
 		if (console_lock_spinning_disable_and_check()) {
@@ -2466,7 +2753,7 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	retry = prb_read_valid(prb, console_seq, NULL);
+	retry = next_record(&cont, console_seq, NULL, NULL);
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 3/4] printk: process cont records during reading
@ 2020-07-17 23:48   ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

Readers of the printk ringbuffer can use the continuous line interface
to read full lines. The interface buffers continuous line parts until
the full line is available or that line was interrupted by a writer
from another context.

The continuous line interface automatically throws out partial lines if
a reader jumps to older sequence numbers. If a reader jumps to higher
sequence numbers, any cached partial lines are flushed.

The continuous line interface is used by:

  - console printing
  - syslog
  - devkmsg

devkmsg has the additional requirement that it must show a line for
every sequence number if the corresponding continuous line record was
not dropped. The continuous line interface supports this by allowing
the reader to provide a printk_record struct that will be filled in
with placeholder information (but no text) in case a full line is not
yet available.

Note that kmsg_dump does not use the continuous line interface.

The continuous line interface discards dictionaries of continuous lines.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 455 +++++++++++++++++++++++++++++++++--------
 1 file changed, 371 insertions(+), 84 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c4274c867771..363ef290f313 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -657,6 +657,287 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 	return p - buf;
 }
 
+/*
+ * Readers of the printk ringbuffer can use the continuous line interface
+ * to read full lines. The interface buffers continuous line parts until
+ * the full line is available or that line was interrupted by a writer
+ * from another context.
+ *
+ * The continuous line interface automatically throws out partial lines if a
+ * reader jumps to older sequence numbers. If a reader jumps to higher
+ * sequence numbers, any cached partial lines are flushed.
+ *
+ * The continuous line interface is used by:
+ *
+ *   - console printing
+ *   - syslog
+ *   - devkmsg
+ *
+ * devkmsg has the additional requirement that it must show a line for every
+ * sequence number if the corresponding continuous line record was not dropped.
+ * The continuous line interface supports this by allowing the reader to
+ * provide a printk_record struct that will be filled in with placeholder
+ * information (but no text) in case a full line is not yet available.
+ *
+ * Note that kmsg_dump does not use the continuous line interface.
+ *
+ * The continuous line interface discards dictionaries of continuous lines.
+ */
+
+struct cont_record {
+	struct printk_record	r;
+	struct printk_info	info;
+	char			text[LOG_LINE_MAX + PREFIX_MAX];
+	bool			set;
+};
+
+/*
+ * The continuous line buffer manager.
+ *
+ * @cr:        record buffers for reading and caching continuous lines
+ * @dict:      the dictionary used when reading a record
+ * @cache_ind: index of the cache record in @cr
+ * @begin_seq: the minimal sequence number of the current continuous line
+ * @end_seq:   the maximal sequence number of the current continuous line
+ * @dropped:   count of dropped records during the current continuous line
+ */
+struct cont {
+	struct cont_record	cr[2];
+	char			dict[LOG_LINE_MAX];
+	int			cache_ind;
+	u64			begin_seq;
+	u64			end_seq;
+	unsigned long		dropped;
+};
+
+/*
+ * Initialize the continuous line manager. As an alternative, it is also
+ * acceptable if the structure is set to all zeros.
+ */
+static void cont_init(struct cont *c, u64 seq)
+{
+	c->cr[0].set = false;
+	c->cr[1].set = false;
+	c->cache_ind = 0;
+	c->begin_seq = seq;
+	c->end_seq = seq;
+	c->dropped = 0;
+}
+
+/* Get the continuous line cache, if one exists. */
+static struct printk_record *cont_cache(struct cont *c)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind];
+
+	if (!cr->set)
+		return NULL;
+	return &cr->r;
+}
+
+/*
+ * Like cont_cache(), but also flushes the dropped count, clears the
+ * dictionary, and switches to the other record buffer for future caching.
+ */
+static struct printk_record *cont_flush(struct cont *c, unsigned long *dropped)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind];
+
+	c->cache_ind ^= 1;
+
+	if (!cr->set)
+		return NULL;
+
+	if (dropped)
+		*dropped = c->dropped;
+	c->dropped = 0;
+
+	c->begin_seq = cr->info.seq;
+	cr->info.dict_len = 0;
+	cr->set = false;
+
+	return &cr->r;
+}
+
+/*
+ * Wrapper for prb_read_valid() that reads a new record into the
+ * non-caching record buffer.
+ */
+static struct printk_record *cont_read(struct cont *c, u64 seq)
+{
+	struct cont_record *cr = &c->cr[c->cache_ind ^ 1];
+	struct printk_record *r = &cr->r;
+
+	prb_rec_init_rd(r, &cr->info, cr->text, sizeof(cr->text),
+			c->dict, sizeof(c->dict));
+
+	if (!prb_read_valid(prb, seq, r))
+		return NULL;
+
+	cr->set = true;
+
+	/*
+	 * After a successful read, the reader is allowed to request the
+	 * next sequential record.
+	 */
+	c->end_seq = r->info->seq + 1;
+
+	return r;
+}
+
+/*
+ * Add a record to the cache. If there is no cache, @new becomes the cache.
+ * Note that @new must be a record returned by cont_read().
+ */
+static bool cont_add(struct cont *c, struct printk_record *new)
+{
+	struct printk_record *cache;
+
+	cache = cont_cache(c);
+	if (!cache) {
+		/* No cache? This record is now the cache. */
+		c->cache_ind ^= 1;
+		return true;
+	}
+
+	/*
+	 * If an earlier line was buffered, and this is a continuation
+	 * write from the same context, try to add it to the buffer.
+	 */
+
+	if (new->info->caller_id != cache->info->caller_id)
+		return false;
+
+	if (!(new->info->flags & LOG_CONT))
+		return false;
+
+	/* If the line gets too long, split it up in separate records. */
+	if (cache->info->text_len + new->info->text_len > cache->text_buf_size)
+		return false;
+
+	memcpy(cache->text_buf + cache->info->text_len,
+	       new->text_buf, new->info->text_len);
+
+	cache->info->text_len += new->info->text_len;
+
+	/*
+	 * The original flags come from the first line,
+	 * but later continuations can add a newline.
+	 */
+	if (new->info->flags & LOG_NEWLINE)
+		cache->info->flags |= LOG_NEWLINE;
+
+	/* The cache always has the sequence number of its newest part. */
+	cache->info->seq = new->info->seq;
+
+	/* Data copied, the non-cache buffer now available for reuse. */
+	container_of(new, struct cont_record, r)->set = false;
+
+	return true;
+}
+
+/*
+ * Get the next continuous full line.
+ *
+ * Optionally the reader can provide @r to be used to store placeholder
+ * information in case all read sequence numbers need to be shown. Only the
+ * meta-data of @r will be set and the fields @text_len and @dict_len will
+ * be set to 0. So @r does not require text and dictionary buffers.
+ *
+ * If @r is provided and no continuous full line is available, @r is filled
+ * and @r is returned (rather than returning an internal buffer of @c).
+ */
+static struct printk_record *next_record(struct cont *c, u64 seq,
+					 unsigned long *dropped,
+					 struct printk_record *r)
+{
+	struct printk_record *cache;
+	struct printk_record *new;
+
+	if (dropped)
+		*dropped = 0;
+
+	cache = cont_cache(c);
+
+	/*
+	 * Re-initialize @c if the reader is jumping back or forward.
+	 * For the forward case, flush any existing cache first.
+	 */
+	if (seq < c->begin_seq) {
+		cont_init(c, seq);
+	} else if (seq > c->end_seq) {
+		if (cache)
+			goto flush;
+		else
+			cont_init(c, seq);
+	}
+
+	for (;;) {
+		/* If a full line is already cached, no need to read. */
+		if (cache && (cache->info->flags & LOG_NEWLINE))
+			goto flush;
+
+		/*
+		 * The reader is not aware that some records may be cached
+		 * already. Always read the next sequential record beyond
+		 * the cache contents.
+		 */
+		seq = c->end_seq;
+
+		new = cont_read(c, seq);
+		if (!new)
+			return NULL;
+
+		if (new->info->seq > seq) {
+			/* Some records were lost. */
+			c->dropped += new->info->seq - seq;
+			if (cache)
+				goto flush;
+		}
+
+		if (r && cache) {
+			/*
+			 * The caller wants a placeholder in case there is no
+			 * full line ready. Copy the cache meta-data now in
+			 * case the current record is added to the cache.
+			 */
+			memcpy(r->info, cache->info, sizeof(*(r->info)));
+			r->info->text_len = 0;
+			r->info->dict_len = 0;
+			r->info->level = 7;
+			r->info->flags = LOG_CONT | LOG_NEWLINE;
+		}
+
+		if (!cont_add(c, new))
+			goto flush;
+
+		if (r && cache) {
+			/*
+			 * The record was added to an existing cache and the
+			 * caller wants a place holder.
+			 */
+			return r;
+		}
+
+		cache = cont_cache(c);
+	}
+flush:
+	return cont_flush(c, dropped);
+}
+
+/*
+ * A continuous line variant of prb_for_each_record().
+ *
+ * @from: The sequence number to begin with.
+ * @c:    A continuous line manager to use.
+ * @s:    A u64 to store the sequence number on each iteration.
+ * @r:    A printk_record to store the record on each iteration.
+ * @d:    The @dropped argument for next_record().
+ */
+#define cont_for_each_record(from, c, s, r, d) \
+for ((s) = from, cont_init(c, s); \
+	((r) = next_record(c, s, d, NULL)); \
+	(s) = (r)->info->seq + 1)
+
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
 	u64 seq;
@@ -664,10 +945,9 @@ struct devkmsg_user {
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
 
+	struct cont cont;
+	struct printk_record r;
 	struct printk_info info;
-	char text_buf[CONSOLE_EXT_LOG_MAX];
-	char dict_buf[CONSOLE_EXT_LOG_MAX];
-	struct printk_record record;
 };
 
 static __printf(3, 4) __cold
@@ -750,7 +1030,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
 	struct devkmsg_user *user = file->private_data;
-	struct printk_record *r = &user->record;
+	struct printk_record *r;
 	size_t len;
 	ssize_t ret;
 
@@ -762,7 +1042,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 		return ret;
 
 	logbuf_lock_irq();
-	if (!prb_read_valid(prb, user->seq, r)) {
+	r = next_record(&user->cont, user->seq, NULL, &user->r);
+	if (!r) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
 			logbuf_unlock_irq();
@@ -771,7 +1052,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 		logbuf_unlock_irq();
 		ret = wait_event_interruptible(log_wait,
-					prb_read_valid(prb, user->seq, r));
+			(r = next_record(&user->cont, user->seq, NULL, &user->r)));
 		if (ret)
 			goto out;
 		logbuf_lock_irq();
@@ -780,6 +1061,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (user->seq < prb_first_valid_seq(prb)) {
 		/* our last seen message is gone, return error and reset */
 		user->seq = prb_first_valid_seq(prb);
+		cont_init(&user->cont, user->seq);
 		ret = -EPIPE;
 		logbuf_unlock_irq();
 		goto out;
@@ -787,8 +1069,8 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 	len = info_print_ext_header(user->buf, sizeof(user->buf), r->info);
 	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
-				  &r->dict_buf[0], r->info->dict_len,
-				  &r->text_buf[0], r->info->text_len);
+				  r->dict_buf, r->info->dict_len,
+				  r->text_buf, r->info->text_len);
 
 	user->seq = r->info->seq + 1;
 	logbuf_unlock_irq();
@@ -823,6 +1105,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 	case SEEK_SET:
 		/* the first record */
 		user->seq = prb_first_valid_seq(prb);
+		cont_init(&user->cont, user->seq);
 		break;
 	case SEEK_DATA:
 		/*
@@ -831,10 +1114,12 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		 * changes no global state, and does not clear anything.
 		 */
 		user->seq = clear_seq;
+		cont_init(&user->cont, user->seq);
 		break;
 	case SEEK_END:
 		/* after the last record */
 		user->seq = prb_next_seq(prb);
+		cont_init(&user->cont, user->seq);
 		break;
 	default:
 		ret = -EINVAL;
@@ -891,9 +1176,8 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 
 	mutex_init(&user->lock);
 
-	prb_rec_init_rd(&user->record, &user->info,
-			&user->text_buf[0], sizeof(user->text_buf),
-			&user->dict_buf[0], sizeof(user->dict_buf));
+	cont_init(&user->cont, 0);
+	prb_rec_init_rd(&user->r, &user->info, NULL, 0, NULL, 0);
 
 	logbuf_lock_irq();
 	user->seq = prb_first_valid_seq(prb);
@@ -1426,29 +1710,30 @@ static size_t get_record_print_text_size(struct printk_info *info,
 
 static int syslog_print(char __user *buf, int size)
 {
-	struct printk_info info;
-	struct printk_record r;
-	char *text;
+	struct printk_record *r;
+	struct cont *cont;
 	int len = 0;
 
-	text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
-	if (!text)
+	cont = kmalloc(sizeof(*cont), GFP_KERNEL);
+	if (!cont)
 		return -ENOMEM;
 
-	prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX, NULL, 0);
+	cont_init(cont, 0);
 
 	while (size > 0) {
+		unsigned long dropped;
 		size_t n;
 		size_t skip;
 
 		logbuf_lock_irq();
-		if (!prb_read_valid(prb, syslog_seq, &r)) {
+		r = next_record(cont, syslog_seq, &dropped, NULL);
+		if (!r) {
 			logbuf_unlock_irq();
 			break;
 		}
-		if (r.info->seq != syslog_seq) {
+
+		if (dropped) {
 			/* message is gone, move to next valid one */
-			syslog_seq = r.info->seq;
 			syslog_partial = 0;
 		}
 
@@ -1460,10 +1745,10 @@ static int syslog_print(char __user *buf, int size)
 			syslog_time = printk_time;
 
 		skip = syslog_partial;
-		n = record_print_text(&r, true, syslog_time);
+		n = record_print_text(r, true, syslog_time);
 		if (n - syslog_partial <= size) {
 			/* message fits into buffer, move forward */
-			syslog_seq = r.info->seq + 1;
+			syslog_seq = r->info->seq + 1;
 			n -= syslog_partial;
 			syslog_partial = 0;
 		} else if (!len){
@@ -1477,7 +1762,7 @@ static int syslog_print(char __user *buf, int size)
 		if (!n)
 			break;
 
-		if (copy_to_user(buf, text + skip, n)) {
+		if (copy_to_user(buf, r->text_buf + skip, n)) {
 			if (!len)
 				len = -EFAULT;
 			break;
@@ -1488,22 +1773,21 @@ static int syslog_print(char __user *buf, int size)
 		buf += n;
 	}
 
-	kfree(text);
+	kfree(cont);
 	return len;
 }
 
 static int syslog_print_all(char __user *buf, int size, bool clear)
 {
-	struct printk_info info;
-	unsigned int line_count;
-	struct printk_record r;
-	char *text;
+	struct printk_record *r;
+	struct cont *c;
 	int len = 0;
+	u64 lastseq;
 	u64 seq;
 	bool time;
 
-	text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
-	if (!text)
+	c = kmalloc(sizeof(*c), GFP_KERNEL);
+	if (!c)
 		return -ENOMEM;
 
 	time = printk_time;
@@ -1512,31 +1796,31 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 	 * Find first record that fits, including all following records,
 	 * into the user-provided buffer for this dump.
 	 */
-	prb_for_each_info(clear_seq, prb, seq, &info, &line_count)
-		len += get_record_print_text_size(&info, line_count, true, time);
+	cont_for_each_record(clear_seq, c, seq, r, NULL)
+		len += record_print_text(r, true, time);
 
 	/* move first record forward until length fits into the buffer */
-	prb_for_each_info(clear_seq, prb, seq, &info, &line_count) {
+	cont_for_each_record(clear_seq, c, seq, r, NULL) {
 		if (len <= size)
 			break;
-		len -= get_record_print_text_size(&info, line_count, true, time);
+		len -= record_print_text(r, true, time);
 	}
 
-	prb_rec_init_rd(&r, &info, text, LOG_LINE_MAX + PREFIX_MAX, NULL, 0);
-
+	lastseq = clear_seq;
 	len = 0;
-	prb_for_each_record(seq, prb, seq, &r) {
+	cont_for_each_record(seq, c, seq, r, NULL) {
 		int textlen;
 
-		textlen = record_print_text(&r, true, time);
+		textlen = record_print_text(r, true, time);
 
 		if (len + textlen > size) {
-			seq--;
+			seq = lastseq;
 			break;
 		}
+		lastseq = seq;
 
 		logbuf_unlock_irq();
-		if (copy_to_user(buf + len, text, textlen))
+		if (copy_to_user(buf + len, r->text_buf, textlen))
 			len = -EFAULT;
 		else
 			len += textlen;
@@ -1550,7 +1834,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		clear_seq = seq;
 	logbuf_unlock_irq();
 
-	kfree(text);
+	kfree(c);
 	return len;
 }
 
@@ -1563,6 +1847,9 @@ static void syslog_clear(void)
 
 int do_syslog(int type, char __user *buf, int len, int source)
 {
+	struct printk_record *r;
+	struct cont *c;
+	u64 seq;
 	bool clear = false;
 	static int saved_console_loglevel = LOGLEVEL_DEFAULT;
 	int error;
@@ -1632,6 +1919,11 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		break;
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
+		c = kmalloc(sizeof(*c), GFP_KERNEL);
+		if (!c) {
+			error = -ENOMEM;
+			break;
+		}
 		logbuf_lock_irq();
 		if (syslog_seq < prb_first_valid_seq(prb)) {
 			/* messages are gone, move to first one */
@@ -1640,26 +1932,27 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		}
 		if (source == SYSLOG_FROM_PROC) {
 			/*
-			 * Short-cut for poll(/"proc/kmsg") which simply checks
-			 * for pending data, not the size; return the count of
-			 * records, not the length.
+			 * Short-cut for poll("/proc/kmsg") which simply checks
+			 * for pending data, not the size.
 			 */
-			error = prb_next_seq(prb) - syslog_seq;
+			cont_init(c, syslog_seq);
+			error = !!next_record(c, syslog_seq, NULL, NULL);
 		} else {
 			bool time = syslog_partial ? syslog_time : printk_time;
-			struct printk_info info;
-			unsigned int line_count;
-			u64 seq;
-
-			prb_for_each_info(syslog_seq, prb, seq, &info,
-					  &line_count) {
-				error += get_record_print_text_size(&info, line_count,
-								    true, time);
+			bool use_partial = true;
+			unsigned long dropped;
+
+			cont_for_each_record(syslog_seq, c, seq, r, &dropped) {
+				error += record_print_text(r, true, time);
 				time = printk_time;
+				if (dropped)
+					use_partial = false;
 			}
-			error -= syslog_partial;
+			if (use_partial)
+				error -= syslog_partial;
 		}
 		logbuf_unlock_irq();
+		kfree(c);
 		break;
 	/* Size of the log buffer */
 	case SYSLOG_ACTION_SIZE_BUFFER:
@@ -2335,20 +2628,17 @@ static inline int can_use_console(void)
 void console_unlock(void)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
-	static char text[LOG_LINE_MAX + PREFIX_MAX];
-	static char dict[LOG_LINE_MAX];
-	unsigned long flags;
 	bool do_cond_resched, retry;
-	struct printk_info info;
-	struct printk_record r;
+	struct printk_record *r;
+	static struct cont cont;
+	unsigned long dropped;
+	unsigned long flags;
 
 	if (console_suspended) {
 		up_console_sem();
 		return;
 	}
 
-	prb_rec_init_rd(&r, &info, text, sizeof(text), dict, sizeof(dict));
-
 	/*
 	 * Console drivers are called with interrupts disabled, so
 	 * @console_may_schedule should be cleared before; however, we may
@@ -2385,30 +2675,28 @@ void console_unlock(void)
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 skip:
-		if (!prb_read_valid(prb, console_seq, &r))
+		r = next_record(&cont, console_seq, &dropped, NULL);
+		if (!r)
 			break;
 
-		if (console_seq != r.info->seq) {
-			console_dropped += r.info->seq - console_seq;
-			console_seq = r.info->seq;
+		/* Output to all consoles once old messages replayed. */
+		if (unlikely(exclusive_console &&
+			     r->info->seq >= exclusive_console_stop_seq)) {
+			exclusive_console = NULL;
 		}
 
-		if (suppress_message_printing(r.info->level)) {
+		console_dropped += dropped;
+		console_seq = r->info->seq + 1;
+
+		if (suppress_message_printing(r->info->level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
 			 * record that has level above the console loglevel.
 			 */
-			console_seq++;
 			goto skip;
 		}
 
-		/* Output to all consoles once old messages replayed. */
-		if (unlikely(exclusive_console &&
-			     console_seq >= exclusive_console_stop_seq)) {
-			exclusive_console = NULL;
-		}
-
 		/*
 		 * Handle extended console text first because later
 		 * record_print_text() will modify the record buffer in-place.
@@ -2416,18 +2704,17 @@ void console_unlock(void)
 		if (nr_ext_console_drivers) {
 			ext_len = info_print_ext_header(ext_text,
 						sizeof(ext_text),
-						r.info);
+						r->info);
 			ext_len += msg_print_ext_body(ext_text + ext_len,
 						sizeof(ext_text) - ext_len,
-						&r.dict_buf[0],
-						r.info->dict_len,
-						&r.text_buf[0],
-						r.info->text_len);
+						r->dict_buf,
+						r->info->dict_len,
+						r->text_buf,
+						r->info->text_len);
 		}
-		len = record_print_text(&r,
-				console_msg_format & MSG_FORMAT_SYSLOG,
-				printk_time);
-		console_seq++;
+		len = record_print_text(r,
+					console_msg_format & MSG_FORMAT_SYSLOG,
+					printk_time);
 		raw_spin_unlock(&logbuf_lock);
 
 		/*
@@ -2439,7 +2726,7 @@ void console_unlock(void)
 		console_lock_spinning_enable();
 
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(ext_text, ext_len, text, len);
+		call_console_drivers(ext_text, ext_len, r->text_buf, len);
 		start_critical_timings();
 
 		if (console_lock_spinning_disable_and_check()) {
@@ -2466,7 +2753,7 @@ void console_unlock(void)
 	 * flush, no worries.
 	 */
 	raw_spin_lock(&logbuf_lock);
-	retry = prb_read_valid(prb, console_seq, NULL);
+	retry = next_record(&cont, console_seq, NULL, NULL);
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-- 
2.20.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 4/4] ipconfig: cleanup printk usage
  2020-07-17 23:48 ` John Ogness
@ 2020-07-17 23:48   ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

The use of pr_info() and pr_cont() was not ordered correctly for
all cases. Order it so that all cases provide the expected output.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 net/ipv4/ipconfig.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 561f15b5a944..0f4bd7a59310 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1442,6 +1442,9 @@ static int __init ip_auto_config(void)
 #endif
 	int err;
 	unsigned int i;
+#ifndef IPCONFIG_SILENT
+	bool pr0;
+#endif
 
 	/* Initialise all name servers and NTP servers to NONE (but only if the
 	 * "ip=" or "nfsaddrs=" kernel command line parameters weren't decoded,
@@ -1575,31 +1578,37 @@ static int __init ip_auto_config(void)
 	if (ic_dev_mtu)
 		pr_cont(", mtu=%d", ic_dev_mtu);
 	/* Name servers (if any): */
+	pr0 = false;
 	for (i = 0; i < CONF_NAMESERVERS_MAX; i++) {
 		if (ic_nameservers[i] != NONE) {
-			if (i == 0)
+			if (!pr0) {
 				pr_info("     nameserver%u=%pI4",
 					i, &ic_nameservers[i]);
-			else
+				pr0 = true;
+			} else {
 				pr_cont(", nameserver%u=%pI4",
 					i, &ic_nameservers[i]);
+			}
 		}
-		if (i + 1 == CONF_NAMESERVERS_MAX)
-			pr_cont("\n");
 	}
+	if (pr0)
+		pr_cont("\n");
 	/* NTP servers (if any): */
+	pr0 = false;
 	for (i = 0; i < CONF_NTP_SERVERS_MAX; i++) {
 		if (ic_ntp_servers[i] != NONE) {
-			if (i == 0)
+			if (!pr0) {
 				pr_info("     ntpserver%u=%pI4",
 					i, &ic_ntp_servers[i]);
-			else
+				pr0 = true;
+			} else {
 				pr_cont(", ntpserver%u=%pI4",
 					i, &ic_ntp_servers[i]);
+			}
 		}
-		if (i + 1 == CONF_NTP_SERVERS_MAX)
-			pr_cont("\n");
 	}
+	if (pr0)
+		pr_cont("\n");
 #endif /* !SILENT */
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 4/4] ipconfig: cleanup printk usage
@ 2020-07-17 23:48   ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-17 23:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

The use of pr_info() and pr_cont() was not ordered correctly for
all cases. Order it so that all cases provide the expected output.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 net/ipv4/ipconfig.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 561f15b5a944..0f4bd7a59310 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1442,6 +1442,9 @@ static int __init ip_auto_config(void)
 #endif
 	int err;
 	unsigned int i;
+#ifndef IPCONFIG_SILENT
+	bool pr0;
+#endif
 
 	/* Initialise all name servers and NTP servers to NONE (but only if the
 	 * "ip=" or "nfsaddrs=" kernel command line parameters weren't decoded,
@@ -1575,31 +1578,37 @@ static int __init ip_auto_config(void)
 	if (ic_dev_mtu)
 		pr_cont(", mtu=%d", ic_dev_mtu);
 	/* Name servers (if any): */
+	pr0 = false;
 	for (i = 0; i < CONF_NAMESERVERS_MAX; i++) {
 		if (ic_nameservers[i] != NONE) {
-			if (i == 0)
+			if (!pr0) {
 				pr_info("     nameserver%u=%pI4",
 					i, &ic_nameservers[i]);
-			else
+				pr0 = true;
+			} else {
 				pr_cont(", nameserver%u=%pI4",
 					i, &ic_nameservers[i]);
+			}
 		}
-		if (i + 1 == CONF_NAMESERVERS_MAX)
-			pr_cont("\n");
 	}
+	if (pr0)
+		pr_cont("\n");
 	/* NTP servers (if any): */
+	pr0 = false;
 	for (i = 0; i < CONF_NTP_SERVERS_MAX; i++) {
 		if (ic_ntp_servers[i] != NONE) {
-			if (i == 0)
+			if (!pr0) {
 				pr_info("     ntpserver%u=%pI4",
 					i, &ic_ntp_servers[i]);
-			else
+				pr0 = true;
+			} else {
 				pr_cont(", ntpserver%u=%pI4",
 					i, &ic_ntp_servers[i]);
+			}
 		}
-		if (i + 1 == CONF_NTP_SERVERS_MAX)
-			pr_cont("\n");
 	}
+	if (pr0)
+		pr_cont("\n");
 #endif /* !SILENT */
 
 	/*
-- 
2.20.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-07-17 23:48 ` John Ogness
@ 2020-07-18  0:00   ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-18  0:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Fri, Jul 17, 2020 at 4:48 PM John Ogness <john.ogness@linutronix.de> wrote:
>
> Using placeholders avoids tools such as systemd-journald from
> erroneously reporting missed messages. However, it also means that
> empty placeholder records are visible in systemd-journald logs and
> displayed in tools such as dmesg.

As long as the readers still reliably do the joining, this is fine.

HOWEVER.

Make sure you test the case of "fast concurrent readers". The last
time we did things like this, it was a disaster, because a concurrent
reader would see and return the _incomplete_ line, and the next entry
was still being generated on another CPU.

The reader would then decide to return that incomplete line, because
it had something.

And while in theory this could then be handled properly in user space,
in practice it wasn't. So you'd see a lot of logging tools that would
then report all those continuations as separate log events.

Which is the whole point of LOG_CONT - for that *not* to happen.

So this is just a heads-up that I will not pull something that breaks
LOG_CONT because it thinks "user space can handle it". No. User space
does not handle it, and we need to handle it for the user.

So as long as the kernel makes sure the joining does happen it at some
point, it's all fine. It obviously doesn't have to happen at printk()
time, just as long as incomplete records aren't exposed even to
concurrent readers.

A test-case with a short delay in between writes might be a good idea,
although the last time this broke it was very clear in peoples syslogs
and dmesg because it turns out there are lots of concurrent readers at
boot time and _somebody_ will hit the race.

             Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-07-18  0:00   ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-18  0:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On Fri, Jul 17, 2020 at 4:48 PM John Ogness <john.ogness@linutronix.de> wrote:
>
> Using placeholders avoids tools such as systemd-journald from
> erroneously reporting missed messages. However, it also means that
> empty placeholder records are visible in systemd-journald logs and
> displayed in tools such as dmesg.

As long as the readers still reliably do the joining, this is fine.

HOWEVER.

Make sure you test the case of "fast concurrent readers". The last
time we did things like this, it was a disaster, because a concurrent
reader would see and return the _incomplete_ line, and the next entry
was still being generated on another CPU.

The reader would then decide to return that incomplete line, because
it had something.

And while in theory this could then be handled properly in user space,
in practice it wasn't. So you'd see a lot of logging tools that would
then report all those continuations as separate log events.

Which is the whole point of LOG_CONT - for that *not* to happen.

So this is just a heads-up that I will not pull something that breaks
LOG_CONT because it thinks "user space can handle it". No. User space
does not handle it, and we need to handle it for the user.

So as long as the kernel makes sure the joining does happen it at some
point, it's all fine. It obviously doesn't have to happen at printk()
time, just as long as incomplete records aren't exposed even to
concurrent readers.

A test-case with a short delay in between writes might be a good idea,
although the last time this broke it was very clear in peoples syslogs
and dmesg because it turns out there are lots of concurrent readers at
boot time and _somebody_ will hit the race.

             Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-07-18  0:00   ` Linus Torvalds
@ 2020-07-18 14:42     ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-18 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Make sure you test the case of "fast concurrent readers". The last
> time we did things like this, it was a disaster, because a concurrent
> reader would see and return the _incomplete_ line, and the next entry
> was still being generated on another CPU.
>
> The reader would then decide to return that incomplete line, because
> it had something.
>
> And while in theory this could then be handled properly in user space,
> in practice it wasn't. So you'd see a lot of logging tools that would
> then report all those continuations as separate log events.
>
> Which is the whole point of LOG_CONT - for that *not* to happen.

I expect this is handled correctly since the reader is not given any
parts until a full line is ready, but I will put more focus on testing
this to make sure. Thanks for the regression and testing tips.

> So this is just a heads-up that I will not pull something that breaks
> LOG_CONT because it thinks "user space can handle it". No. User space
> does not handle it, and we need to handle it for the user.

Understood. Petr and Sergey are also strict about this. We are making a
serious effort to avoid breaking things for userspace.

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-07-18 14:42     ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-18 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Make sure you test the case of "fast concurrent readers". The last
> time we did things like this, it was a disaster, because a concurrent
> reader would see and return the _incomplete_ line, and the next entry
> was still being generated on another CPU.
>
> The reader would then decide to return that incomplete line, because
> it had something.
>
> And while in theory this could then be handled properly in user space,
> in practice it wasn't. So you'd see a lot of logging tools that would
> then report all those continuations as separate log events.
>
> Which is the whole point of LOG_CONT - for that *not* to happen.

I expect this is handled correctly since the reader is not given any
parts until a full line is ready, but I will put more focus on testing
this to make sure. Thanks for the regression and testing tips.

> So this is just a heads-up that I will not pull something that breaks
> LOG_CONT because it thinks "user space can handle it". No. User space
> does not handle it, and we need to handle it for the user.

Understood. Petr and Sergey are also strict about this. We are making a
serious effort to avoid breaking things for userspace.

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-07-18 14:42     ` John Ogness
@ 2020-07-18 17:42       ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-18 17:42 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Sat, Jul 18, 2020 at 7:43 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> I expect this is handled correctly since the reader is not given any
> parts until a full line is ready, but I will put more focus on testing
> this to make sure.

Yeah, the patches looked fine, but I only scanned them, and just
wanted to make sure.

Over the years, we've gotten printk wrong so many times that I get a
bit paranoid. Things can look fine on the screen, but then have odd
line breaks in the logs. Or vice versa. Or work fine on some machine,
but consistently show some race on another.

And some of the more complex features are hardly ever actually used -
I'm not sure the optional message context (aka dictionary) is ever
actually used.

Yes, all the "dev_printk()" helpers fill it in with the device
information (create_syslog_header()), and you _can_ use them if you
know about them (ie

    journalctl -b _KERNEL_SUBSYSTEM=pci_bus

but I sometimes wonder how many people use all this complexity. And
how many people even know about it..)

So there are hidden things in there that can easily break *subtly* and
then take ages for people to notice, because while some are very
obvious indeed ("why is my module list message broken up into a
hundred lines?") others might be things people aren't even aware of.

Maybe a lot of system tools use those kernel dictionaries. Maybe it
would break immediately. I just sometimes wonder...

                 Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-07-18 17:42       ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-18 17:42 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On Sat, Jul 18, 2020 at 7:43 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> I expect this is handled correctly since the reader is not given any
> parts until a full line is ready, but I will put more focus on testing
> this to make sure.

Yeah, the patches looked fine, but I only scanned them, and just
wanted to make sure.

Over the years, we've gotten printk wrong so many times that I get a
bit paranoid. Things can look fine on the screen, but then have odd
line breaks in the logs. Or vice versa. Or work fine on some machine,
but consistently show some race on another.

And some of the more complex features are hardly ever actually used -
I'm not sure the optional message context (aka dictionary) is ever
actually used.

Yes, all the "dev_printk()" helpers fill it in with the device
information (create_syslog_header()), and you _can_ use them if you
know about them (ie

    journalctl -b _KERNEL_SUBSYSTEM=pci_bus

but I sometimes wonder how many people use all this complexity. And
how many people even know about it..)

So there are hidden things in there that can easily break *subtly* and
then take ages for people to notice, because while some are very
obvious indeed ("why is my module list message broken up into a
hundred lines?") others might be things people aren't even aware of.

Maybe a lot of system tools use those kernel dictionaries. Maybe it
would break immediately. I just sometimes wonder...

                 Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-17 23:48   ` John Ogness
@ 2020-07-19 14:35     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-19 14:35 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec, linux-kernel

On (20/07/18 01:54), John Ogness wrote:
> ---
>  kernel/printk/printk.c | 114 ++++-------------------------------------
>  1 file changed, 11 insertions(+), 103 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fec71229169e..c4274c867771 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
>  		*trunc_msg_len = 0;
>  }
>  
> +static inline u32 printk_caller_id(void)
> +{
> +	return in_task() ? task_pid_nr(current) :
> +		0x80000000 + raw_smp_processor_id();
> +}

Can we merge lines that we don't want to merge?

   pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()

                  |__________________________________|
   different       same 0x80000000 + cpu_id caller id
   caller id

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-19 14:35     ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-19 14:35 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, linux-kernel, Steven Rostedt,
	Sergey Senozhatsky, Thomas Gleixner, Linus Torvalds

On (20/07/18 01:54), John Ogness wrote:
> ---
>  kernel/printk/printk.c | 114 ++++-------------------------------------
>  1 file changed, 11 insertions(+), 103 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fec71229169e..c4274c867771 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -495,9 +495,14 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len)
>  		*trunc_msg_len = 0;
>  }
>  
> +static inline u32 printk_caller_id(void)
> +{
> +	return in_task() ? task_pid_nr(current) :
> +		0x80000000 + raw_smp_processor_id();
> +}

Can we merge lines that we don't want to merge?

   pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()

                  |__________________________________|
   different       same 0x80000000 + cpu_id caller id
   caller id

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-19 14:35     ` Sergey Senozhatsky
@ 2020-07-19 18:27       ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-19 18:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Sun, Jul 19, 2020 at 7:35 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> Can we merge lines that we don't want to merge?
>
>    pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()

That pr_cont in either IRQ or NMI context would be a bug.

You can't validly have a PR_CONT without the non-cont that precedes it.

                Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-19 18:27       ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-19 18:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Sun, Jul 19, 2020 at 7:35 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> Can we merge lines that we don't want to merge?
>
>    pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()

That pr_cont in either IRQ or NMI context would be a bug.

You can't validly have a PR_CONT without the non-cont that precedes it.

                Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-19 18:27       ` Linus Torvalds
@ 2020-07-20  1:50         ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-20  1:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On (20/07/19 11:27), Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 7:35 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > Can we merge lines that we don't want to merge?
> >
> >    pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()
> 
> That pr_cont in either IRQ or NMI context would be a bug.
> 
> You can't validly have a PR_CONT without the non-cont that precedes it.

Do I get it right, what you are saying is - when we process a PR_CONT
message the cont buffer should already contain previous non-LOG_NEWLINE
and non-PR_CONT message, otherwise it's a bug?

lockdep (I'll trim the code)

static void __print_lock_name(struct lock_class *class)
{
 ..
	name = class->name;
	if (!name) {
		name = __get_key_name(class->key, str);
		printk(KERN_CONT "%s", name);
	} else {
		printk(KERN_CONT "%s", name);
		if (class->name_version > 1)
			printk(KERN_CONT "#%d", class->name_version);
		if (class->subclass)
			printk(KERN_CONT "/%d", class->subclass);
	}
}

static void print_lock_name(struct lock_class *class)
{
	printk(KERN_CONT " (");
	__print_lock_name(class);
	printk(KERN_CONT "){%s}-{%hd:%hd}", usage, ...
}

static void
print_bad_irq_dependency(struct task_struct *curr,
{
 ..
	pr_warn("which would create a new lock dependency:\n");
	print_lock_name(hlock_class(prev));
	pr_cont(" ->");
	print_lock_name(hlock_class(next));
	pr_cont("\n");
 ..
}

pr_warn() is LOG_NEWLINE, so cont buffer is empty by the time
we call print_lock_name()->__print_lock_name(), which do several
pr_cont() print outs.

I'm quite sure there is more code that does similar things.

But, overall, isn't it by design that we can process pr_cont()
message with no preceding non-cont message? Because of preliminary
flushes. Example:

CPU0

 pr_info("foo");    // !LOG_NEWLINE goes into the cont buffer
 pr_cont("1");      // OK
 -> IRQ / NMI / exception / etc
 	pr_alert("error\n"); // flush cont buffer, log_store error message (it's LOG_NEWLINE)
 <- iret
 pr_cont("2");      // cont buffer was flushed. There is no preceding non-cont message
 pr_cont("3");
 pr_cont("\n");

Or am I misunderstanding what you saying?

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-20  1:50         ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-20  1:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On (20/07/19 11:27), Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 7:35 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > Can we merge lines that we don't want to merge?
> >
> >    pr_cont()  ->  IRQ -> pr_cont() -> NMI -> pr_cont()
> 
> That pr_cont in either IRQ or NMI context would be a bug.
> 
> You can't validly have a PR_CONT without the non-cont that precedes it.

Do I get it right, what you are saying is - when we process a PR_CONT
message the cont buffer should already contain previous non-LOG_NEWLINE
and non-PR_CONT message, otherwise it's a bug?

lockdep (I'll trim the code)

static void __print_lock_name(struct lock_class *class)
{
 ..
	name = class->name;
	if (!name) {
		name = __get_key_name(class->key, str);
		printk(KERN_CONT "%s", name);
	} else {
		printk(KERN_CONT "%s", name);
		if (class->name_version > 1)
			printk(KERN_CONT "#%d", class->name_version);
		if (class->subclass)
			printk(KERN_CONT "/%d", class->subclass);
	}
}

static void print_lock_name(struct lock_class *class)
{
	printk(KERN_CONT " (");
	__print_lock_name(class);
	printk(KERN_CONT "){%s}-{%hd:%hd}", usage, ...
}

static void
print_bad_irq_dependency(struct task_struct *curr,
{
 ..
	pr_warn("which would create a new lock dependency:\n");
	print_lock_name(hlock_class(prev));
	pr_cont(" ->");
	print_lock_name(hlock_class(next));
	pr_cont("\n");
 ..
}

pr_warn() is LOG_NEWLINE, so cont buffer is empty by the time
we call print_lock_name()->__print_lock_name(), which do several
pr_cont() print outs.

I'm quite sure there is more code that does similar things.

But, overall, isn't it by design that we can process pr_cont()
message with no preceding non-cont message? Because of preliminary
flushes. Example:

CPU0

 pr_info("foo");    // !LOG_NEWLINE goes into the cont buffer
 pr_cont("1");      // OK
 -> IRQ / NMI / exception / etc
 	pr_alert("error\n"); // flush cont buffer, log_store error message (it's LOG_NEWLINE)
 <- iret
 pr_cont("2");      // cont buffer was flushed. There is no preceding non-cont message
 pr_cont("3");
 pr_cont("\n");

Or am I misunderstanding what you saying?

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/4] printk: ringbuffer: support dataless records
  2020-07-17 23:48   ` John Ogness
@ 2020-07-20 14:49     ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-20 14:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, linux-kernel

On 2020-07-18, John Ogness <john.ogness@linutronix.de> wrote:
> In order to support storage of continuous lines, dataless records must
> be allowed. For example, these are generated with the legal calls:
>
>     pr_info("");
>     pr_cont("\n");
>
> Currently dataless records are denoted by INVALID_LPOS in order to
> recognize failed prb_reserve() calls. Change the code to use two
> different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
> between failed prb_reserve() records and successful dataless records.

This patch has been re-posted [0] as a regression fix for the first
series that is already in linux-next. Only the commit message has been
changed to reflect the regression fix rather than preparing for
continuous line support.

Assuming that patch is accepted, this one should be dropped.

John Ogness

[0] https://lkml.kernel.org/r/20200720140111.19935-1-john.ogness@linutronix.de

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/4] printk: ringbuffer: support dataless records
@ 2020-07-20 14:49     ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-20 14:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	linux-kernel, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On 2020-07-18, John Ogness <john.ogness@linutronix.de> wrote:
> In order to support storage of continuous lines, dataless records must
> be allowed. For example, these are generated with the legal calls:
>
>     pr_info("");
>     pr_cont("\n");
>
> Currently dataless records are denoted by INVALID_LPOS in order to
> recognize failed prb_reserve() calls. Change the code to use two
> different identifiers (FAILED_LPOS and NO_LPOS) to distinguish
> between failed prb_reserve() records and successful dataless records.

This patch has been re-posted [0] as a regression fix for the first
series that is already in linux-next. Only the commit message has been
changed to reflect the regression fix rather than preparing for
continuous line support.

Assuming that patch is accepted, this one should be dropped.

John Ogness

[0] https://lkml.kernel.org/r/20200720140111.19935-1-john.ogness@linutronix.de

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-20  1:50         ` Sergey Senozhatsky
@ 2020-07-20 18:30           ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-20 18:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> Do I get it right, what you are saying is - when we process a PR_CONT
> message the cont buffer should already contain previous non-LOG_NEWLINE
> and non-PR_CONT message, otherwise it's a bug?

No.

I'm saying that the code that does PR_CONT should have done *some*
printing before, otherwise it's at the very least questionable.

IOW, you can't just randomly start printing with PR_CONT, without
having established _some_ context for it.

But that context could be a previous newline you created (the PR_CONT
will be a no-op). That's certainly useful for printing a header and
then after that printing possible other complex data that may or may
not have line breaks in it.

So your example looks fine. The context starts out with

    pr_warn("which would create a new lock dependency:\n");

and after that you can use KERN_CONT / pr_cont() as much as you want,
since you've established a context for what you're printing.

And then it ends with 'pr_cont("\n")'.

So anything that was interrupted by this, and uses KERN_CONT /
pr_cont() will have no ambiguous issues. The code you pointed at both
started and ended a line.

That said, we have traditionally used not just "current process", but
also "last irq-level" as the context information, so I do think it
would be good to continue to do that.

At that point, "an interrupt printed something in the middle" isn't
even an issue any more, because it's clear that the context has
changed.

                     Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-20 18:30           ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-20 18:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> Do I get it right, what you are saying is - when we process a PR_CONT
> message the cont buffer should already contain previous non-LOG_NEWLINE
> and non-PR_CONT message, otherwise it's a bug?

No.

I'm saying that the code that does PR_CONT should have done *some*
printing before, otherwise it's at the very least questionable.

IOW, you can't just randomly start printing with PR_CONT, without
having established _some_ context for it.

But that context could be a previous newline you created (the PR_CONT
will be a no-op). That's certainly useful for printing a header and
then after that printing possible other complex data that may or may
not have line breaks in it.

So your example looks fine. The context starts out with

    pr_warn("which would create a new lock dependency:\n");

and after that you can use KERN_CONT / pr_cont() as much as you want,
since you've established a context for what you're printing.

And then it ends with 'pr_cont("\n")'.

So anything that was interrupted by this, and uses KERN_CONT /
pr_cont() will have no ambiguous issues. The code you pointed at both
started and ended a line.

That said, we have traditionally used not just "current process", but
also "last irq-level" as the context information, so I do think it
would be good to continue to do that.

At that point, "an interrupt printed something in the middle" isn't
even an issue any more, because it's clear that the context has
changed.

                     Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-20 18:30           ` Linus Torvalds
@ 2020-07-21  3:53             ` Joe Perches
  -1 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-07-21  3:53 UTC (permalink / raw)
  To: Linus Torvalds, Sergey Senozhatsky
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Mon, 2020-07-20 at 11:30 -0700, Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

I believe there are at least a few cases that
_only_ use pr_cont to emit
complete lines.

For example: SEQ_printf in kernel/sched/debug.c



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-21  3:53             ` Joe Perches
  0 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-07-21  3:53 UTC (permalink / raw)
  To: Linus Torvalds, Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Mon, 2020-07-20 at 11:30 -0700, Linus Torvalds wrote:
> On Sun, Jul 19, 2020 at 6:51 PM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

I believe there are at least a few cases that
_only_ use pr_cont to emit
complete lines.

For example: SEQ_printf in kernel/sched/debug.c



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-20 18:30           ` Linus Torvalds
@ 2020-07-21 14:42             ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On (20/07/20 11:30), Linus Torvalds wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

OK, I see. I sort of suspect that we may actually have code that does
just pr_cont() (e.g. what Joe pointed out). It doesn't seem like that
"establish a context" was ever enforced, doing a bunch of pr_cont()
simply works.

[..]
> That said, we have traditionally used not just "current process", but
> also "last irq-level" as the context information, so I do think it
> would be good to continue to do that.

OK, so basically, extending printk_caller_id() so that for IRQ/NMI
we will have more info than just "0x80000000 + raw_smp_processor_id()".

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-21 14:42             ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-21 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On (20/07/20 11:30), Linus Torvalds wrote:
> > Do I get it right, what you are saying is - when we process a PR_CONT
> > message the cont buffer should already contain previous non-LOG_NEWLINE
> > and non-PR_CONT message, otherwise it's a bug?
> 
> No.
> 
> I'm saying that the code that does PR_CONT should have done *some*
> printing before, otherwise it's at the very least questionable.
> 
> IOW, you can't just randomly start printing with PR_CONT, without
> having established _some_ context for it.

OK, I see. I sort of suspect that we may actually have code that does
just pr_cont() (e.g. what Joe pointed out). It doesn't seem like that
"establish a context" was ever enforced, doing a bunch of pr_cont()
simply works.

[..]
> That said, we have traditionally used not just "current process", but
> also "last irq-level" as the context information, so I do think it
> would be good to continue to do that.

OK, so basically, extending printk_caller_id() so that for IRQ/NMI
we will have more info than just "0x80000000 + raw_smp_processor_id()".

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-21 14:42             ` Sergey Senozhatsky
@ 2020-07-21 14:57               ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-21 14:57 UTC (permalink / raw)
  To: Sergey Senozhatsky, Linus Torvalds
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On 2020-07-21, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>> That said, we have traditionally used not just "current process", but
>> also "last irq-level" as the context information, so I do think it
>> would be good to continue to do that.
>
> OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> we will have more info than just "0x80000000 + raw_smp_processor_id()".

If bit31 is set, the upper 8 bits could specify what the lower 24 bits
represent. That would give some freedom for the future.

For example:

0x80 = cpu id (generic context)
0x81 = interrupt number
0x82 = cpu id (nmi context)

Or maybe ascii should be used instead?

0x80 | '\0' = cpu id (generic context)
0x80 | 'i'  = interrupt number
0x80 | 'n'  = cpu id (nmi context)

Just an idea.

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-21 14:57               ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-07-21 14:57 UTC (permalink / raw)
  To: Sergey Senozhatsky, Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On 2020-07-21, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>> That said, we have traditionally used not just "current process", but
>> also "last irq-level" as the context information, so I do think it
>> would be good to continue to do that.
>
> OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> we will have more info than just "0x80000000 + raw_smp_processor_id()".

If bit31 is set, the upper 8 bits could specify what the lower 24 bits
represent. That would give some freedom for the future.

For example:

0x80 = cpu id (generic context)
0x81 = interrupt number
0x82 = cpu id (nmi context)

Or maybe ascii should be used instead?

0x80 | '\0' = cpu id (generic context)
0x80 | 'i'  = interrupt number
0x80 | 'n'  = cpu id (nmi context)

Just an idea.

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-21 14:42             ` Sergey Senozhatsky
@ 2020-07-21 15:40               ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-21 15:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Tue, Jul 21, 2020 at 7:42 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> we will have more info than just "0x80000000 + raw_smp_processor_id()".

I think it's really preempt_count() that we want to have there.

That has the softirq/hardirq/NMI information in it.

So the "context" identifier of an incomplete write would be { task,
cpu, preempt_count() } of the writer. If a new KERN_CONT writer comes
in, it would have to match in that context to actually combine.

You can probably play "hide the bits" tricks to not *ac tually* use
three words for it. The task structure in particular tends to be very
aligned, you could hide bits in the low bits there. The CPU number
would fit in there, for example.

             Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-21 15:40               ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-07-21 15:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Tue, Jul 21, 2020 at 7:42 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> we will have more info than just "0x80000000 + raw_smp_processor_id()".

I think it's really preempt_count() that we want to have there.

That has the softirq/hardirq/NMI information in it.

So the "context" identifier of an incomplete write would be { task,
cpu, preempt_count() } of the writer. If a new KERN_CONT writer comes
in, it would have to match in that context to actually combine.

You can probably play "hide the bits" tricks to not *ac tually* use
three words for it. The task structure in particular tends to be very
aligned, you could hide bits in the low bits there. The CPU number
would fit in there, for example.

             Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-21 15:40               ` Linus Torvalds
@ 2020-07-28  2:22                 ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-28  2:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On (20/07/21 08:40), Linus Torvalds wrote:
> On Tue, Jul 21, 2020 at 7:42 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x80000000 + raw_smp_processor_id()".
> 
> I think it's really preempt_count() that we want to have there.
> 
> That has the softirq/hardirq/NMI information in it.
> 
> So the "context" identifier of an incomplete write would be { task,
> cpu, preempt_count() } of the writer. If a new KERN_CONT writer comes
> in, it would have to match in that context to actually combine.
> 
> You can probably play "hide the bits" tricks to not *ac tually* use
> three words for it. The task structure in particular tends to be very
> aligned, you could hide bits in the low bits there. The CPU number
> would fit in there, for example.

The purpose of callerid has changed. We used to keep it _only_ in
struct printk_log and never used it for anything but

	if (cont.owner == current && (lflags & LOG_CONT))
		...

so it was easy to use task_struct pointers - we only cared if cont.owner
matches current and never dereferenced the cont.owner pointer.

But we do show caller id to users now (CONFIG_PRINTK_CALLER), so it
makes sense to keep there something more or less human readable, it
helps syzkaller/fuzzer people. That's why we keep PID in [30,0] bits
of callerid if the printk was called in_task(), and CPU_ID otherwise.

Replacing easy to understand PID/CPU_ID with some magic hex, e.g.
hash(current) >> shift or hash(smp_processor_id()) >> shift, will
make things less convenient, so I think it is reasonable to preserve
the existing scheme.

I like what John suggests:

- if printk was called from in_task(), then we keep PID in [30,0] bits
- otherwise we keep smp_processor_id() in [27,0] bits and in the upper
  bits we keep the IRQ/NMI/etc flags.

It may make sense to "promote" u32 callerid to u64, just in case if
someone sometime in the future will have boxes with more than several
billion PIDs.

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-28  2:22                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-28  2:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On (20/07/21 08:40), Linus Torvalds wrote:
> On Tue, Jul 21, 2020 at 7:42 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x80000000 + raw_smp_processor_id()".
> 
> I think it's really preempt_count() that we want to have there.
> 
> That has the softirq/hardirq/NMI information in it.
> 
> So the "context" identifier of an incomplete write would be { task,
> cpu, preempt_count() } of the writer. If a new KERN_CONT writer comes
> in, it would have to match in that context to actually combine.
> 
> You can probably play "hide the bits" tricks to not *ac tually* use
> three words for it. The task structure in particular tends to be very
> aligned, you could hide bits in the low bits there. The CPU number
> would fit in there, for example.

The purpose of callerid has changed. We used to keep it _only_ in
struct printk_log and never used it for anything but

	if (cont.owner == current && (lflags & LOG_CONT))
		...

so it was easy to use task_struct pointers - we only cared if cont.owner
matches current and never dereferenced the cont.owner pointer.

But we do show caller id to users now (CONFIG_PRINTK_CALLER), so it
makes sense to keep there something more or less human readable, it
helps syzkaller/fuzzer people. That's why we keep PID in [30,0] bits
of callerid if the printk was called in_task(), and CPU_ID otherwise.

Replacing easy to understand PID/CPU_ID with some magic hex, e.g.
hash(current) >> shift or hash(smp_processor_id()) >> shift, will
make things less convenient, so I think it is reasonable to preserve
the existing scheme.

I like what John suggests:

- if printk was called from in_task(), then we keep PID in [30,0] bits
- otherwise we keep smp_processor_id() in [27,0] bits and in the upper
  bits we keep the IRQ/NMI/etc flags.

It may make sense to "promote" u32 callerid to u64, just in case if
someone sometime in the future will have boxes with more than several
billion PIDs.

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
  2020-07-21 14:57               ` John Ogness
@ 2020-07-29  2:03                 ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-29  2:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Linus Torvalds, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On (20/07/21 17:03), John Ogness wrote:
> On 2020-07-21, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >> That said, we have traditionally used not just "current process", but
> >> also "last irq-level" as the context information, so I do think it
> >> would be good to continue to do that.
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x80000000 + raw_smp_processor_id()".
> 
> If bit31 is set, the upper 8 bits could specify what the lower 24 bits
> represent. That would give some freedom for the future.
> 
> For example:
> 
> 0x80 = cpu id (generic context)
> 0x81 = interrupt number
> 0x82 = cpu id (nmi context)

Looks good to me. I'd probably add in_softirq state. So the contexts
are: in_task, in_softirq, in_irq, in_nmi

For the purpose of future extensions, I would probably consider
u64 for callerid. So that we can store more "state bits" in the future,
like disabled/enabled preemption; disabled/enabled local irqs; etc.

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/4] printk: store instead of processing cont parts
@ 2020-07-29  2:03                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-07-29  2:03 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On (20/07/21 17:03), John Ogness wrote:
> On 2020-07-21, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >> That said, we have traditionally used not just "current process", but
> >> also "last irq-level" as the context information, so I do think it
> >> would be good to continue to do that.
> >
> > OK, so basically, extending printk_caller_id() so that for IRQ/NMI
> > we will have more info than just "0x80000000 + raw_smp_processor_id()".
> 
> If bit31 is set, the upper 8 bits could specify what the lower 24 bits
> represent. That would give some freedom for the future.
> 
> For example:
> 
> 0x80 = cpu id (generic context)
> 0x81 = interrupt number
> 0x82 = cpu id (nmi context)

Looks good to me. I'd probably add in_softirq state. So the contexts
are: in_task, in_softirq, in_irq, in_nmi

For the purpose of future extensions, I would probably consider
u64 for callerid. So that we can store more "state bits" in the future,
like disabled/enabled preemption; disabled/enabled local irqs; etc.

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-07-18 14:42     ` John Ogness
@ 2020-08-11 16:05       ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-11 16:05 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Sat 2020-07-18 16:48:55, John Ogness wrote:
> On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Make sure you test the case of "fast concurrent readers". The last
> > time we did things like this, it was a disaster, because a concurrent
> > reader would see and return the _incomplete_ line, and the next entry
> > was still being generated on another CPU.
> >
> > The reader would then decide to return that incomplete line, because
> > it had something.
> >
> > And while in theory this could then be handled properly in user space,
> > in practice it wasn't. So you'd see a lot of logging tools that would
> > then report all those continuations as separate log events.
> >
> > Which is the whole point of LOG_CONT - for that *not* to happen.
> 
> I expect this is handled correctly since the reader is not given any
> parts until a full line is ready, but I will put more focus on testing
> this to make sure. Thanks for the regression and testing tips.

Hmm, the current patchset has different problem. The continuation
pieces are correctly passed as a single lines. But empty line is
printed for each unused sequence number to avoid warnings about
missed messages in journactl. It looks like:

Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-provided physical RAM map:
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
[...]
Aug 11 17:08:16 sle15-sp1 kernel: kvm-clock: using sched offset of 85305506196438 cycles
Aug 11 17:08:16 sle15-sp1 kernel: clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 88>
Aug 11 17:08:16 sle15-sp1 kernel: tsc: Detected 2112.000 MHz processor
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Aug 11 17:08:16 sle15-sp1 kernel: last_pfn = 0x180000 max_arch_pfn = 0x400000000
Aug 11 17:08:16 sle15-sp1 kernel: MTRR default type: write-back
Aug 11 17:08:16 sle15-sp1 kernel: MTRR fixed ranges enabled:
[...]
Aug 11 17:08:16 sle15-sp1 kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519>
Aug 11 17:08:16 sle15-sp1 kernel: setup_percpu: NR_CPUS:8192 nr_cpumask_bits:3 nr_cpu_ids:3 nr_node_ids:1
Aug 11 17:08:16 sle15-sp1 kernel: percpu: Embedded 508 pages/cpu s2043904 r8192 d28672 u2097152
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: pcpu-alloc: s2043904 r8192 d28672 u2097152 alloc=1*2097152
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: pcpu-alloc: [0] 0 [0] 1 [0] 2 
Aug 11 17:08:16 sle15-sp1 kernel: KVM setup async PF for cpu 0
Aug 11 17:08:16 sle15-sp1 kernel: kvm-stealtime: cpu 0, msr 17f9f2080
Aug 11 17:08:16 sle15-sp1 kernel: Built 1 zonelists, mobility grouping on.  Total pages: 1031901
Aug 11 17:08:16 sle15-sp1 kernel: Policy zone: Normal


> > So this is just a heads-up that I will not pull something that breaks
> > LOG_CONT because it thinks "user space can handle it". No. User space
> > does not handle it, and we need to handle it for the user.

I am afraid that this not acceptable for Linus either because people
will report bugs and complain about regression.

Slightly better solution is to skip the unused sequence numbers. It
looks good in both dmesg/journalctl by default. Note that journalctl
shows the warnings about missing lines "only" with "-a" option.
Sigh, but I am afraid that this is still not acceptable.


Hmm, I can't find any simple or even working solution for maintaining
a monotonic sequence number a lockless way that would be the same
for all stored pieces.

I am afraid that the only working solution is to store all pieces
in a single lockless transaction. I think that John already
proposed using 2nd small lockless buffer for this. The problem
might be how to synchronize flushing the pieces into the final
buffer.

Another solution would be to use separate buffer for each context
and CPU. The problem is a missing final "\n". It might cause
that a buffer is not flushed for a long time until another
message is printed in the same context on the same CPU.

The 2nd small logbuffer looks like a better choice if we
are able to solve the lockless flush.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-11 16:05       ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-11 16:05 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Sat 2020-07-18 16:48:55, John Ogness wrote:
> On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Make sure you test the case of "fast concurrent readers". The last
> > time we did things like this, it was a disaster, because a concurrent
> > reader would see and return the _incomplete_ line, and the next entry
> > was still being generated on another CPU.
> >
> > The reader would then decide to return that incomplete line, because
> > it had something.
> >
> > And while in theory this could then be handled properly in user space,
> > in practice it wasn't. So you'd see a lot of logging tools that would
> > then report all those continuations as separate log events.
> >
> > Which is the whole point of LOG_CONT - for that *not* to happen.
> 
> I expect this is handled correctly since the reader is not given any
> parts until a full line is ready, but I will put more focus on testing
> this to make sure. Thanks for the regression and testing tips.

Hmm, the current patchset has different problem. The continuation
pieces are correctly passed as a single lines. But empty line is
printed for each unused sequence number to avoid warnings about
missed messages in journactl. It looks like:

Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
Aug 11 17:08:16 sle15-sp1 kernel: x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-provided physical RAM map:
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
[...]
Aug 11 17:08:16 sle15-sp1 kernel: kvm-clock: using sched offset of 85305506196438 cycles
Aug 11 17:08:16 sle15-sp1 kernel: clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 88>
Aug 11 17:08:16 sle15-sp1 kernel: tsc: Detected 2112.000 MHz processor
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Aug 11 17:08:16 sle15-sp1 kernel: last_pfn = 0x180000 max_arch_pfn = 0x400000000
Aug 11 17:08:16 sle15-sp1 kernel: MTRR default type: write-back
Aug 11 17:08:16 sle15-sp1 kernel: MTRR fixed ranges enabled:
[...]
Aug 11 17:08:16 sle15-sp1 kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519>
Aug 11 17:08:16 sle15-sp1 kernel: setup_percpu: NR_CPUS:8192 nr_cpumask_bits:3 nr_cpu_ids:3 nr_node_ids:1
Aug 11 17:08:16 sle15-sp1 kernel: percpu: Embedded 508 pages/cpu s2043904 r8192 d28672 u2097152
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: pcpu-alloc: s2043904 r8192 d28672 u2097152 alloc=1*2097152
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: 
Aug 11 17:08:16 sle15-sp1 kernel: pcpu-alloc: [0] 0 [0] 1 [0] 2 
Aug 11 17:08:16 sle15-sp1 kernel: KVM setup async PF for cpu 0
Aug 11 17:08:16 sle15-sp1 kernel: kvm-stealtime: cpu 0, msr 17f9f2080
Aug 11 17:08:16 sle15-sp1 kernel: Built 1 zonelists, mobility grouping on.  Total pages: 1031901
Aug 11 17:08:16 sle15-sp1 kernel: Policy zone: Normal


> > So this is just a heads-up that I will not pull something that breaks
> > LOG_CONT because it thinks "user space can handle it". No. User space
> > does not handle it, and we need to handle it for the user.

I am afraid that this not acceptable for Linus either because people
will report bugs and complain about regression.

Slightly better solution is to skip the unused sequence numbers. It
looks good in both dmesg/journalctl by default. Note that journalctl
shows the warnings about missing lines "only" with "-a" option.
Sigh, but I am afraid that this is still not acceptable.


Hmm, I can't find any simple or even working solution for maintaining
a monotonic sequence number a lockless way that would be the same
for all stored pieces.

I am afraid that the only working solution is to store all pieces
in a single lockless transaction. I think that John already
proposed using 2nd small lockless buffer for this. The problem
might be how to synchronize flushing the pieces into the final
buffer.

Another solution would be to use separate buffer for each context
and CPU. The problem is a missing final "\n". It might cause
that a buffer is not flushed for a long time until another
message is printed in the same context on the same CPU.

The 2nd small logbuffer looks like a better choice if we
are able to solve the lockless flush.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-11 16:05       ` Petr Mladek
@ 2020-08-12 16:39         ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-12 16:39 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Tue 2020-08-11 18:05:51, Petr Mladek wrote:
> On Sat 2020-07-18 16:48:55, John Ogness wrote:
> > On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > Make sure you test the case of "fast concurrent readers". The last
> > > time we did things like this, it was a disaster, because a concurrent
> > > reader would see and return the _incomplete_ line, and the next entry
> > > was still being generated on another CPU.
> > >
> > > The reader would then decide to return that incomplete line, because
> > > it had something.
> > >
> > > And while in theory this could then be handled properly in user space,
> > > in practice it wasn't. So you'd see a lot of logging tools that would
> > > then report all those continuations as separate log events.
> > >
> > > Which is the whole point of LOG_CONT - for that *not* to happen.
> > 
> > I expect this is handled correctly since the reader is not given any
> > parts until a full line is ready, but I will put more focus on testing
> > this to make sure. Thanks for the regression and testing tips.
> 
> Hmm, the current patchset has different problem. The continuation
> pieces are correctly passed as a single lines. But empty line is
> printed for each unused sequence number to avoid warnings about
> missed messages in journactl. It looks like:
> 
> I am afraid that the only working solution is to store all pieces
> in a single lockless transaction. I think that John already
> proposed using 2nd small lockless buffer for this. The problem
> might be how to synchronize flushing the pieces into the final
> buffer.

Do not panic! It might look scary. But I am less scared
after I wrote some pieces of the pseudo code.

So, I have one crazy idea to add one more state bit so that we
could have:

  + committed: set when the data are written into the data ring.
  + final: set when the data block could not longer get reopened
  + reuse: set when the desctiptor/data block could get reused


"final" bit will define when the descriptor could not longer
get reopened (cleared committed bit) and the data block could
not get extended.

The logic would be the following:

bool prb_reserve() {

	desc = try_reopen_desc(seq);
	if (desc) {
		text_buf = data_alloc_continuous();
		if (text_buf)
			goto success;
		else
			/* commit the reopened desc back again */
			prb_commit(desc);
	}

	/* Otherwise, do as before */
	desc = desc_reserve();
	if (!desc)
		goto fail;

	text_buf = data_alloc();
	...

where:

static struct prb_desc *try_reopen_desc(seq)
{
	struct prb_desc *desc;

	enum desc_state d_state;
	struct prb_desc desc;

	d_state = desc_read(desc_ring, seq, &desc);
	if (d_state != committed_and_not_finalized)
		return NULL;

	if (!is_same_context(desc))
		return NULL;

	/* try to reopen only when the state is still the same */
	if(!atomic_long_cmpxchg_relaxed(state_var,
					val_committed_and_not_finished,
					val_reserved))
		return NULL;

	return desc;
}

static char *data_alloc_continuous()
{
	/*
	 * Same as data_alloc() with one added parameter:
	 * unsigned long requested_begin_lpos;
	 */

	begin_lpos = atomic_long_read(&data_ring->head_lpos);

	do {
		if (begin_lpos != requested_begin_lpos)
			return NULL;

	... same as before

	} while (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &begin_lpos,
					  next_lpos)); /* LMM(data_alloc:A) */

	if (requested_begin_lpos) {
		/* only update tail lpos */
		blk_lpos->next = next_lpos;
		/* return pointer to the new data space */
		return &blk->data[0];
	}

	/* For completely new block do everything as before */
	blk = to_block(data_ring, begin_lpos);
	blk->id = id; /* LMM(data_alloc:B) */
	...
}

void prb_commit_and_finalize()
{
	/* Same as prb_commit() + it will set also 'final' bit */
}


Addintional changes in the code:

+ desc_resrved() will also set 'final' bit in the previous
  descriptor so that the descriptor could not longer get reopended
  once committed.

+ prb_commit_and_finalize() will be called instead of prb_commit()
  when the message ends with '\n'.

+ prb_read() will allow to read the data only when
  the state is "committed_and_finalized".

+ desc_make_reusable() can be called only when the desciptor
  is in "commited_and_finalized" state.


I am not sure if it is everything. Also it might need some code
refactoring.

But it looks like it might work. And it should not require new barriers.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-12 16:39         ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-12 16:39 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Tue 2020-08-11 18:05:51, Petr Mladek wrote:
> On Sat 2020-07-18 16:48:55, John Ogness wrote:
> > On 2020-07-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > Make sure you test the case of "fast concurrent readers". The last
> > > time we did things like this, it was a disaster, because a concurrent
> > > reader would see and return the _incomplete_ line, and the next entry
> > > was still being generated on another CPU.
> > >
> > > The reader would then decide to return that incomplete line, because
> > > it had something.
> > >
> > > And while in theory this could then be handled properly in user space,
> > > in practice it wasn't. So you'd see a lot of logging tools that would
> > > then report all those continuations as separate log events.
> > >
> > > Which is the whole point of LOG_CONT - for that *not* to happen.
> > 
> > I expect this is handled correctly since the reader is not given any
> > parts until a full line is ready, but I will put more focus on testing
> > this to make sure. Thanks for the regression and testing tips.
> 
> Hmm, the current patchset has different problem. The continuation
> pieces are correctly passed as a single lines. But empty line is
> printed for each unused sequence number to avoid warnings about
> missed messages in journactl. It looks like:
> 
> I am afraid that the only working solution is to store all pieces
> in a single lockless transaction. I think that John already
> proposed using 2nd small lockless buffer for this. The problem
> might be how to synchronize flushing the pieces into the final
> buffer.

Do not panic! It might look scary. But I am less scared
after I wrote some pieces of the pseudo code.

So, I have one crazy idea to add one more state bit so that we
could have:

  + committed: set when the data are written into the data ring.
  + final: set when the data block could not longer get reopened
  + reuse: set when the desctiptor/data block could get reused


"final" bit will define when the descriptor could not longer
get reopened (cleared committed bit) and the data block could
not get extended.

The logic would be the following:

bool prb_reserve() {

	desc = try_reopen_desc(seq);
	if (desc) {
		text_buf = data_alloc_continuous();
		if (text_buf)
			goto success;
		else
			/* commit the reopened desc back again */
			prb_commit(desc);
	}

	/* Otherwise, do as before */
	desc = desc_reserve();
	if (!desc)
		goto fail;

	text_buf = data_alloc();
	...

where:

static struct prb_desc *try_reopen_desc(seq)
{
	struct prb_desc *desc;

	enum desc_state d_state;
	struct prb_desc desc;

	d_state = desc_read(desc_ring, seq, &desc);
	if (d_state != committed_and_not_finalized)
		return NULL;

	if (!is_same_context(desc))
		return NULL;

	/* try to reopen only when the state is still the same */
	if(!atomic_long_cmpxchg_relaxed(state_var,
					val_committed_and_not_finished,
					val_reserved))
		return NULL;

	return desc;
}

static char *data_alloc_continuous()
{
	/*
	 * Same as data_alloc() with one added parameter:
	 * unsigned long requested_begin_lpos;
	 */

	begin_lpos = atomic_long_read(&data_ring->head_lpos);

	do {
		if (begin_lpos != requested_begin_lpos)
			return NULL;

	... same as before

	} while (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &begin_lpos,
					  next_lpos)); /* LMM(data_alloc:A) */

	if (requested_begin_lpos) {
		/* only update tail lpos */
		blk_lpos->next = next_lpos;
		/* return pointer to the new data space */
		return &blk->data[0];
	}

	/* For completely new block do everything as before */
	blk = to_block(data_ring, begin_lpos);
	blk->id = id; /* LMM(data_alloc:B) */
	...
}

void prb_commit_and_finalize()
{
	/* Same as prb_commit() + it will set also 'final' bit */
}


Addintional changes in the code:

+ desc_resrved() will also set 'final' bit in the previous
  descriptor so that the descriptor could not longer get reopended
  once committed.

+ prb_commit_and_finalize() will be called instead of prb_commit()
  when the message ends with '\n'.

+ prb_read() will allow to read the data only when
  the state is "committed_and_finalized".

+ desc_make_reusable() can be called only when the desciptor
  is in "commited_and_finalized" state.


I am not sure if it is everything. Also it might need some code
refactoring.

But it looks like it might work. And it should not require new barriers.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-12 16:39         ` Petr Mladek
@ 2020-08-13  0:24           ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13  0:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On 2020-08-12, Petr Mladek <pmladek@suse.com> wrote:
> So, I have one crazy idea to add one more state bit so that we
> could have:
>
>   + committed: set when the data are written into the data ring.
>   + final: set when the data block could not longer get reopened
>   + reuse: set when the desctiptor/data block could get reused
>
> "final" bit will define when the descriptor could not longer
> get reopened (cleared committed bit) and the data block could
> not get extended.

I had not thought of extending data blocks. That is clever!

I implemented this solution for myself and am currently running more
tests. Some things that I changed from your suggestion:

1. I created a separate prb_reserve_cont() function. The reason for this
is because the caller needs to understand what is happening. The caller
is getting an existing record with existing data and must append new
data. The @text_len field of the info reports how long the existing data
is. So the LOG_CONT handling code in printk.c looks something like this:

        if (lflags & LOG_CONT) {
                struct prb_reserved_entry e;
                struct printk_record r;

                prb_rec_init_wr(&r, text_len, 0);

                if (prb_reserve_cont(&e, prb, &r, caller_id)) {
                        memcpy(&r.text_buf[r.info->text_len], text, text_len);
                        r.info->text_len += text_len;

                        if (lflags & LOG_NEWLINE)
                                r.info->flags |= LOG_NEWLINE;

                        if (r.info->flags & LOG_NEWLINE)
                                prb_commit_finalize(&e);
                        else
                                prb_commit(&e);

                        return text_len;
                }
        }

This seemed simpler than trying to extend prb_reserve() to secretly
support LOG_CONT records.

2. I haven't yet figured out how to preserve calling context when a
newline appears. For example:

pr_info("text");
pr_cont(" 1");
pr_cont(" 2\n");
pr_cont("3");
pr_cont(" 4\n");

For "3" the calling context (info, timestamp) is lost because with "2"
the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

3. There are some memory barriers introduced, but it looks like it
shouldn't add too much complexity.

I will continue to refine my working version and post a patch so that we
have something to work with. This looks to be the most promising way
forward. Thanks.

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  0:24           ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13  0:24 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On 2020-08-12, Petr Mladek <pmladek@suse.com> wrote:
> So, I have one crazy idea to add one more state bit so that we
> could have:
>
>   + committed: set when the data are written into the data ring.
>   + final: set when the data block could not longer get reopened
>   + reuse: set when the desctiptor/data block could get reused
>
> "final" bit will define when the descriptor could not longer
> get reopened (cleared committed bit) and the data block could
> not get extended.

I had not thought of extending data blocks. That is clever!

I implemented this solution for myself and am currently running more
tests. Some things that I changed from your suggestion:

1. I created a separate prb_reserve_cont() function. The reason for this
is because the caller needs to understand what is happening. The caller
is getting an existing record with existing data and must append new
data. The @text_len field of the info reports how long the existing data
is. So the LOG_CONT handling code in printk.c looks something like this:

        if (lflags & LOG_CONT) {
                struct prb_reserved_entry e;
                struct printk_record r;

                prb_rec_init_wr(&r, text_len, 0);

                if (prb_reserve_cont(&e, prb, &r, caller_id)) {
                        memcpy(&r.text_buf[r.info->text_len], text, text_len);
                        r.info->text_len += text_len;

                        if (lflags & LOG_NEWLINE)
                                r.info->flags |= LOG_NEWLINE;

                        if (r.info->flags & LOG_NEWLINE)
                                prb_commit_finalize(&e);
                        else
                                prb_commit(&e);

                        return text_len;
                }
        }

This seemed simpler than trying to extend prb_reserve() to secretly
support LOG_CONT records.

2. I haven't yet figured out how to preserve calling context when a
newline appears. For example:

pr_info("text");
pr_cont(" 1");
pr_cont(" 2\n");
pr_cont("3");
pr_cont(" 4\n");

For "3" the calling context (info, timestamp) is lost because with "2"
the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

3. There are some memory barriers introduced, but it looks like it
shouldn't add too much complexity.

I will continue to refine my working version and post a patch so that we
have something to work with. This looks to be the most promising way
forward. Thanks.

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  0:24           ` John Ogness
@ 2020-08-13  5:18             ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-13  5:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On (20/08/13 02:30), John Ogness wrote:
> 2. I haven't yet figured out how to preserve calling context when a
> newline appears. For example:
> 
> pr_info("text");
> pr_cont(" 1");
> pr_cont(" 2\n");
> pr_cont("3");
> pr_cont(" 4\n");
>
> For "3" the calling context (info, timestamp) is lost because with "2"
> the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

This is not an unseen pattern, I'm afraid. And the problem here can
be more general:

	pr_info("text");
	pr_cont("1");
	exception/IRQ/NMI
		pr_alert("text\n");
	pr_cont("2");
	pr_cont("\n");

I guess the solution would be to store "last log_level" in task_struct
and get current (new) timestamp for broken cont line?

We have this problem now. E.g. early boot messages from one of my boxes:

6,173,41837,-;x86: Booting SMP configuration:
6,174,41838,-;.... node  #0, CPUs:      #1 #2 #3 #4
4,175,44993,-;MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/...
4,176,44993,c; #5 #6 #7

"CPUs: #1 #2 #3 #4 #5 #6 #7" is supposed to be one cont line with
loglevel 6. But it gets interrupted and flushed so that "#5 #6 #7"
appears with the different loglevel.

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  5:18             ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-13  5:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On (20/08/13 02:30), John Ogness wrote:
> 2. I haven't yet figured out how to preserve calling context when a
> newline appears. For example:
> 
> pr_info("text");
> pr_cont(" 1");
> pr_cont(" 2\n");
> pr_cont("3");
> pr_cont(" 4\n");
>
> For "3" the calling context (info, timestamp) is lost because with "2"
> the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

This is not an unseen pattern, I'm afraid. And the problem here can
be more general:

	pr_info("text");
	pr_cont("1");
	exception/IRQ/NMI
		pr_alert("text\n");
	pr_cont("2");
	pr_cont("\n");

I guess the solution would be to store "last log_level" in task_struct
and get current (new) timestamp for broken cont line?

We have this problem now. E.g. early boot messages from one of my boxes:

6,173,41837,-;x86: Booting SMP configuration:
6,174,41838,-;.... node  #0, CPUs:      #1 #2 #3 #4
4,175,44993,-;MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/...
4,176,44993,c; #5 #6 #7

"CPUs: #1 #2 #3 #4 #5 #6 #7" is supposed to be one cont line with
loglevel 6. But it gets interrupted and flushed so that "#5 #6 #7"
appears with the different loglevel.

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  0:24           ` John Ogness
@ 2020-08-13  7:31             ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  7:31 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Thu 2020-08-13 02:30:02, John Ogness wrote:
> On 2020-08-12, Petr Mladek <pmladek@suse.com> wrote:
> > So, I have one crazy idea to add one more state bit so that we
> > could have:
> >
> >   + committed: set when the data are written into the data ring.
> >   + final: set when the data block could not longer get reopened
> >   + reuse: set when the desctiptor/data block could get reused
> >
> > "final" bit will define when the descriptor could not longer
> > get reopened (cleared committed bit) and the data block could
> > not get extended.
> 
> I implemented this solution for myself and am currently running more
> tests. Some things that I changed from your suggestion:
> 
> 1. I created a separate prb_reserve_cont() function. The reason for this
> is because the caller needs to understand what is happening. The caller
> is getting an existing record with existing data and must append new
> data. The @text_len field of the info reports how long the existing data
> is.

I see.

> So the LOG_CONT handling code in printk.c looks something like this:

Yeah, it makes sense.


> 2. I haven't yet figured out how to preserve calling context when a
> newline appears. For example:
> 
> pr_info("text");
> pr_cont(" 1");
> pr_cont(" 2\n");
> pr_cont("3");
> pr_cont(" 4\n");
> 
> For "3" the calling context (info, timestamp) is lost because with "2"
> the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

I am going to answer Sergey's reply.


> 3. There are some memory barriers introduced, but it looks like it
> shouldn't add too much complexity.

Uff. I have hooped for this.

> I will continue to refine my working version and post a patch so that we
> have something to work with. This looks to be the most promising way
> forward. Thanks.

Uff, I am happy that it seems working. Several other approaches looked
much more complicated or they caused regressions.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  7:31             ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  7:31 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Thu 2020-08-13 02:30:02, John Ogness wrote:
> On 2020-08-12, Petr Mladek <pmladek@suse.com> wrote:
> > So, I have one crazy idea to add one more state bit so that we
> > could have:
> >
> >   + committed: set when the data are written into the data ring.
> >   + final: set when the data block could not longer get reopened
> >   + reuse: set when the desctiptor/data block could get reused
> >
> > "final" bit will define when the descriptor could not longer
> > get reopened (cleared committed bit) and the data block could
> > not get extended.
> 
> I implemented this solution for myself and am currently running more
> tests. Some things that I changed from your suggestion:
> 
> 1. I created a separate prb_reserve_cont() function. The reason for this
> is because the caller needs to understand what is happening. The caller
> is getting an existing record with existing data and must append new
> data. The @text_len field of the info reports how long the existing data
> is.

I see.

> So the LOG_CONT handling code in printk.c looks something like this:

Yeah, it makes sense.


> 2. I haven't yet figured out how to preserve calling context when a
> newline appears. For example:
> 
> pr_info("text");
> pr_cont(" 1");
> pr_cont(" 2\n");
> pr_cont("3");
> pr_cont(" 4\n");
> 
> For "3" the calling context (info, timestamp) is lost because with "2"
> the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

I am going to answer Sergey's reply.


> 3. There are some memory barriers introduced, but it looks like it
> shouldn't add too much complexity.

Uff. I have hooped for this.

> I will continue to refine my working version and post a patch so that we
> have something to work with. This looks to be the most promising way
> forward. Thanks.

Uff, I am happy that it seems working. Several other approaches looked
much more complicated or they caused regressions.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  5:18             ` Sergey Senozhatsky
@ 2020-08-13  7:44               ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13  7:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
>
> 	pr_info("text");
> 	pr_cont("1");
> 	exception/IRQ/NMI
> 		pr_alert("text\n");
> 	pr_cont("2");
> 	pr_cont("\n");
>
> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

(Warning: new ideas ahead)

The fundamental problem is that there is no real association between
the cont parts. So any interruption results in a broken record. If we
really want to do this correctly, we need real association.

With the new finalize flag for records, I thought about perhaps adding
support for chaining data blocks.

A data block currently stores an unsigned long for the ID of the
associated descriptor. But it could optionally include a second unsigned
long, which is the lpos of the next text part. All the data blocks of a
chain would point back to the same descriptor. The descriptor would only
point to the first data block of the chain and include a flag that it is
using chained data blocks.

Then we would only need to track the sequence number of the open record
and new data blocks could be added to the data block chain of the
correct record. Readers cannot see the record until it is finalized.

Also, since only finalized records can be invalidated, there are no
races of chains becoming invalidated while being appended.

My concerns about this idea:

- What if the printk user does not correctly terminate the cont message?
  There is no mechanism to allow that open record to be force-finalized
  so that readers can read newer records.

- For tasks, the sequence number of the open record could be stored on
  the task_struct. For non-tasks, we could use a global per-cpu variable
  where each CPU stores 2 sequence numbers: the sequence number of the
  open record for the non-task and the sequence number of the open
  record for an interrupting NMI. Is that sufficient?

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  7:44               ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13  7:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
>
> 	pr_info("text");
> 	pr_cont("1");
> 	exception/IRQ/NMI
> 		pr_alert("text\n");
> 	pr_cont("2");
> 	pr_cont("\n");
>
> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

(Warning: new ideas ahead)

The fundamental problem is that there is no real association between
the cont parts. So any interruption results in a broken record. If we
really want to do this correctly, we need real association.

With the new finalize flag for records, I thought about perhaps adding
support for chaining data blocks.

A data block currently stores an unsigned long for the ID of the
associated descriptor. But it could optionally include a second unsigned
long, which is the lpos of the next text part. All the data blocks of a
chain would point back to the same descriptor. The descriptor would only
point to the first data block of the chain and include a flag that it is
using chained data blocks.

Then we would only need to track the sequence number of the open record
and new data blocks could be added to the data block chain of the
correct record. Readers cannot see the record until it is finalized.

Also, since only finalized records can be invalidated, there are no
races of chains becoming invalidated while being appended.

My concerns about this idea:

- What if the printk user does not correctly terminate the cont message?
  There is no mechanism to allow that open record to be force-finalized
  so that readers can read newer records.

- For tasks, the sequence number of the open record could be stored on
  the task_struct. For non-tasks, we could use a global per-cpu variable
  where each CPU stores 2 sequence numbers: the sequence number of the
  open record for the non-task and the sequence number of the open
  record for an interrupting NMI. Is that sufficient?

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  5:18             ` Sergey Senozhatsky
@ 2020-08-13  7:51               ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  7:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Thu 2020-08-13 14:18:53, Sergey Senozhatsky wrote:
> On (20/08/13 02:30), John Ogness wrote:
> > 2. I haven't yet figured out how to preserve calling context when a
> > newline appears. For example:
> > 
> > pr_info("text");
> > pr_cont(" 1");
> > pr_cont(" 2\n");
> > pr_cont("3");
> > pr_cont(" 4\n");
> >
> > For "3" the calling context (info, timestamp) is lost because with "2"
> > the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

If I get it correctly, the original code has the same problem.

The cont buffer is flushed when the cont piece ends with newline:

static bool cont_add(u32 caller_id, int facility, int level,
		     enum log_flags flags, const char *text, size_t len)
{
   [...]

	// The original flags come from the first line,
	// but later continuations can add a newline.
	if (flags & LOG_NEWLINE) {
		cont.flags |= LOG_NEWLINE;
		cont_flush();
	}

	return true;
}

cont_flush sets	cont.len = 0;

static void cont_flush(void)
{
	[...]
	cont.len = 0;
}


The messages is appended only when cont.len != 0 in log_output:

static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
{
	const u32 caller_id = printk_caller_id();

	/*
	 * If an earlier line was buffered, and we're a continuation
	 * write from the same context, try to add it to the buffer.
	 */
	if (cont.len) {
		if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
			if (cont_add(caller_id, facility, level, lflags, text, text_len))
				return text_len;
		}

      [...]
}


Also the original context is overridden when the cont buffer is empty:

static bool cont_add(u32 caller_id, int facility, int level,
		     enum log_flags flags, const char *text, size_t len)
{
	[...]

	if (!cont.len) {
		cont.facility = facility;
		cont.level = level;
		cont.caller_id = caller_id;
		cont.ts_nsec = local_clock();
		cont.flags = flags;
	}

	[...]
}

So I would ignore this problem for now.


> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
> 
> 	pr_info("text");
> 	pr_cont("1");
> 	exception/IRQ/NMI
> 		pr_alert("text\n");
> 	pr_cont("2");
> 	pr_cont("\n");
> 

Good point.


> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

I think about storing the context in per-CPU and per-context array.
It should be more memory efficient than task_struct and it should
solve even the above problem.


> We have this problem now. E.g. early boot messages from one of my boxes:
> 
> 6,173,41837,-;x86: Booting SMP configuration:
> 6,174,41838,-;.... node  #0, CPUs:      #1 #2 #3 #4
> 4,175,44993,-;MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/...
> 4,176,44993,c; #5 #6 #7
> 
> "CPUs: #1 #2 #3 #4 #5 #6 #7" is supposed to be one cont line with
> loglevel 6. But it gets interrupted and flushed so that "#5 #6 #7"
> appears with the different loglevel.

Nice example. It would be nice to fix this. But it should be done
separately.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  7:51               ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  7:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner, Linus Torvalds

On Thu 2020-08-13 14:18:53, Sergey Senozhatsky wrote:
> On (20/08/13 02:30), John Ogness wrote:
> > 2. I haven't yet figured out how to preserve calling context when a
> > newline appears. For example:
> > 
> > pr_info("text");
> > pr_cont(" 1");
> > pr_cont(" 2\n");
> > pr_cont("3");
> > pr_cont(" 4\n");
> >
> > For "3" the calling context (info, timestamp) is lost because with "2"
> > the record is finalized. Perhaps the above is invalid usage of LOG_CONT?

If I get it correctly, the original code has the same problem.

The cont buffer is flushed when the cont piece ends with newline:

static bool cont_add(u32 caller_id, int facility, int level,
		     enum log_flags flags, const char *text, size_t len)
{
   [...]

	// The original flags come from the first line,
	// but later continuations can add a newline.
	if (flags & LOG_NEWLINE) {
		cont.flags |= LOG_NEWLINE;
		cont_flush();
	}

	return true;
}

cont_flush sets	cont.len = 0;

static void cont_flush(void)
{
	[...]
	cont.len = 0;
}


The messages is appended only when cont.len != 0 in log_output:

static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
{
	const u32 caller_id = printk_caller_id();

	/*
	 * If an earlier line was buffered, and we're a continuation
	 * write from the same context, try to add it to the buffer.
	 */
	if (cont.len) {
		if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
			if (cont_add(caller_id, facility, level, lflags, text, text_len))
				return text_len;
		}

      [...]
}


Also the original context is overridden when the cont buffer is empty:

static bool cont_add(u32 caller_id, int facility, int level,
		     enum log_flags flags, const char *text, size_t len)
{
	[...]

	if (!cont.len) {
		cont.facility = facility;
		cont.level = level;
		cont.caller_id = caller_id;
		cont.ts_nsec = local_clock();
		cont.flags = flags;
	}

	[...]
}

So I would ignore this problem for now.


> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
> 
> 	pr_info("text");
> 	pr_cont("1");
> 	exception/IRQ/NMI
> 		pr_alert("text\n");
> 	pr_cont("2");
> 	pr_cont("\n");
> 

Good point.


> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

I think about storing the context in per-CPU and per-context array.
It should be more memory efficient than task_struct and it should
solve even the above problem.


> We have this problem now. E.g. early boot messages from one of my boxes:
> 
> 6,173,41837,-;x86: Booting SMP configuration:
> 6,174,41838,-;.... node  #0, CPUs:      #1 #2 #3 #4
> 4,175,44993,-;MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/...
> 4,176,44993,c; #5 #6 #7
> 
> "CPUs: #1 #2 #3 #4 #5 #6 #7" is supposed to be one cont line with
> loglevel 6. But it gets interrupted and flushed so that "#5 #6 #7"
> appears with the different loglevel.

Nice example. It would be nice to fix this. But it should be done
separately.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  7:44               ` John Ogness
@ 2020-08-13  8:41                 ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  8:41 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Linus Torvalds, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Thu 2020-08-13 09:50:25, John Ogness wrote:
> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > This is not an unseen pattern, I'm afraid. And the problem here can
> > be more general:
> >
> > 	pr_info("text");
> > 	pr_cont("1");
> > 	exception/IRQ/NMI
> > 		pr_alert("text\n");
> > 	pr_cont("2");
> > 	pr_cont("\n");
> >
> > I guess the solution would be to store "last log_level" in task_struct
> > and get current (new) timestamp for broken cont line?
> 
> (Warning: new ideas ahead)
> 
> The fundamental problem is that there is no real association between
> the cont parts. So any interruption results in a broken record. If we
> really want to do this correctly, we need real association.
> 
> With the new finalize flag for records, I thought about perhaps adding
> support for chaining data blocks.
> 
> A data block currently stores an unsigned long for the ID of the
> associated descriptor. But it could optionally include a second unsigned
> long, which is the lpos of the next text part. All the data blocks of a
> chain would point back to the same descriptor. The descriptor would only
> point to the first data block of the chain and include a flag that it is
> using chained data blocks.
>
> Then we would only need to track the sequence number of the open record
> and new data blocks could be added to the data block chain of the
> correct record. Readers cannot see the record until it is finalized.

Note that we still must try to append the continuous piece into the
same data block. We could not concatenate them in the reader API
because it would create the problem with unused/skipped sequence numbers.

Also it would complicate the reader code. It would need to know
whether a record has already been printed together with a previous
one.


> Also, since only finalized records can be invalidated, there are no
> races of chains becoming invalidated while being appended.
> 
> My concerns about this idea:
> 
> - What if the printk user does not correctly terminate the cont message?
>   There is no mechanism to allow that open record to be force-finalized
>   so that readers can read newer records.

This is a real problem. And it is the reason why the cont buffer is
currently flushed (finalized) by the next message from another context.


> - For tasks, the sequence number of the open record could be stored on
>   the task_struct. For non-tasks, we could use a global per-cpu variable
>   where each CPU stores 2 sequence numbers: the sequence number of the
>   open record for the non-task and the sequence number of the open
>   record for an interrupting NMI. Is that sufficient?

Yeah, it would be possible.

Anyway, this would fix an already existing problem. It might get
complicated and tricky. I am afraid that it comes from the "perfection
is the enemy of good" department.

A good compromise might be to just store "loglevel" from the previous
message in the given context. It could then be used for pr_cont()
messages in the same context.

Anyway, I would solve this later because it is already existing
problem.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13  8:41                 ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13  8:41 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Thu 2020-08-13 09:50:25, John Ogness wrote:
> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > This is not an unseen pattern, I'm afraid. And the problem here can
> > be more general:
> >
> > 	pr_info("text");
> > 	pr_cont("1");
> > 	exception/IRQ/NMI
> > 		pr_alert("text\n");
> > 	pr_cont("2");
> > 	pr_cont("\n");
> >
> > I guess the solution would be to store "last log_level" in task_struct
> > and get current (new) timestamp for broken cont line?
> 
> (Warning: new ideas ahead)
> 
> The fundamental problem is that there is no real association between
> the cont parts. So any interruption results in a broken record. If we
> really want to do this correctly, we need real association.
> 
> With the new finalize flag for records, I thought about perhaps adding
> support for chaining data blocks.
> 
> A data block currently stores an unsigned long for the ID of the
> associated descriptor. But it could optionally include a second unsigned
> long, which is the lpos of the next text part. All the data blocks of a
> chain would point back to the same descriptor. The descriptor would only
> point to the first data block of the chain and include a flag that it is
> using chained data blocks.
>
> Then we would only need to track the sequence number of the open record
> and new data blocks could be added to the data block chain of the
> correct record. Readers cannot see the record until it is finalized.

Note that we still must try to append the continuous piece into the
same data block. We could not concatenate them in the reader API
because it would create the problem with unused/skipped sequence numbers.

Also it would complicate the reader code. It would need to know
whether a record has already been printed together with a previous
one.


> Also, since only finalized records can be invalidated, there are no
> races of chains becoming invalidated while being appended.
> 
> My concerns about this idea:
> 
> - What if the printk user does not correctly terminate the cont message?
>   There is no mechanism to allow that open record to be force-finalized
>   so that readers can read newer records.

This is a real problem. And it is the reason why the cont buffer is
currently flushed (finalized) by the next message from another context.


> - For tasks, the sequence number of the open record could be stored on
>   the task_struct. For non-tasks, we could use a global per-cpu variable
>   where each CPU stores 2 sequence numbers: the sequence number of the
>   open record for the non-task and the sequence number of the open
>   record for an interrupting NMI. Is that sufficient?

Yeah, it would be possible.

Anyway, this would fix an already existing problem. It might get
complicated and tricky. I am afraid that it comes from the "perfection
is the enemy of good" department.

A good compromise might be to just store "loglevel" from the previous
message in the given context. It could then be used for pr_cont()
messages in the same context.

Anyway, I would solve this later because it is already existing
problem.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  8:41                 ` Petr Mladek
@ 2020-08-13 10:29                   ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13 10:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Linus Torvalds, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On 2020-08-13, Petr Mladek <pmladek@suse.com> wrote:
> On Thu 2020-08-13 09:50:25, John Ogness wrote:
>> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>> > This is not an unseen pattern, I'm afraid. And the problem here can
>> > be more general:
>> >
>> > 	pr_info("text");
>> > 	pr_cont("1");
>> > 	exception/IRQ/NMI
>> > 		pr_alert("text\n");
>> > 	pr_cont("2");
>> > 	pr_cont("\n");
>> >
>> > I guess the solution would be to store "last log_level" in task_struct
>> > and get current (new) timestamp for broken cont line?
>> 
>> (Warning: new ideas ahead)
>> 
>> The fundamental problem is that there is no real association between
>> the cont parts. So any interruption results in a broken record. If we
>> really want to do this correctly, we need real association.

I believe I failed to recognize the fundamental problem. The fundamental
problem is that the pr_cont() semantics are very poor. I now strongly
believe that we need to fix those semantics by having the pr_cont() user
take responsibility for buffering the message. Patching the ~2000
pr_cont() users will be far easier than continuing to twist ourselves
around this madness.

Here is an example for a new pr_cont() API:

    struct pr_cont c;

    pr_cont_alloc_info(&c);
       (or alternatively)
    dev_cont_alloc_info(dev, &c);

    pr_cont(&c, "1");
    pr_cont(&c, "2");

    pr_cont_flush(&c);

Using macro magic, there can be the usual dbg, warn, err, etc. variants
of the alloc functions.

The alloc function would need to work for any context, but that would
not be an issue. If the cont message started to get too large, pr_cont()
could do its own flushing in between, while still holding on to the
context information. If for some reason the alloc function could not
allocate a buffer, all the pr_cont() calls could fallback to logging the
individual cont parts.

I believe this would solve all cont-related problems while also allowing
the new ringbuffer to remain as it already is in linux-next.

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13 10:29                   ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-13 10:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On 2020-08-13, Petr Mladek <pmladek@suse.com> wrote:
> On Thu 2020-08-13 09:50:25, John Ogness wrote:
>> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
>> > This is not an unseen pattern, I'm afraid. And the problem here can
>> > be more general:
>> >
>> > 	pr_info("text");
>> > 	pr_cont("1");
>> > 	exception/IRQ/NMI
>> > 		pr_alert("text\n");
>> > 	pr_cont("2");
>> > 	pr_cont("\n");
>> >
>> > I guess the solution would be to store "last log_level" in task_struct
>> > and get current (new) timestamp for broken cont line?
>> 
>> (Warning: new ideas ahead)
>> 
>> The fundamental problem is that there is no real association between
>> the cont parts. So any interruption results in a broken record. If we
>> really want to do this correctly, we need real association.

I believe I failed to recognize the fundamental problem. The fundamental
problem is that the pr_cont() semantics are very poor. I now strongly
believe that we need to fix those semantics by having the pr_cont() user
take responsibility for buffering the message. Patching the ~2000
pr_cont() users will be far easier than continuing to twist ourselves
around this madness.

Here is an example for a new pr_cont() API:

    struct pr_cont c;

    pr_cont_alloc_info(&c);
       (or alternatively)
    dev_cont_alloc_info(dev, &c);

    pr_cont(&c, "1");
    pr_cont(&c, "2");

    pr_cont_flush(&c);

Using macro magic, there can be the usual dbg, warn, err, etc. variants
of the alloc functions.

The alloc function would need to work for any context, but that would
not be an issue. If the cont message started to get too large, pr_cont()
could do its own flushing in between, while still holding on to the
context information. If for some reason the alloc function could not
allocate a buffer, all the pr_cont() calls could fallback to logging the
individual cont parts.

I believe this would solve all cont-related problems while also allowing
the new ringbuffer to remain as it already is in linux-next.

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13 10:29                   ` John Ogness
@ 2020-08-13 11:30                     ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13 11:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Linus Torvalds, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Thu 2020-08-13 12:35:47, John Ogness wrote:
> On 2020-08-13, Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2020-08-13 09:50:25, John Ogness wrote:
> >> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >> > This is not an unseen pattern, I'm afraid. And the problem here can
> >> > be more general:
> >> >
> >> > 	pr_info("text");
> >> > 	pr_cont("1");
> >> > 	exception/IRQ/NMI
> >> > 		pr_alert("text\n");
> >> > 	pr_cont("2");
> >> > 	pr_cont("\n");
> >> >
> >> > I guess the solution would be to store "last log_level" in task_struct
> >> > and get current (new) timestamp for broken cont line?
> >> 
> >> (Warning: new ideas ahead)
> >> 
> >> The fundamental problem is that there is no real association between
> >> the cont parts. So any interruption results in a broken record. If we
> >> really want to do this correctly, we need real association.
> 
> I believe I failed to recognize the fundamental problem. The fundamental
> problem is that the pr_cont() semantics are very poor. I now strongly
> believe that we need to fix those semantics by having the pr_cont() user
> take responsibility for buffering the message. Patching the ~2000
> pr_cont() users will be far easier than continuing to twist ourselves
> around this madness.

Yes, this would make the API much more reliable. But it is not completely
easy. It has been discussed several times. The most serious attempt
is https://lore.kernel.org/lkml/1541165517-3557-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/#t

lockdep.c is probably the most complicated pr_cont() user. The
conversion complicated the code a bit, see
https://lore.kernel.org/lkml/1541165517-3557-3-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/

Linus also wanted to have an API to flush part of the message
before some action and finalize it later. It would be usable
for things like:

	pr_info("Doing something dangerous... ");
	printk_flush();
	do_dangerous_thing();
	pr_cont("ok\n");

Where pr_cont() would allow to flush part of the message to the
consoles and add the end of the line later when it did not crash.

It would need some support in the lockless ring buffer API anyway.
But it is only nice to have from my POV.


> The alloc function would need to work for any context, but that would
> not be an issue. If the cont message started to get too large, pr_cont()
> could do its own flushing in between, while still holding on to the
> context information. If for some reason the alloc function could not
> allocate a buffer, all the pr_cont() calls could fallback to logging the
> individual cont parts.

Also this was discussed several times. The most acceptable solution
was to use "small" buffer on the stack (128 bytes or so).


> I believe this would solve all cont-related problems while also allowing
> the new ringbuffer to remain as it already is in linux-next.

You are right. The buffering is really good step. It would make
the API sane and reliable. It has been proposed and discussed many
times. The main obstacle was that there was no volunteer to get this
done. If you are able to do so, it would be amazing.

I am just not sure what is more complicated. Getting lockless
operation correctly is always tricky. But going though 2000
callers individually is also a lot of work.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13 11:30                     ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-13 11:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Thu 2020-08-13 12:35:47, John Ogness wrote:
> On 2020-08-13, Petr Mladek <pmladek@suse.com> wrote:
> > On Thu 2020-08-13 09:50:25, John Ogness wrote:
> >> On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> >> > This is not an unseen pattern, I'm afraid. And the problem here can
> >> > be more general:
> >> >
> >> > 	pr_info("text");
> >> > 	pr_cont("1");
> >> > 	exception/IRQ/NMI
> >> > 		pr_alert("text\n");
> >> > 	pr_cont("2");
> >> > 	pr_cont("\n");
> >> >
> >> > I guess the solution would be to store "last log_level" in task_struct
> >> > and get current (new) timestamp for broken cont line?
> >> 
> >> (Warning: new ideas ahead)
> >> 
> >> The fundamental problem is that there is no real association between
> >> the cont parts. So any interruption results in a broken record. If we
> >> really want to do this correctly, we need real association.
> 
> I believe I failed to recognize the fundamental problem. The fundamental
> problem is that the pr_cont() semantics are very poor. I now strongly
> believe that we need to fix those semantics by having the pr_cont() user
> take responsibility for buffering the message. Patching the ~2000
> pr_cont() users will be far easier than continuing to twist ourselves
> around this madness.

Yes, this would make the API much more reliable. But it is not completely
easy. It has been discussed several times. The most serious attempt
is https://lore.kernel.org/lkml/1541165517-3557-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/#t

lockdep.c is probably the most complicated pr_cont() user. The
conversion complicated the code a bit, see
https://lore.kernel.org/lkml/1541165517-3557-3-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/

Linus also wanted to have an API to flush part of the message
before some action and finalize it later. It would be usable
for things like:

	pr_info("Doing something dangerous... ");
	printk_flush();
	do_dangerous_thing();
	pr_cont("ok\n");

Where pr_cont() would allow to flush part of the message to the
consoles and add the end of the line later when it did not crash.

It would need some support in the lockless ring buffer API anyway.
But it is only nice to have from my POV.


> The alloc function would need to work for any context, but that would
> not be an issue. If the cont message started to get too large, pr_cont()
> could do its own flushing in between, while still holding on to the
> context information. If for some reason the alloc function could not
> allocate a buffer, all the pr_cont() calls could fallback to logging the
> individual cont parts.

Also this was discussed several times. The most acceptable solution
was to use "small" buffer on the stack (128 bytes or so).


> I believe this would solve all cont-related problems while also allowing
> the new ringbuffer to remain as it already is in linux-next.

You are right. The buffering is really good step. It would make
the API sane and reliable. It has been proposed and discussed many
times. The main obstacle was that there was no volunteer to get this
done. If you are able to do so, it would be amazing.

I am just not sure what is more complicated. Getting lockless
operation correctly is always tricky. But going though 2000
callers individually is also a lot of work.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13  8:41                 ` Petr Mladek
@ 2020-08-13 11:54                   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-13 11:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Linus Torvalds,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On (20/08/13 10:41), Petr Mladek wrote:
> > My concerns about this idea:
> > 
> > - What if the printk user does not correctly terminate the cont message?
> >   There is no mechanism to allow that open record to be force-finalized
> >   so that readers can read newer records.
> 
> This is a real problem. And it is the reason why the cont buffer is
> currently flushed (finalized) by the next message from another context.

I understand that you think that this should be discussed and addressed
later in a separate patch, but, since we are on pr_cont topic right now,
can we slow down and maybe re-think what is actually expected from
pr_cont()? IOW, have the "what is expect from this feature" thread?

For instance, is missing \n the one and only reason why printk-s from
another context flush cont buffer now? Because I can see some more reasons
for current behaviour and I'd like to question those reasons.

I think what Linus said a long time ago was that the initial purpose of
pr_cont was

	pr_info("Initialize feature foo...");
	if (init_feature_foo() == 0)
		pr_cont("ok\n");
	else
		pr_cont("not ok\n");

	And if init_feature_foo() crashes the kernel then the first printk()
	form panic() will flush the cont buffer.

We can handle this by realizing that new printk() message has LOG_NEWLINE
and has different log_level (not pr_cont), maybe.

Let's look at the more general case:

CPU0					.. CPU255
pr_info("text");
					pr_alert("boom\n");
pr_cont("1);
pr_cont("2\n");

Do we really need to preliminary flush CPU0 pr_cont buffer in this
case? There is no connection between messages from CPU0 and CPU255.
Maybe (maybe!) what matters here is keeping the order of messages
per-context rather than globally system-wide?

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-13 11:54                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-13 11:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On (20/08/13 10:41), Petr Mladek wrote:
> > My concerns about this idea:
> > 
> > - What if the printk user does not correctly terminate the cont message?
> >   There is no mechanism to allow that open record to be force-finalized
> >   so that readers can read newer records.
> 
> This is a real problem. And it is the reason why the cont buffer is
> currently flushed (finalized) by the next message from another context.

I understand that you think that this should be discussed and addressed
later in a separate patch, but, since we are on pr_cont topic right now,
can we slow down and maybe re-think what is actually expected from
pr_cont()? IOW, have the "what is expect from this feature" thread?

For instance, is missing \n the one and only reason why printk-s from
another context flush cont buffer now? Because I can see some more reasons
for current behaviour and I'd like to question those reasons.

I think what Linus said a long time ago was that the initial purpose of
pr_cont was

	pr_info("Initialize feature foo...");
	if (init_feature_foo() == 0)
		pr_cont("ok\n");
	else
		pr_cont("not ok\n");

	And if init_feature_foo() crashes the kernel then the first printk()
	form panic() will flush the cont buffer.

We can handle this by realizing that new printk() message has LOG_NEWLINE
and has different log_level (not pr_cont), maybe.

Let's look at the more general case:

CPU0					.. CPU255
pr_info("text");
					pr_alert("boom\n");
pr_cont("1);
pr_cont("2\n");

Do we really need to preliminary flush CPU0 pr_cont buffer in this
case? There is no connection between messages from CPU0 and CPU255.
Maybe (maybe!) what matters here is keeping the order of messages
per-context rather than globally system-wide?

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13 10:29                   ` John Ogness
@ 2020-08-14  3:34                     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-14  3:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Linus Torvalds,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On (20/08/13 12:35), John Ogness wrote:
> I believe I failed to recognize the fundamental problem. The fundamental
> problem is that the pr_cont() semantics are very poor.

The semantics is pretty clear - use it only in UP early bootup,
anything else is broken :)

  /*
   * Annotation for a "continued" line of log printout (only done after a
   * line that had no enclosing \n). Only to be used by core/arch code
   * during early bootup (a continued line is not SMP-safe otherwise).
   */
  #define KERN_CONT	KERN_SOH "c"

> I now strongly believe that we need to fix those semantics by having the
> pr_cont() user take responsibility for buffering the message. Patching the
> ~2000 pr_cont() users will be far easier than continuing to twist ourselves
> around this madness.

I welcome this effort. We've been talking about the fact that pr_cont() is
not something we can ignore anymore (we have more and more SMP users of
it) since the Kernel Summit in Santa Fe, NM, but the general response back
then was "oh my god, who cares" (pretty sure this is very close to what Ted
Ts'o said during the printk session).

> Here is an example for a new pr_cont() API:
> 
>     struct pr_cont c;
> 
>     pr_cont_alloc_info(&c);
>        (or alternatively)
>     dev_cont_alloc_info(dev, &c);
> 
>     pr_cont(&c, "1");
>     pr_cont(&c, "2");
> 
>     pr_cont_flush(&c);

This might be a bit more complex.

One thing that we need to handle here, I believe, is that the context
which crashes the kernel should flush its cont buffer, because the
information there is relevant to the crash:

	pr_cont_alloc_info(&c);
	pr_cont(&c, "1");
	pr_cont(&c, "2");
	>>
	   oops
	      panic()
	<<
	pr_cont_flush(&c);

We better flush that context's pr_cont buffer during panic().

Another example:


	pr_cont_alloc_info(&c);

	for (i = 0; i < p->sz; i++)
		pr_cont(&c, p->buf[i]);
	>>
	   page fault
	    exit
	<<
	pr_cont_flush(&c);

I believe we need to preliminary flush pr_cont() in this case as well,
because the information there might be very helpful.

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14  3:34                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-14  3:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On (20/08/13 12:35), John Ogness wrote:
> I believe I failed to recognize the fundamental problem. The fundamental
> problem is that the pr_cont() semantics are very poor.

The semantics is pretty clear - use it only in UP early bootup,
anything else is broken :)

  /*
   * Annotation for a "continued" line of log printout (only done after a
   * line that had no enclosing \n). Only to be used by core/arch code
   * during early bootup (a continued line is not SMP-safe otherwise).
   */
  #define KERN_CONT	KERN_SOH "c"

> I now strongly believe that we need to fix those semantics by having the
> pr_cont() user take responsibility for buffering the message. Patching the
> ~2000 pr_cont() users will be far easier than continuing to twist ourselves
> around this madness.

I welcome this effort. We've been talking about the fact that pr_cont() is
not something we can ignore anymore (we have more and more SMP users of
it) since the Kernel Summit in Santa Fe, NM, but the general response back
then was "oh my god, who cares" (pretty sure this is very close to what Ted
Ts'o said during the printk session).

> Here is an example for a new pr_cont() API:
> 
>     struct pr_cont c;
> 
>     pr_cont_alloc_info(&c);
>        (or alternatively)
>     dev_cont_alloc_info(dev, &c);
> 
>     pr_cont(&c, "1");
>     pr_cont(&c, "2");
> 
>     pr_cont_flush(&c);

This might be a bit more complex.

One thing that we need to handle here, I believe, is that the context
which crashes the kernel should flush its cont buffer, because the
information there is relevant to the crash:

	pr_cont_alloc_info(&c);
	pr_cont(&c, "1");
	pr_cont(&c, "2");
	>>
	   oops
	      panic()
	<<
	pr_cont_flush(&c);

We better flush that context's pr_cont buffer during panic().

Another example:


	pr_cont_alloc_info(&c);

	for (i = 0; i < p->sz; i++)
		pr_cont(&c, p->buf[i]);
	>>
	   page fault
	    exit
	<<
	pr_cont_flush(&c);

I believe we need to preliminary flush pr_cont() in this case as well,
because the information there might be very helpful.

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14  3:34                     ` Sergey Senozhatsky
@ 2020-08-14  8:16                       ` John Ogness
  -1 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-14  8:16 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Linus Torvalds,
	Sergey Senozhatsky, Steven Rostedt, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On 2020-08-14, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> One thing that we need to handle here, I believe, is that the context
> which crashes the kernel should flush its cont buffer, because the
> information there is relevant to the crash:
>
> 	pr_cont_alloc_info(&c);
> 	pr_cont(&c, "1");
> 	pr_cont(&c, "2");
> 	>>
> 	   oops
> 	      panic()
> 	<<
> 	pr_cont_flush(&c);
>
> We better flush that context's pr_cont buffer during panic().

I am not convinced of the general usefulness of partial messages, but as
long as we have an API that includes registration, usage, and
deregistration of some sort of handle, then we leave the window open for
such implementations.

John Ogness

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14  8:16                       ` John Ogness
  0 siblings, 0 replies; 80+ messages in thread
From: John Ogness @ 2020-08-14  8:16 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner,
	Linus Torvalds

On 2020-08-14, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> One thing that we need to handle here, I believe, is that the context
> which crashes the kernel should flush its cont buffer, because the
> information there is relevant to the crash:
>
> 	pr_cont_alloc_info(&c);
> 	pr_cont(&c, "1");
> 	pr_cont(&c, "2");
> 	>>
> 	   oops
> 	      panic()
> 	<<
> 	pr_cont_flush(&c);
>
> We better flush that context's pr_cont buffer during panic().

I am not convinced of the general usefulness of partial messages, but as
long as we have an API that includes registration, usage, and
deregistration of some sort of handle, then we leave the window open for
such implementations.

John Ogness

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13 11:54                   ` Sergey Senozhatsky
@ 2020-08-14  9:04                     ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-14  9:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: John Ogness, Linus Torvalds, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Thu 2020-08-13 20:54:35, Sergey Senozhatsky wrote:
> On (20/08/13 10:41), Petr Mladek wrote:
> > > My concerns about this idea:
> > > 
> > > - What if the printk user does not correctly terminate the cont message?
> > >   There is no mechanism to allow that open record to be force-finalized
> > >   so that readers can read newer records.
> > 
> > This is a real problem. And it is the reason why the cont buffer is
> > currently flushed (finalized) by the next message from another context.
> 
> I understand that you think that this should be discussed and addressed
> later in a separate patch, but, since we are on pr_cont topic right now,
> can we slow down and maybe re-think what is actually expected from
> pr_cont()? IOW, have the "what is expect from this feature" thread?
> 
> For instance, is missing \n the one and only reason why printk-s from
> another context flush cont buffer now? Because I can see some more reasons
> for current behaviour and I'd like to question those reasons.
> 
> I think what Linus said a long time ago was that the initial purpose of
> pr_cont was
> 
> 	pr_info("Initialize feature foo...");
> 	if (init_feature_foo() == 0)
> 		pr_cont("ok\n");
> 	else
> 		pr_cont("not ok\n");
> 
> 	And if init_feature_foo() crashes the kernel then the first printk()
> 	form panic() will flush the cont buffer.
> 
> We can handle this by realizing that new printk() message has LOG_NEWLINE
> and has different log_level (not pr_cont), maybe.

Yes, this is a handy behavior. But it is also complicated
on the implementation side. It requires that consoles are able to
print the existing part of the continuous line and print only
the rest later.

BTW: It used to work before the commit 5c2992ee7fd8a29d041 ("printk:
     remove console flushing special cases for partial buffered
     lines");

BTW2: It will be much easier to implement when only the last
     message can be partially shown on the consoles. Each console
     driver will need to track the position only in one message.

     Also it will be easier when the part of the message if stored
     in the main lockless ring buffer. Then the driver could just
     try to reread the last message and see if it was concatenated.


> Let's look at the more general case:
> 
> CPU0					.. CPU255
> pr_info("text");
> 					pr_alert("boom\n");
> pr_cont("1);
> pr_cont("2\n");
> 
> Do we really need to preliminary flush CPU0 pr_cont buffer in this
> case? There is no connection between messages from CPU0 and CPU255.
> Maybe (maybe!) what matters here is keeping the order of messages
> per-context rather than globally system-wide?

Honestly, I have no idea how many newlines are missing. They are often
not noticed because the buffered message is flushed later by some
other one.

The chance that some other "random" message will flush the pending
message is much lower if we have many cont buffers per-context
and per-cpu.

I am not brave enough to add more cont buffers without some fallback
mechanism to flush them later (irq_work?, timer?) or without
audit of all callers.

Where the audit is implicit when all callers are converted to
the buffered printk API.


There is one more problem. Any buffering might cause that nobody
will be able to see the message when things go wrong. Flushing
during panic() might help but only when panic() is called
and when there are system-wide cont buffers.

By other words, the current pr_cont() behavior causes mixed output
from time to time. But it increases the chance to see the messages.
And it makes it easier to find them in crashdump. Perfect output
is nice. But it will not help when the messages gets lost.

All I want to say that it is not black and white.


My opinion:

I will leave the decision on John. If he thinks that converting
all pr_cont() users to a buffered API is easier I will be fine
with it. It was proposed and requested several times.

If John realizes that my proposal to allow to reopen committed
messages is easier, I will be fine with it as well. We could
create the buffered API and convert the most critical
users one by one later. Also the context information will
allow to connect the broken messages in userspace and
do not complicate the kernel side.

Anyway, the lockless printk() feature is more important for
me that a perfect output of continuous lines.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14  9:04                     ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-14  9:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner, Linus Torvalds

On Thu 2020-08-13 20:54:35, Sergey Senozhatsky wrote:
> On (20/08/13 10:41), Petr Mladek wrote:
> > > My concerns about this idea:
> > > 
> > > - What if the printk user does not correctly terminate the cont message?
> > >   There is no mechanism to allow that open record to be force-finalized
> > >   so that readers can read newer records.
> > 
> > This is a real problem. And it is the reason why the cont buffer is
> > currently flushed (finalized) by the next message from another context.
> 
> I understand that you think that this should be discussed and addressed
> later in a separate patch, but, since we are on pr_cont topic right now,
> can we slow down and maybe re-think what is actually expected from
> pr_cont()? IOW, have the "what is expect from this feature" thread?
> 
> For instance, is missing \n the one and only reason why printk-s from
> another context flush cont buffer now? Because I can see some more reasons
> for current behaviour and I'd like to question those reasons.
> 
> I think what Linus said a long time ago was that the initial purpose of
> pr_cont was
> 
> 	pr_info("Initialize feature foo...");
> 	if (init_feature_foo() == 0)
> 		pr_cont("ok\n");
> 	else
> 		pr_cont("not ok\n");
> 
> 	And if init_feature_foo() crashes the kernel then the first printk()
> 	form panic() will flush the cont buffer.
> 
> We can handle this by realizing that new printk() message has LOG_NEWLINE
> and has different log_level (not pr_cont), maybe.

Yes, this is a handy behavior. But it is also complicated
on the implementation side. It requires that consoles are able to
print the existing part of the continuous line and print only
the rest later.

BTW: It used to work before the commit 5c2992ee7fd8a29d041 ("printk:
     remove console flushing special cases for partial buffered
     lines");

BTW2: It will be much easier to implement when only the last
     message can be partially shown on the consoles. Each console
     driver will need to track the position only in one message.

     Also it will be easier when the part of the message if stored
     in the main lockless ring buffer. Then the driver could just
     try to reread the last message and see if it was concatenated.


> Let's look at the more general case:
> 
> CPU0					.. CPU255
> pr_info("text");
> 					pr_alert("boom\n");
> pr_cont("1);
> pr_cont("2\n");
> 
> Do we really need to preliminary flush CPU0 pr_cont buffer in this
> case? There is no connection between messages from CPU0 and CPU255.
> Maybe (maybe!) what matters here is keeping the order of messages
> per-context rather than globally system-wide?

Honestly, I have no idea how many newlines are missing. They are often
not noticed because the buffered message is flushed later by some
other one.

The chance that some other "random" message will flush the pending
message is much lower if we have many cont buffers per-context
and per-cpu.

I am not brave enough to add more cont buffers without some fallback
mechanism to flush them later (irq_work?, timer?) or without
audit of all callers.

Where the audit is implicit when all callers are converted to
the buffered printk API.


There is one more problem. Any buffering might cause that nobody
will be able to see the message when things go wrong. Flushing
during panic() might help but only when panic() is called
and when there are system-wide cont buffers.

By other words, the current pr_cont() behavior causes mixed output
from time to time. But it increases the chance to see the messages.
And it makes it easier to find them in crashdump. Perfect output
is nice. But it will not help when the messages gets lost.

All I want to say that it is not black and white.


My opinion:

I will leave the decision on John. If he thinks that converting
all pr_cont() users to a buffered API is easier I will be fine
with it. It was proposed and requested several times.

If John realizes that my proposal to allow to reopen committed
messages is easier, I will be fine with it as well. We could
create the buffered API and convert the most critical
users one by one later. Also the context information will
allow to connect the broken messages in userspace and
do not complicate the kernel side.

Anyway, the lockless printk() feature is more important for
me that a perfect output of continuous lines.

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14  8:16                       ` John Ogness
@ 2020-08-14  9:12                         ` Petr Mladek
  -1 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-14  9:12 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Linus Torvalds, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Fri 2020-08-14 10:22:35, John Ogness wrote:
> On 2020-08-14, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > One thing that we need to handle here, I believe, is that the context
> > which crashes the kernel should flush its cont buffer, because the
> > information there is relevant to the crash:
> >
> > 	pr_cont_alloc_info(&c);
> > 	pr_cont(&c, "1");
> > 	pr_cont(&c, "2");
> > 	>>
> > 	   oops
> > 	      panic()
> > 	<<
> > 	pr_cont_flush(&c);
> >
> > We better flush that context's pr_cont buffer during panic().
> 
> I am not convinced of the general usefulness of partial messages, but as
> long as we have an API that includes registration, usage, and
> deregistration of some sort of handle, then we leave the window open for
> such implementations.

Registering some handle is an interesting idea. But it rules out
buffers on the stack. And we should avoid dynamic allocation.
printk() must be reliable also when there is not enough memory.

Please, keep it simple and do not add dependencies on new subsystems.
Using external code in printk() is always a call for problems.
The called code must be lockless and must not call printk().

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14  9:12                         ` Petr Mladek
  0 siblings, 0 replies; 80+ messages in thread
From: Petr Mladek @ 2020-08-14  9:12 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Peter Zijlstra, Greg Kroah-Hartman, kexec,
	Linux Kernel Mailing List, Steven Rostedt, Sergey Senozhatsky,
	Thomas Gleixner, Linus Torvalds

On Fri 2020-08-14 10:22:35, John Ogness wrote:
> On 2020-08-14, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > One thing that we need to handle here, I believe, is that the context
> > which crashes the kernel should flush its cont buffer, because the
> > information there is relevant to the crash:
> >
> > 	pr_cont_alloc_info(&c);
> > 	pr_cont(&c, "1");
> > 	pr_cont(&c, "2");
> > 	>>
> > 	   oops
> > 	      panic()
> > 	<<
> > 	pr_cont_flush(&c);
> >
> > We better flush that context's pr_cont buffer during panic().
> 
> I am not convinced of the general usefulness of partial messages, but as
> long as we have an API that includes registration, usage, and
> deregistration of some sort of handle, then we leave the window open for
> such implementations.

Registering some handle is an interesting idea. But it rules out
buffers on the stack. And we should avoid dynamic allocation.
printk() must be reliable also when there is not enough memory.

Please, keep it simple and do not add dependencies on new subsystems.
Using external code in printk() is always a call for problems.
The called code must be lockless and must not call printk().

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-13 11:54                   ` Sergey Senozhatsky
@ 2020-08-14 22:46                     ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-08-14 22:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> I think what Linus said a long time ago was that the initial purpose of
> pr_cont was
>
>         pr_info("Initialize feature foo...");
>         if (init_feature_foo() == 0)
>                 pr_cont("ok\n");
>         else
>                 pr_cont("not ok\n");
>
>         And if init_feature_foo() crashes the kernel then the first printk()
>         form panic() will flush the cont buffer.

Right.

This is why I think any discussion that says "people should buffer
their lines themselves and we should get rid if pr_cont()" is
fundamentally broken.

Don't go down that hole. I won't take it. It's wrong.

The fact is, pr_cont() goes back to the original kernel. No, it wasn't
pr_cont() back then, and no, there were no actual explicit markers for
"this is a continuation" at all, it was all just "the last printk
didn't have a newline, so we continue where we left off".

We've added pr_cont (and KERN_CONT) since then, and I realize that a
lot of people hate the complexity it introduces, but it's a
fundamental complexity that you have to live with.

If you can't live with pr_cont(), you shouldn't be working on
printk(), and find some other area of the kernel that you _can_ live
with.

It really is that simple.

            Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14 22:46                     ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-08-14 22:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> I think what Linus said a long time ago was that the initial purpose of
> pr_cont was
>
>         pr_info("Initialize feature foo...");
>         if (init_feature_foo() == 0)
>                 pr_cont("ok\n");
>         else
>                 pr_cont("not ok\n");
>
>         And if init_feature_foo() crashes the kernel then the first printk()
>         form panic() will flush the cont buffer.

Right.

This is why I think any discussion that says "people should buffer
their lines themselves and we should get rid if pr_cont()" is
fundamentally broken.

Don't go down that hole. I won't take it. It's wrong.

The fact is, pr_cont() goes back to the original kernel. No, it wasn't
pr_cont() back then, and no, there were no actual explicit markers for
"this is a continuation" at all, it was all just "the last printk
didn't have a newline, so we continue where we left off".

We've added pr_cont (and KERN_CONT) since then, and I realize that a
lot of people hate the complexity it introduces, but it's a
fundamental complexity that you have to live with.

If you can't live with pr_cont(), you shouldn't be working on
printk(), and find some other area of the kernel that you _can_ live
with.

It really is that simple.

            Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14 22:46                     ` Linus Torvalds
@ 2020-08-14 23:52                       ` Joe Perches
  -1 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-08-14 23:52 UTC (permalink / raw)
  To: Linus Torvalds, Sergey Senozhatsky
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> > I think what Linus said a long time ago was that the initial purpose of
> > pr_cont was
> > 
> >         pr_info("Initialize feature foo...");
> >         if (init_feature_foo() == 0)
> >                 pr_cont("ok\n");
> >         else
> >                 pr_cont("not ok\n");
> > 
> >         And if init_feature_foo() crashes the kernel then the first printk()
> >         form panic() will flush the cont buffer.
> 
> Right.
> 
> This is why I think any discussion that says "people should buffer
> their lines themselves and we should get rid if pr_cont()" is
> fundamentally broken.
> 
> Don't go down that hole. I won't take it. It's wrong.

I don't think it's wrong per se.

It's reasonable to avoid pr_cont when appropriate.

Trivial buffering, or adding and using YA vsprintf 
extension can avoid unnecessary message interleaving.

For instance, I just sent this patch to allow removal
of print_vma_addr and its use of pr_cont.

https://lore.kernel.org/lkml/09f11651f0e913e159b955ac447cd8cadf36cb0d.camel@perches.com/

This is similar to the dump_flags_names removal back
in commit edf14cdbf9a0 ("mm, printk: introduce new format
string for flags")

> The fact is, pr_cont() goes back to the original kernel. No, it wasn't
> pr_cont() back then, and no, there were no actual explicit markers for
> "this is a continuation" at all, it was all just "the last printk
> didn't have a newline, so we continue where we left off".
> 
> We've added pr_cont (and KERN_CONT) since then, and I realize that a
> lot of people hate the complexity it introduces, but it's a
> fundamental complexity that you have to live with.
> 
> If you can't live with pr_cont(), you shouldn't be working on
> printk(), and find some other area of the kernel that you _can_ live
> with.
> 
> It really is that simple.
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-14 23:52                       ` Joe Perches
  0 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-08-14 23:52 UTC (permalink / raw)
  To: Linus Torvalds, Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> > I think what Linus said a long time ago was that the initial purpose of
> > pr_cont was
> > 
> >         pr_info("Initialize feature foo...");
> >         if (init_feature_foo() == 0)
> >                 pr_cont("ok\n");
> >         else
> >                 pr_cont("not ok\n");
> > 
> >         And if init_feature_foo() crashes the kernel then the first printk()
> >         form panic() will flush the cont buffer.
> 
> Right.
> 
> This is why I think any discussion that says "people should buffer
> their lines themselves and we should get rid if pr_cont()" is
> fundamentally broken.
> 
> Don't go down that hole. I won't take it. It's wrong.

I don't think it's wrong per se.

It's reasonable to avoid pr_cont when appropriate.

Trivial buffering, or adding and using YA vsprintf 
extension can avoid unnecessary message interleaving.

For instance, I just sent this patch to allow removal
of print_vma_addr and its use of pr_cont.

https://lore.kernel.org/lkml/09f11651f0e913e159b955ac447cd8cadf36cb0d.camel@perches.com/

This is similar to the dump_flags_names removal back
in commit edf14cdbf9a0 ("mm, printk: introduce new format
string for flags")

> The fact is, pr_cont() goes back to the original kernel. No, it wasn't
> pr_cont() back then, and no, there were no actual explicit markers for
> "this is a continuation" at all, it was all just "the last printk
> didn't have a newline, so we continue where we left off".
> 
> We've added pr_cont (and KERN_CONT) since then, and I realize that a
> lot of people hate the complexity it introduces, but it's a
> fundamental complexity that you have to live with.
> 
> If you can't live with pr_cont(), you shouldn't be working on
> printk(), and find some other area of the kernel that you _can_ live
> with.
> 
> It really is that simple.
> 


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14 23:52                       ` Joe Perches
@ 2020-08-15  2:33                         ` Linus Torvalds
  -1 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-08-15  2:33 UTC (permalink / raw)
  To: Joe Perches
  Cc: Sergey Senozhatsky, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Fri, Aug 14, 2020 at 4:52 PM Joe Perches <joe@perches.com> wrote:
>
> On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> >
> > This is why I think any discussion that says "people should buffer
> > their lines themselves and we should get rid if pr_cont()" is
> > fundamentally broken.
> >
> > Don't go down that hole. I won't take it. It's wrong.
>
> I don't think it's wrong per se.

It's *absolutely* and 100% wrong.

Yes, any random *user* of pr_cont() can decide to buffer on it's own.

But when the discussion is about printk() - the implementation, not
the users - then it's complete and utter BS to talk about trying to
get rid of pr_cont().

See the difference?

                Linus

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-15  2:33                         ` Linus Torvalds
  0 siblings, 0 replies; 80+ messages in thread
From: Linus Torvalds @ 2020-08-15  2:33 UTC (permalink / raw)
  To: Joe Perches
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On Fri, Aug 14, 2020 at 4:52 PM Joe Perches <joe@perches.com> wrote:
>
> On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> >
> > This is why I think any discussion that says "people should buffer
> > their lines themselves and we should get rid if pr_cont()" is
> > fundamentally broken.
> >
> > Don't go down that hole. I won't take it. It's wrong.
>
> I don't think it's wrong per se.

It's *absolutely* and 100% wrong.

Yes, any random *user* of pr_cont() can decide to buffer on it's own.

But when the discussion is about printk() - the implementation, not
the users - then it's complete and utter BS to talk about trying to
get rid of pr_cont().

See the difference?

                Linus

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-15  2:33                         ` Linus Torvalds
@ 2020-08-15  3:08                           ` Joe Perches
  -1 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-08-15  3:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On Fri, 2020-08-14 at 19:33 -0700, Linus Torvalds wrote:
> On Fri, Aug 14, 2020 at 4:52 PM Joe Perches <joe@perches.com> wrote:
> > On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> > > This is why I think any discussion that says "people should buffer
> > > their lines themselves and we should get rid if pr_cont()" is
> > > fundamentally broken.
> > > 
> > > Don't go down that hole. I won't take it. It's wrong.
> > 
> > I don't think it's wrong per se.
> 
> It's *absolutely* and 100% wrong.
> 
> Yes, any random *user* of pr_cont() can decide to buffer on it's own.

Which I believe is the point of the discussion,
not the complete removal of KERN_CONT.

> But when the discussion is about printk() - the implementation, not
> the users - then it's complete and utter BS to talk about trying to
> get rid of pr_cont().
> 
> See the difference?

Sure, but I fail to see where anyone said get rid of pr_cont
in this thread.  It seems all that was discussed was just
various schemes to improve coalescing output.



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-15  3:08                           ` Joe Perches
  0 siblings, 0 replies; 80+ messages in thread
From: Joe Perches @ 2020-08-15  3:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On Fri, 2020-08-14 at 19:33 -0700, Linus Torvalds wrote:
> On Fri, Aug 14, 2020 at 4:52 PM Joe Perches <joe@perches.com> wrote:
> > On Fri, 2020-08-14 at 15:46 -0700, Linus Torvalds wrote:
> > > This is why I think any discussion that says "people should buffer
> > > their lines themselves and we should get rid if pr_cont()" is
> > > fundamentally broken.
> > > 
> > > Don't go down that hole. I won't take it. It's wrong.
> > 
> > I don't think it's wrong per se.
> 
> It's *absolutely* and 100% wrong.
> 
> Yes, any random *user* of pr_cont() can decide to buffer on it's own.

Which I believe is the point of the discussion,
not the complete removal of KERN_CONT.

> But when the discussion is about printk() - the implementation, not
> the users - then it's complete and utter BS to talk about trying to
> get rid of pr_cont().
> 
> See the difference?

Sure, but I fail to see where anyone said get rid of pr_cont
in this thread.  It seems all that was discussed was just
various schemes to improve coalescing output.



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14 22:46                     ` Linus Torvalds
@ 2020-08-15  5:41                       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-15  5:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Petr Mladek, John Ogness, Sergey Senozhatsky,
	Steven Rostedt, Greg Kroah-Hartman, Peter Zijlstra,
	Thomas Gleixner, kexec, Linux Kernel Mailing List

On (20/08/14 15:46), Linus Torvalds wrote:
> On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > I think what Linus said a long time ago was that the initial purpose of
> > pr_cont was
> >
> >         pr_info("Initialize feature foo...");
> >         if (init_feature_foo() == 0)
> >                 pr_cont("ok\n");
> >         else
> >                 pr_cont("not ok\n");
> >
> >         And if init_feature_foo() crashes the kernel then the first printk()
> >         form panic() will flush the cont buffer.
> 
> Right.
> 
> This is why I think any discussion that says "people should buffer
> their lines themselves and we should get rid if pr_cont()" is
> fundamentally broken.

I think what we've been talking about so far was "how do we buffer
cont lines now", what are the problems of current buffering and what
can we do to make that buffering SMP safe (preserving the context of
broken cont buffer, etc. etc.). I don't think that anyone concluded
to just "s/pr_cont/printk/g", I don't see this.

	-ss

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-15  5:41                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 80+ messages in thread
From: Sergey Senozhatsky @ 2020-08-15  5:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Sergey Senozhatsky, Thomas Gleixner

On (20/08/14 15:46), Linus Torvalds wrote:
> On Thu, Aug 13, 2020 at 4:54 AM Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> >
> > I think what Linus said a long time ago was that the initial purpose of
> > pr_cont was
> >
> >         pr_info("Initialize feature foo...");
> >         if (init_feature_foo() == 0)
> >                 pr_cont("ok\n");
> >         else
> >                 pr_cont("not ok\n");
> >
> >         And if init_feature_foo() crashes the kernel then the first printk()
> >         form panic() will flush the cont buffer.
> 
> Right.
> 
> This is why I think any discussion that says "people should buffer
> their lines themselves and we should get rid if pr_cont()" is
> fundamentally broken.

I think what we've been talking about so far was "how do we buffer
cont lines now", what are the problems of current buffering and what
can we do to make that buffering SMP safe (preserving the context of
broken cont buffer, etc. etc.). I don't think that anyone concluded
to just "s/pr_cont/printk/g", I don't see this.

	-ss

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
  2020-08-14 23:52                       ` Joe Perches
@ 2020-08-15  9:25                         ` David Laight
  -1 siblings, 0 replies; 80+ messages in thread
From: David Laight @ 2020-08-15  9:25 UTC (permalink / raw)
  To: 'Joe Perches', Linus Torvalds, Sergey Senozhatsky
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, kexec,
	Linux Kernel Mailing List

From: Joe Perches
> Sent: 15 August 2020 00:52
...
> > This is why I think any discussion that says "people should buffer
> > their lines themselves and we should get rid if pr_cont()" is
> > fundamentally broken.
> >
> > Don't go down that hole. I won't take it. It's wrong.
> 
> I don't think it's wrong per se.
> 
> It's reasonable to avoid pr_cont when appropriate.
> 
> Trivial buffering, or adding and using YA vsprintf
> extension can avoid unnecessary message interleaving.
> 
> For instance, I just sent this patch to allow removal
> of print_vma_addr and its use of pr_cont.
> 
> https://lore.kernel.org/lkml/09f11651f0e913e159b955ac447cd8cadf36cb0d.camel@perches.com/

ISTM that is a bit complex for a printf format.
Even with the 'noinline_for_stack' it is going to
use a lot of stack - and error printfs are already likely
to be near the stack limit.
The recursion for %pV might also cause grief.

In that case formatting the data into an on-stack char[]
before the printf is probably the simplest solution.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling
@ 2020-08-15  9:25                         ` David Laight
  0 siblings, 0 replies; 80+ messages in thread
From: David Laight @ 2020-08-15  9:25 UTC (permalink / raw)
  To: 'Joe Perches', Linus Torvalds, Sergey Senozhatsky
  Cc: Petr Mladek, Sergey Senozhatsky, John Ogness, Peter Zijlstra,
	Greg Kroah-Hartman, kexec, Linux Kernel Mailing List,
	Steven Rostedt, Thomas Gleixner

From: Joe Perches
> Sent: 15 August 2020 00:52
...
> > This is why I think any discussion that says "people should buffer
> > their lines themselves and we should get rid if pr_cont()" is
> > fundamentally broken.
> >
> > Don't go down that hole. I won't take it. It's wrong.
> 
> I don't think it's wrong per se.
> 
> It's reasonable to avoid pr_cont when appropriate.
> 
> Trivial buffering, or adding and using YA vsprintf
> extension can avoid unnecessary message interleaving.
> 
> For instance, I just sent this patch to allow removal
> of print_vma_addr and its use of pr_cont.
> 
> https://lore.kernel.org/lkml/09f11651f0e913e159b955ac447cd8cadf36cb0d.camel@perches.com/

ISTM that is a bit complex for a printf format.
Even with the 'noinline_for_stack' it is going to
use a lot of stack - and error printfs are already likely
to be near the stack limit.
The recursion for %pV might also cause grief.

In that case formatting the data into an on-stack char[]
before the printf is probably the simplest solution.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2020-08-16  1:29 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 23:48 [PATCH 0/4] printk: reimplement LOG_CONT handling John Ogness
2020-07-17 23:48 ` John Ogness
2020-07-17 23:48 ` [PATCH 1/4] printk: ringbuffer: support dataless records John Ogness
2020-07-17 23:48   ` John Ogness
2020-07-20 14:49   ` John Ogness
2020-07-20 14:49     ` John Ogness
2020-07-17 23:48 ` [PATCH 2/4] printk: store instead of processing cont parts John Ogness
2020-07-17 23:48   ` John Ogness
2020-07-19 14:35   ` Sergey Senozhatsky
2020-07-19 14:35     ` Sergey Senozhatsky
2020-07-19 18:27     ` Linus Torvalds
2020-07-19 18:27       ` Linus Torvalds
2020-07-20  1:50       ` Sergey Senozhatsky
2020-07-20  1:50         ` Sergey Senozhatsky
2020-07-20 18:30         ` Linus Torvalds
2020-07-20 18:30           ` Linus Torvalds
2020-07-21  3:53           ` Joe Perches
2020-07-21  3:53             ` Joe Perches
2020-07-21 14:42           ` Sergey Senozhatsky
2020-07-21 14:42             ` Sergey Senozhatsky
2020-07-21 14:57             ` John Ogness
2020-07-21 14:57               ` John Ogness
2020-07-29  2:03               ` Sergey Senozhatsky
2020-07-29  2:03                 ` Sergey Senozhatsky
2020-07-21 15:40             ` Linus Torvalds
2020-07-21 15:40               ` Linus Torvalds
2020-07-28  2:22               ` Sergey Senozhatsky
2020-07-28  2:22                 ` Sergey Senozhatsky
2020-07-17 23:48 ` [PATCH 3/4] printk: process cont records during reading John Ogness
2020-07-17 23:48   ` John Ogness
2020-07-17 23:48 ` [PATCH 4/4] ipconfig: cleanup printk usage John Ogness
2020-07-17 23:48   ` John Ogness
2020-07-18  0:00 ` [PATCH 0/4] printk: reimplement LOG_CONT handling Linus Torvalds
2020-07-18  0:00   ` Linus Torvalds
2020-07-18 14:42   ` John Ogness
2020-07-18 14:42     ` John Ogness
2020-07-18 17:42     ` Linus Torvalds
2020-07-18 17:42       ` Linus Torvalds
2020-08-11 16:05     ` Petr Mladek
2020-08-11 16:05       ` Petr Mladek
2020-08-12 16:39       ` POC: Alternative solution: " Petr Mladek
2020-08-12 16:39         ` Petr Mladek
2020-08-13  0:24         ` John Ogness
2020-08-13  0:24           ` John Ogness
2020-08-13  5:18           ` Sergey Senozhatsky
2020-08-13  5:18             ` Sergey Senozhatsky
2020-08-13  7:44             ` John Ogness
2020-08-13  7:44               ` John Ogness
2020-08-13  8:41               ` Petr Mladek
2020-08-13  8:41                 ` Petr Mladek
2020-08-13 10:29                 ` John Ogness
2020-08-13 10:29                   ` John Ogness
2020-08-13 11:30                   ` Petr Mladek
2020-08-13 11:30                     ` Petr Mladek
2020-08-14  3:34                   ` Sergey Senozhatsky
2020-08-14  3:34                     ` Sergey Senozhatsky
2020-08-14  8:16                     ` John Ogness
2020-08-14  8:16                       ` John Ogness
2020-08-14  9:12                       ` Petr Mladek
2020-08-14  9:12                         ` Petr Mladek
2020-08-13 11:54                 ` Sergey Senozhatsky
2020-08-13 11:54                   ` Sergey Senozhatsky
2020-08-14  9:04                   ` Petr Mladek
2020-08-14  9:04                     ` Petr Mladek
2020-08-14 22:46                   ` Linus Torvalds
2020-08-14 22:46                     ` Linus Torvalds
2020-08-14 23:52                     ` Joe Perches
2020-08-14 23:52                       ` Joe Perches
2020-08-15  2:33                       ` Linus Torvalds
2020-08-15  2:33                         ` Linus Torvalds
2020-08-15  3:08                         ` Joe Perches
2020-08-15  3:08                           ` Joe Perches
2020-08-15  9:25                       ` David Laight
2020-08-15  9:25                         ` David Laight
2020-08-15  5:41                     ` Sergey Senozhatsky
2020-08-15  5:41                       ` Sergey Senozhatsky
2020-08-13  7:51             ` Petr Mladek
2020-08-13  7:51               ` Petr Mladek
2020-08-13  7:31           ` Petr Mladek
2020-08-13  7:31             ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.