linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling
@ 2020-09-14 12:33 John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 1/6] printk: ringbuffer: relocate get_data() John Ogness
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Hello,

Here is v5 for the second series to rework the printk subsystem.
(The v4 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series,
which implemented LOG_CONT as agreed upon, showed that the
effects on existing userspace tools using /dev/kmsg (journalctl,
dmesg) were not acceptable [3].

Patch 5 introduces *four* new memory barrier pairs. Two of them
are insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose and context of
the two existing memory barrier pairs they provide an alternate
path for. The other two new memory barrier pairs are significant
additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
    ensure the state transitions back to desc_reserved before
    fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
    ensure the state transitions to desc_committed before checking
    the head ID to see if the descriptor needs to be finalized.

The test module used to test the ringbuffer is available
here [4].

The series is based on the printk-rework branch of the printk git
tree:

e60768311af8 ("scripts/gdb: update for lockless printk ringbuffer")

The list of changes since v4:

printk_ringbuffer
=================

- desc_read(): revert setting @state_var when inconsistent (a
  separate series [5] is addressing this bug)

- desc_reserve(): use DESC_SV() when setting reserved

- data_realloc(): also do nothing if the size is the same

- prb_reserve_in_last(): adjust dataless checks/warnings to match
  the non-dataless case

- prb_reserve_in_last(): fix length modifier in warnings

- change comments about "state flags" to just talk about "states"

John Ogness

[0] https://lkml.kernel.org/r/20200908202859.2736-1-john.ogness@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx.fsf@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://github.com/Linutronix/prb-test.git
[5] https://lkml.kernel.org/r/20200914094803.27365-1-john.ogness@linutronix.de

John Ogness (6):
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: change representation of states
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension

 Documentation/admin-guide/kdump/gdbmacros.txt |  13 +-
 kernel/printk/printk.c                        | 110 +--
 kernel/printk/printk_ringbuffer.c             | 683 ++++++++++++++----
 kernel/printk/printk_ringbuffer.h             |  35 +-
 scripts/gdb/linux/dmesg.py                    |  12 +-
 5 files changed, 615 insertions(+), 238 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH printk v5 1/6] printk: ringbuffer: relocate get_data()
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
@ 2020-09-14 12:33 ` John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk_ringbuffer.c | 116 +++++++++++++++---------------
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..aa6e31a27601 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring *data_ring,
 		DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+			    struct prb_data_blk_lpos *blk_lpos,
+			    unsigned int *data_size)
+{
+	struct prb_data_block *db;
+
+	/* Data-less data block description. */
+	if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+		if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+			*data_size = 0;
+			return "";
+		}
+		return NULL;
+	}
+
+	/* Regular data block: @begin less than @next and in same wrap. */
+	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
+	    blk_lpos->begin < blk_lpos->next) {
+		db = to_block(data_ring, blk_lpos->begin);
+		*data_size = blk_lpos->next - blk_lpos->begin;
+
+	/* Wrapping data block: @begin is one wrap behind @next. */
+	} else if (DATA_WRAPS(data_ring, blk_lpos->begin + DATA_SIZE(data_ring)) ==
+		   DATA_WRAPS(data_ring, blk_lpos->next)) {
+		db = to_block(data_ring, 0);
+		*data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+	/* Illegal block description. */
+	} else {
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	/* A valid data block will always be aligned to the ID size. */
+	if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
+	    WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
+		return NULL;
+	}
+
+	/* A valid data block will always have at least an ID. */
+	if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+		return NULL;
+
+	/* Subtract block ID space from size to reflect data size. */
+	*data_size -= sizeof(db->id);
+
+	return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
 	local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-			    struct prb_data_blk_lpos *blk_lpos,
-			    unsigned int *data_size)
-{
-	struct prb_data_block *db;
-
-	/* Data-less data block description. */
-	if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-		if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-			*data_size = 0;
-			return "";
-		}
-		return NULL;
-	}
-
-	/* Regular data block: @begin less than @next and in same wrap. */
-	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
-	    blk_lpos->begin < blk_lpos->next) {
-		db = to_block(data_ring, blk_lpos->begin);
-		*data_size = blk_lpos->next - blk_lpos->begin;
-
-	/* Wrapping data block: @begin is one wrap behind @next. */
-	} else if (DATA_WRAPS(data_ring, blk_lpos->begin + DATA_SIZE(data_ring)) ==
-		   DATA_WRAPS(data_ring, blk_lpos->next)) {
-		db = to_block(data_ring, 0);
-		*data_size = DATA_INDEX(data_ring, blk_lpos->next);
-
-	/* Illegal block description. */
-	} else {
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-
-	/* A valid data block will always be aligned to the ID size. */
-	if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
-	    WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
-		return NULL;
-	}
-
-	/* A valid data block will always have at least an ID. */
-	if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
-		return NULL;
-
-	/* Subtract block ID space from size to reflect data size. */
-	*data_size -= sizeof(db->id);
-
-	return &db->data[0];
-}
-
 /*
  * Count the number of lines in provided text. All text has at least 1 line
  * (even if @text_size is 0). Each '\n' processed is counted as an additional
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 1/6] printk: ringbuffer: relocate get_data() John Ogness
@ 2020-09-14 12:33 ` John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields John Ogness
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk_ringbuffer.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index aa6e31a27601..6ee5ebce1450 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)		((lpos) & 1UL)
+#define BLK_DATALESS(blk)		(LPOS_DATALESS((blk)->begin) && \
+					 LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring *data_ring,
 			       struct prb_data_blk_lpos *blk_lpos)
 {
 	/* Data-less blocks take no space. */
-	if (LPOS_DATALESS(blk_lpos->begin))
+	if (BLK_DATALESS(blk_lpos))
 		return 0;
 
 	if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring *data_ring,
 	struct prb_data_block *db;
 
 	/* Data-less data block description. */
-	if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+	if (BLK_DATALESS(blk_lpos)) {
 		if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
 			*data_size = 0;
 			return "";
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 1/6] printk: ringbuffer: relocate get_data() John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
@ 2020-09-14 12:33 ` John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 4/6] printk: ringbuffer: change representation of states John Ogness
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields.
In particular, for correctly filling in text and dict lengths.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c            | 12 ++++++++----
 kernel/printk/printk_ringbuffer.c | 30 ++++++++++++++++++------------
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..964b5701688f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -520,8 +520,11 @@ static int log_store(u32 caller_id, int facility, int level,
 	memcpy(&r.text_buf[0], text, text_len);
 	if (trunc_msg_len)
 		memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
-	if (r.dict_buf)
+	r.info->text_len = text_len + trunc_msg_len;
+	if (r.dict_buf) {
 		memcpy(&r.dict_buf[0], dict, dict_len);
+		r.info->dict_len = dict_len;
+	}
 	r.info->facility = facility;
 	r.info->level = level & 7;
 	r.info->flags = flags & 0x1f;
@@ -1069,10 +1072,11 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
 	if (!prb_reserve(&e, rb, &dest_r))
 		return 0;
 
-	memcpy(&dest_r.text_buf[0], &r->text_buf[0], dest_r.text_buf_size);
+	memcpy(&dest_r.text_buf[0], &r->text_buf[0], r->info->text_len);
+	dest_r.info->text_len = r->info->text_len;
 	if (dest_r.dict_buf) {
-		memcpy(&dest_r.dict_buf[0], &r->dict_buf[0],
-		       dest_r.dict_buf_size);
+		memcpy(&dest_r.dict_buf[0], &r->dict_buf[0], r->info->dict_len);
+		dest_r.info->dict_len = r->info->dict_len;
 	}
 	dest_r.info->facility = r->info->facility;
 	dest_r.info->level = r->info->level;
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 6ee5ebce1450..82347abb22a5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -146,10 +146,13 @@
  *
  *	if (prb_reserve(&e, &test_rb, &r)) {
  *		snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
+ *		r.info->text_len = strlen(textstr);
  *
  *		// dictionary allocation may have failed
- *		if (r.dict_buf)
+ *		if (r.dict_buf) {
  *			snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
+ *			r.info->dict_len = strlen(dictstr);
+ *		}
  *
  *		r.info->ts_nsec = local_clock();
  *
@@ -1125,9 +1128,9 @@ static const char *get_data(struct prb_data_ring *data_ring,
  * @dict_buf_size is set to 0. Writers must check this before writing to
  * dictionary space.
  *
- * @info->text_len and @info->dict_len will already be set to @text_buf_size
- * and @dict_buf_size, respectively. If dictionary space reservation fails,
- * @info->dict_len is set to 0.
+ * Important: @info->text_len and @info->dict_len need to be set correctly by
+ *            the writer in order for data to be readable and/or extended.
+ *            Their values are initialized to 0.
  */
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 		 struct printk_record *r)
@@ -1135,6 +1138,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	struct prb_desc_ring *desc_ring = &rb->desc_ring;
 	struct prb_desc *d;
 	unsigned long id;
+	u64 seq;
 
 	if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
 		goto fail;
@@ -1159,6 +1163,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 
 	d = to_desc(desc_ring, id);
 
+	/*
+	 * All @info fields (except @seq) are cleared and must be filled in
+	 * by the writer. Save @seq before clearing because it is used to
+	 * determine the new sequence number.
+	 */
+	seq = d->info.seq;
+	memset(&d->info, 0, sizeof(d->info));
+
 	/*
 	 * Set the @e fields here so that prb_commit() can be used if
 	 * text data allocation fails.
@@ -1177,17 +1189,15 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	 * See the "Bootstrap" comment block in printk_ringbuffer.h for
 	 * details about how the initializer bootstraps the descriptors.
 	 */
-	if (d->info.seq == 0 && DESC_INDEX(desc_ring, id) != 0)
+	if (seq == 0 && DESC_INDEX(desc_ring, id) != 0)
 		d->info.seq = DESC_INDEX(desc_ring, id);
 	else
-		d->info.seq += DESCS_COUNT(desc_ring);
+		d->info.seq = seq + DESCS_COUNT(desc_ring);
 
 	r->text_buf = data_alloc(rb, &rb->text_data_ring, r->text_buf_size,
 				 &d->text_blk_lpos, id);
 	/* If text data allocation fails, a data-less record is committed. */
 	if (r->text_buf_size && !r->text_buf) {
-		d->info.text_len = 0;
-		d->info.dict_len = 0;
 		prb_commit(e);
 		/* prb_commit() re-enabled interrupts. */
 		goto fail;
@@ -1204,10 +1214,6 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 
 	r->info = &d->info;
 
-	/* Set default values for the sizes. */
-	d->info.text_len = r->text_buf_size;
-	d->info.dict_len = r->dict_buf_size;
-
 	/* Record full text space used by record. */
 	e->text_space = space_used(&rb->text_data_ring, &d->text_blk_lpos);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH printk v5 4/6] printk: ringbuffer: change representation of states
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
                   ` (2 preceding siblings ...)
  2020-09-14 12:33 ` [PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields John Ogness
@ 2020-09-14 12:33 ` John Ogness
  2020-09-14 12:33 ` [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support John Ogness
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Rather than deriving the state by evaluating bits within the flags
area of the state variable, assign the states explicit values and
set those values in the flags area. Introduce macros to make it
simple to read and write state values for the state variable.

Although the functionality is preserved, the binary representation
for the states is changed.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 12 ++++---
 kernel/printk/printk_ringbuffer.c             | 28 +++++------------
 kernel/printk/printk_ringbuffer.h             | 31 ++++++++++++-------
 scripts/gdb/linux/dmesg.py                    | 11 ++++---
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 7adece30237e..8f533b751c46 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,9 +295,12 @@ document dump_record
 end
 
 define dmesg
-	set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-	set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
-	set var $id_mask = ~$flags_mask
+	# definitions from kernel/printk/printk_ringbuffer.h
+	set var $desc_committed = 1
+	set var $desc_sv_bits = sizeof(long) * 8
+	set var $desc_flags_shift = $desc_sv_bits - 2
+	set var $desc_flags_mask = 3 << $desc_flags_shift
+	set var $id_mask = ~$desc_flags_mask
 
 	set var $desc_count = 1U << prb->desc_ring.count_bits
 	set var $prev_flags = 0
@@ -309,7 +312,8 @@ define dmesg
 		set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
 		# skip non-committed record
-		if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+		set var $state = 3 & ($desc->state_var.counter >> $desc_flags_shift)
+		if ($state == $desc_committed)
 			dump_record $desc $prev_flags
 			set var $prev_flags = $desc->info.flags
 		end
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 82347abb22a5..911fbe150e9a 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -348,14 +348,6 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
 	return true;
 }
 
-/* The possible responses of a descriptor state-query. */
-enum desc_state {
-	desc_miss,	/* ID mismatch */
-	desc_reserved,	/* reserved, in use by writer */
-	desc_committed, /* committed, writer is done */
-	desc_reusable,	/* free, not yet used by any writer */
-};
-
 /* Query the state of a descriptor. */
 static enum desc_state get_desc_state(unsigned long id,
 				      unsigned long state_val)
@@ -363,13 +355,7 @@ static enum desc_state get_desc_state(unsigned long id,
 	if (id != DESC_ID(state_val))
 		return desc_miss;
 
-	if (state_val & DESC_REUSE_MASK)
-		return desc_reusable;
-
-	if (state_val & DESC_COMMITTED_MASK)
-		return desc_committed;
-
-	return desc_reserved;
+	return DESC_STATE(state_val);
 }
 
 /*
@@ -467,8 +453,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
 			       unsigned long id)
 {
-	unsigned long val_committed = id | DESC_COMMITTED_MASK;
-	unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+	unsigned long val_committed = DESC_SV(id, desc_committed);
+	unsigned long val_reusable = DESC_SV(id, desc_reusable);
 	struct prb_desc *desc = to_desc(desc_ring, id);
 	atomic_long_t *state_var = &desc->state_var;
 
@@ -904,7 +890,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 	 */
 	prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
 	if (prev_state_val &&
-	    prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | DESC_REUSE_MASK)) {
+	    get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
 		WARN_ON_ONCE(1);
 		return false;
 	}
@@ -918,7 +904,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 	 * This pairs with desc_read:D.
 	 */
 	if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
-				     id | 0)) { /* LMM(desc_reserve:F) */
+			DESC_SV(id, desc_reserved))) { /* LMM(desc_reserve:F) */
 		WARN_ON_ONCE(1);
 		return false;
 	}
@@ -1237,7 +1223,7 @@ void prb_commit(struct prb_reserved_entry *e)
 {
 	struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
 	struct prb_desc *d = to_desc(desc_ring, e->id);
-	unsigned long prev_state_val = e->id | 0;
+	unsigned long prev_state_val = DESC_SV(e->id, desc_reserved);
 
 	/* Now the writer has finished all writing: LMM(prb_commit:A) */
 
@@ -1250,7 +1236,7 @@ void prb_commit(struct prb_reserved_entry *e)
 	 * this. This pairs with desc_read:B.
 	 */
 	if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
-				     e->id | DESC_COMMITTED_MASK)) { /* LMM(prb_commit:B) */
+			DESC_SV(e->id, desc_committed))) { /* LMM(prb_commit:B) */
 		WARN_ON_ONCE(1);
 	}
 
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index e6302da041f9..a9d85a6727b1 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -112,16 +112,25 @@ struct prb_reserved_entry {
 	unsigned int			text_space;
 };
 
-#define _DATA_SIZE(sz_bits)		(1UL << (sz_bits))
-#define _DESCS_COUNT(ct_bits)		(1U << (ct_bits))
-#define DESC_SV_BITS			(sizeof(unsigned long) * 8)
-#define DESC_COMMITTED_MASK		(1UL << (DESC_SV_BITS - 1))
-#define DESC_REUSE_MASK			(1UL << (DESC_SV_BITS - 2))
-#define DESC_FLAGS_MASK			(DESC_COMMITTED_MASK | DESC_REUSE_MASK)
-#define DESC_ID_MASK			(~DESC_FLAGS_MASK)
-#define DESC_ID(sv)			((sv) & DESC_ID_MASK)
-#define FAILED_LPOS			0x1
-#define NO_LPOS				0x3
+/* The possible responses of a descriptor state-query. */
+enum desc_state {
+	desc_miss	=  -1,	/* ID mismatch (pseudo state) */
+	desc_reserved	= 0x0,	/* reserved, in use by writer */
+	desc_committed	= 0x1,	/* committed by writer */
+	desc_reusable	= 0x3,	/* free, not yet used by any writer */
+};
+
+#define _DATA_SIZE(sz_bits)	(1UL << (sz_bits))
+#define _DESCS_COUNT(ct_bits)	(1U << (ct_bits))
+#define DESC_SV_BITS		(sizeof(unsigned long) * 8)
+#define DESC_FLAGS_SHIFT	(DESC_SV_BITS - 2)
+#define DESC_FLAGS_MASK		(3UL << DESC_FLAGS_SHIFT)
+#define DESC_STATE(sv)		(3UL & (sv >> DESC_FLAGS_SHIFT))
+#define DESC_SV(id, state)	(((unsigned long)state << DESC_FLAGS_SHIFT) | id)
+#define DESC_ID_MASK		(~DESC_FLAGS_MASK)
+#define DESC_ID(sv)		((sv) & DESC_ID_MASK)
+#define FAILED_LPOS		0x1
+#define NO_LPOS			0x3
 
 #define FAILED_BLK_LPOS	\
 {				\
@@ -213,7 +222,7 @@ struct prb_reserved_entry {
  */
 #define BLK0_LPOS(sz_bits)	(-(_DATA_SIZE(sz_bits)))
 #define DESC0_ID(ct_bits)	DESC_ID(-(_DESCS_COUNT(ct_bits) + 1))
-#define DESC0_SV(ct_bits)	(DESC_COMMITTED_MASK | DESC_REUSE_MASK | DESC0_ID(ct_bits))
+#define DESC0_SV(ct_bits)	DESC_SV(DESC0_ID(ct_bits), desc_reusable)
 
 /*
  * Define a ringbuffer with an external text data buffer. The same as
diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index 6c6022012ea8..dd8c0b95063a 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -78,10 +78,10 @@ class LxDmesg(gdb.Command):
         len_off = off + printk_info_type.get_type()['text_len'].bitpos // 8
 
         # definitions from kernel/printk/printk_ringbuffer.h
+        desc_committed = 1
         desc_sv_bits = utils.get_long_type().sizeof * 8
-        desc_committed_mask = 1 << (desc_sv_bits - 1)
-        desc_reuse_mask = 1 << (desc_sv_bits - 2)
-        desc_flags_mask = desc_committed_mask | desc_reuse_mask
+        desc_flags_shift = desc_sv_bits - 2
+        desc_flags_mask = 3 << desc_flags_shift
         desc_id_mask = ~desc_flags_mask
 
         # read in tail and head descriptor ids
@@ -96,8 +96,9 @@ class LxDmesg(gdb.Command):
             desc_off = desc_sz * ind
 
             # skip non-committed record
-            state = utils.read_u64(descs, desc_off + sv_off + counter_off) & desc_flags_mask
-            if state != desc_committed_mask:
+            state = 3 & (utils.read_u64(descs, desc_off + sv_off +
+                                        counter_off) >> desc_flags_shift)
+            if state != desc_committed:
                 if did == head_id:
                     break
                 did = (did + 1) & desc_id_mask
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
                   ` (3 preceding siblings ...)
  2020-09-14 12:33 ` [PATCH printk v5 4/6] printk: ringbuffer: change representation of states John Ogness
@ 2020-09-14 12:33 ` John Ogness
  2020-09-15  9:30   ` Petr Mladek
  2020-09-14 12:33 ` [PATCH printk v5 6/6] printk: reimplement log_cont using record extension John Ogness
  2020-09-15  9:47 ` [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling Petr Mladek
  6 siblings, 1 reply; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Add support for extending the newest data block. For this, introduce
a new finalization state (desc_finalized) denoting a committed
descriptor that cannot be extended.

Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means transitioning from the
desc_committed state back to the desc_reserved state.

A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)

Four new memory barrier pairs are introduced. Two of them are
insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose, pairing, and
context of the two existing memory barrier pairs they provide an
alternate path for. The other two new memory barrier pairs are
significant additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
    ensure the state transitions back to desc_reserved before
    fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
    ensure the state transitions to desc_committed before checking
    the head ID to see if the descriptor needs to be finalized.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 Documentation/admin-guide/kdump/gdbmacros.txt |   3 +-
 kernel/printk/printk_ringbuffer.c             | 525 ++++++++++++++++--
 kernel/printk/printk_ringbuffer.h             |   6 +-
 scripts/gdb/linux/dmesg.py                    |   3 +-
 4 files changed, 480 insertions(+), 57 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 8f533b751c46..94fabb165abf 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -297,6 +297,7 @@ end
 define dmesg
 	# definitions from kernel/printk/printk_ringbuffer.h
 	set var $desc_committed = 1
+	set var $desc_finalized = 2
 	set var $desc_sv_bits = sizeof(long) * 8
 	set var $desc_flags_shift = $desc_sv_bits - 2
 	set var $desc_flags_mask = 3 << $desc_flags_shift
@@ -313,7 +314,7 @@ define dmesg
 
 		# skip non-committed record
 		set var $state = 3 & ($desc->state_var.counter >> $desc_flags_shift)
-		if ($state == $desc_committed)
+		if ($state == $desc_committed || $state == $desc_finalized)
 			dump_record $desc $prev_flags
 			set var $prev_flags = $desc->info.flags
 		end
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 911fbe150e9a..4e526c79f89c 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -46,20 +46,26 @@
  * into a single descriptor field named @state_var, allowing ID and state to
  * be synchronously and atomically updated.
  *
- * Descriptors have three states:
+ * Descriptors have four states:
  *
  *   reserved
  *     A writer is modifying the record.
  *
  *   committed
- *     The record and all its data are complete and available for reading.
+ *     The record and all its data are written. A writer can reopen the
+ *     descriptor (transitioning it back to reserved), but in the committed
+ *     state the data is consistent.
+ *
+ *   finalized
+ *     The record and all its data are complete and available for reading. A
+ *     writer cannot reopen the descriptor.
  *
  *   reusable
  *     The record exists, but its text and/or dictionary data may no longer
  *     be available.
  *
  * Querying the @state_var of a record requires providing the ID of the
- * descriptor to query. This can yield a possible fourth (pseudo) state:
+ * descriptor to query. This can yield a possible fifth (pseudo) state:
  *
  *   miss
  *     The descriptor being queried has an unexpected ID.
@@ -79,6 +85,28 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~~~~~~~~~~~~~~~~~~~~~
+ * When a writer calls the commit function prb_commit(), record data is
+ * fully stored and is consistent within the ringbuffer. However, a writer can
+ * reopen that record, claiming exclusive access (as with prb_reserve()), and
+ * modify that record. When finished, the writer must again commit the record.
+ *
+ * In order for a record to be made available to readers (and also become
+ * recyclable for writers), it must be finalized. A finalized record cannot be
+ * reopened and can never become "unfinalized". Record finalization can occur
+ * in three different scenarios:
+ *
+ *   1) A writer can simultaneously commit and finalize its record by calling
+ *      prb_final_commit() instead of prb_commit().
+ *
+ *   2) When a new record is reserved and the previous record has been
+ *      committed via prb_commit(), that previous record is automatically
+ *      finalized.
+ *
+ *   3) When a record is committed via prb_commit() and a newer record
+ *      already exists, the record being committed is automatically finalized.
+ *
  * Data Rings
  * ~~~~~~~~~~
  * The two data rings (text and dictionary) function identically. They exist
@@ -97,7 +125,7 @@
  * are met:
  *
  *   1) The descriptor associated with the data block is in the committed
- *      queried state.
+ *      or finalized queried state.
  *
  *   2) The blk_lpos struct within the descriptor associated with the data
  *      block references back to the same data block.
@@ -156,9 +184,38 @@
  *
  *		r.info->ts_nsec = local_clock();
  *
+ *		prb_final_commit(&e);
+ *	}
+ *
+ * Note that additional writer functions are available to extend a record
+ * after it has been committed but not yet finalized. This can be done as
+ * long as no new records have been reserved and the caller is the same.
+ *
+ * Sample writer code (record extending)::
+ *
+ *		// alternate rest of previous example
+ *		r.info->ts_nsec = local_clock();
+ *		r.info->text_len = strlen(textstr);
+ *		r.info->caller_id = printk_caller_id();
+ *
+ *		// commit the record (but do not finalize yet)
  *		prb_commit(&e);
  *	}
  *
+ *	...
+ *
+ *	// specify additional 5 bytes text space to extend
+ *	prb_rec_init_wr(&r, 5, 0);
+ *
+ *	if (prb_reserve_in_last(&e, &test_rb, &r, printk_caller_id())) {
+ *		snprintf(&r.text_buf[r.info->text_len],
+ *			 r.text_buf_size - r.info->text_len, "hello");
+ *
+ *		r.info->text_len += 5;
+ *
+ *		prb_final_commit(&e);
+ *	}
+ *
  * Sample reader code::
  *
  *	struct printk_info info;
@@ -236,15 +293,21 @@
  *   desc_reserve:F / desc_read:D
  *     set new descriptor id and reserved (state), then allow writer changes
  *
- *   data_alloc:A / desc_read:D
+ *   data_alloc:A (or data_realloc:A) / desc_read:D
  *     set old descriptor reusable (state), then modify new data block area
  *
- *   data_alloc:A / data_push_tail:B
+ *   data_alloc:A (or data_realloc:A) / data_push_tail:B
  *     push data tail (lpos), then modify new data block area
  *
- *   prb_commit:B / desc_read:B
+ *   _prb_commit:B / desc_read:B
  *     store writer changes, then set new descriptor committed (state)
  *
+ *   desc_reopen_last:A / _prb_commit:B
+ *     set descriptor reserved (state), then read descriptor data
+ *
+ *   _prb_commit:B / desc_reserve:D
+ *     set new descriptor committed (state), then check descriptor head (id)
+ *
  *   data_push_tail:D / data_push_tail:A
  *     set descriptor reusable (state), then push data tail (lpos)
  *
@@ -380,16 +443,16 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
 	/*
 	 * Guarantee the state is loaded before copying the descriptor
 	 * content. This avoids copying obsolete descriptor content that might
-	 * not apply to the descriptor state. This pairs with prb_commit:B.
+	 * not apply to the descriptor state. This pairs with _prb_commit:B.
 	 *
 	 * Memory barrier involvement:
 	 *
-	 * If desc_read:A reads from prb_commit:B, then desc_read:C reads
-	 * from prb_commit:A.
+	 * If desc_read:A reads from _prb_commit:B, then desc_read:C reads
+	 * from _prb_commit:A.
 	 *
 	 * Relies on:
 	 *
-	 * WMB from prb_commit:A to prb_commit:B
+	 * WMB from _prb_commit:A to _prb_commit:B
 	 *    matching
 	 * RMB from desc_read:A to desc_read:C
 	 */
@@ -420,7 +483,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
 	 *
 	 * 2. Guarantee the record data is loaded before re-checking the
 	 *    state. This avoids reading an obsolete descriptor state that may
-	 *    not apply to the copied data. This pairs with data_alloc:A.
+	 *    not apply to the copied data. This pairs with data_alloc:A and
+	 *    data_realloc:A.
 	 *
 	 *    Memory barrier involvement:
 	 *
@@ -446,19 +510,19 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
 }
 
 /*
- * Take a specified descriptor out of the committed state by attempting
- * the transition from committed to reusable. Either this context or some
+ * Take a specified descriptor out of the finalized state by attempting
+ * the transition from finalized to reusable. Either this context or some
  * other context will have been successful.
  */
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
 			       unsigned long id)
 {
-	unsigned long val_committed = DESC_SV(id, desc_committed);
+	unsigned long val_finalized = DESC_SV(id, desc_finalized);
 	unsigned long val_reusable = DESC_SV(id, desc_reusable);
 	struct prb_desc *desc = to_desc(desc_ring, id);
 	atomic_long_t *state_var = &desc->state_var;
 
-	atomic_long_cmpxchg_relaxed(state_var, val_committed,
+	atomic_long_cmpxchg_relaxed(state_var, val_finalized,
 				    val_reusable); /* LMM(desc_make_reusable:A) */
 }
 
@@ -467,7 +531,7 @@ static void desc_make_reusable(struct prb_desc_ring *desc_ring,
  * data block from @lpos_begin until @lpos_end into the reusable state.
  *
  * If there is any problem making the associated descriptor reusable, either
- * the descriptor has not yet been committed or another writer context has
+ * the descriptor has not yet been finalized or another writer context has
  * already pushed the tail lpos past the problematic data block. Regardless,
  * on error the caller can re-load the tail lpos to determine the situation.
  */
@@ -511,10 +575,10 @@ static bool data_make_reusable(struct printk_ringbuffer *rb,
 
 		switch (d_state) {
 		case desc_miss:
-			return false;
 		case desc_reserved:
-			return false;
 		case desc_committed:
+			return false;
+		case desc_finalized:
 			/*
 			 * This data block is invalid if the descriptor
 			 * does not point back to it.
@@ -599,7 +663,7 @@ static bool data_push_tail(struct printk_ringbuffer *rb,
 			 *    data_make_reusable() may be due to a newly
 			 *    recycled data area causing the tail lpos to
 			 *    have been previously pushed. This pairs with
-			 *    data_alloc:A.
+			 *    data_alloc:A and data_realloc:A.
 			 *
 			 *    Memory barrier involvement:
 			 *
@@ -712,8 +776,9 @@ static bool desc_push_tail(struct printk_ringbuffer *rb,
 		 */
 		return true;
 	case desc_reserved:
-		return false;
 	case desc_committed:
+		return false;
+	case desc_finalized:
 		desc_make_reusable(desc_ring, tail_id);
 		break;
 	case desc_reusable:
@@ -734,7 +799,7 @@ static bool desc_push_tail(struct printk_ringbuffer *rb,
 
 	/*
 	 * Check the next descriptor after @tail_id before pushing the tail
-	 * to it because the tail must always be in a committed or reusable
+	 * to it because the tail must always be in a finalized or reusable
 	 * state. The implementation of prb_first_seq() relies on this.
 	 *
 	 * A successful read implies that the next descriptor is less than or
@@ -743,7 +808,7 @@ static bool desc_push_tail(struct printk_ringbuffer *rb,
 	 */
 	d_state = desc_read(desc_ring, DESC_ID(tail_id + 1), &desc); /* LMM(desc_push_tail:A) */
 
-	if (d_state == desc_committed || d_state == desc_reusable) {
+	if (d_state == desc_finalized || d_state == desc_reusable) {
 		/*
 		 * Guarantee any descriptor states that have transitioned to
 		 * reusable are stored before pushing the tail ID. This allows
@@ -878,6 +943,10 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 		 *    another CPU may have pushed the tail ID. This pairs
 		 *    with desc_push_tail:C and this also pairs with
 		 *    prb_first_seq:C.
+		 *
+		 * 5. Guarantee the head ID is stored before trying to
+		 *    finalize the previous descriptor. This pairs with
+		 *    _prb_commit:B.
 		 */
 	} while (!atomic_long_try_cmpxchg(&desc_ring->head_id, &head_id,
 					  id)); /* LMM(desc_reserve:D) */
@@ -1007,6 +1076,84 @@ static char *data_alloc(struct printk_ringbuffer *rb,
 	return &blk->data[0];
 }
 
+/*
+ * Try to resize an existing data block associated with the descriptor
+ * specified by @id. If the resized data block should become wrapped, it
+ * copies the old data to the new data block. If @size yields a data block
+ * with the same or less size, the data block is left as is.
+ *
+ * Fail if this is not the last allocated data block or if there is not
+ * enough space or it is not possible make enough space.
+ *
+ * Return a pointer to the beginning of the entire data buffer or NULL on
+ * failure.
+ */
+static char *data_realloc(struct printk_ringbuffer *rb,
+			  struct prb_data_ring *data_ring, unsigned int size,
+			  struct prb_data_blk_lpos *blk_lpos, unsigned long id)
+{
+	struct prb_data_block *blk;
+	unsigned long head_lpos;
+	unsigned long next_lpos;
+	bool wrapped;
+
+	/* Reallocation only works if @blk_lpos is the newest data block. */
+	head_lpos = atomic_long_read(&data_ring->head_lpos);
+	if (head_lpos != blk_lpos->next)
+		return NULL;
+
+	/* Keep track if @blk_lpos was a wrapping data block. */
+	wrapped = (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, blk_lpos->next));
+
+	size = to_blk_size(size);
+
+	next_lpos = get_next_lpos(data_ring, blk_lpos->begin, size);
+
+	/* If the data block does not increase, there is nothing to do. */
+	if (head_lpos - next_lpos < DATA_SIZE(data_ring)) {
+		blk = to_block(data_ring, blk_lpos->begin);
+		return &blk->data[0];
+	}
+
+	if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring)))
+		return NULL;
+
+	/* The memory barrier involvement is the same as data_alloc:A. */
+	if (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &head_lpos,
+				     next_lpos)) { /* LMM(data_realloc:A) */
+		return NULL;
+	}
+
+	blk = to_block(data_ring, blk_lpos->begin);
+
+	if (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, next_lpos)) {
+		struct prb_data_block *old_blk = blk;
+
+		/* Wrapping data blocks store their data at the beginning. */
+		blk = to_block(data_ring, 0);
+
+		/*
+		 * Store the ID on the wrapped block for consistency.
+		 * The printk_ringbuffer does not actually use it.
+		 */
+		blk->id = id;
+
+		if (!wrapped) {
+			/*
+			 * Since the allocated space is now in the newly
+			 * created wrapping data block, copy the content
+			 * from the old data block.
+			 */
+			memcpy(&blk->data[0], &old_blk->data[0],
+			       (blk_lpos->next - blk_lpos->begin) - sizeof(blk->id));
+		}
+	}
+
+	blk_lpos->next = next_lpos;
+
+	return &blk->data[0];
+}
+
 /* Return the number of bytes used by a data block. */
 static unsigned int space_used(struct prb_data_ring *data_ring,
 			       struct prb_data_blk_lpos *blk_lpos)
@@ -1087,6 +1234,206 @@ static const char *get_data(struct prb_data_ring *data_ring,
 	return &db->data[0];
 }
 
+/*
+ * Attempt to transition the newest descriptor from committed back to reserved
+ * so that the record can be modified by a writer again. This is only possible
+ * if the descriptor is not yet finalized and the provided @caller_id matches.
+ */
+static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
+					 u32 caller_id, unsigned long *id_out)
+{
+	unsigned long prev_state_val;
+	enum desc_state d_state;
+	struct prb_desc desc;
+	struct prb_desc *d;
+	unsigned long id;
+
+	id = atomic_long_read(&desc_ring->head_id);
+
+	/*
+	 * To reduce unnecessarily reopening, first check if the descriptor
+	 * state and caller ID are correct.
+	 */
+	d_state = desc_read(desc_ring, id, &desc);
+	if (d_state != desc_committed || desc.info.caller_id != caller_id)
+		return NULL;
+
+	d = to_desc(desc_ring, id);
+
+	prev_state_val = DESC_SV(id, desc_committed);
+
+	/*
+	 * Guarantee the reserved state is stored before reading any
+	 * record data. A full memory barrier is needed because @state_var
+	 * modification is followed by reading. This pairs with _prb_commit:B.
+	 *
+	 * Memory barrier involvement:
+	 *
+	 * If desc_reopen_last:A reads from _prb_commit:B, then
+	 * prb_reserve_in_last:A reads from _prb_commit:A.
+	 *
+	 * Relies on:
+	 *
+	 * WMB from _prb_commit:A to _prb_commit:B
+	 *    matching
+	 * MB If desc_reopen_last:A to prb_reserve_in_last:A
+	 */
+	if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
+			DESC_SV(id, desc_reserved))) { /* LMM(desc_reopen_last:A) */
+		return NULL;
+	}
+
+	*id_out = id;
+	return d;
+}
+
+/**
+ * prb_reserve_in_last() - Re-reserve and extend the space in the ringbuffer
+ *                         used by the newest record.
+ *
+ * @e:         The entry structure to setup.
+ * @rb:        The ringbuffer to re-reserve and extend data in.
+ * @r:         The record structure to allocate buffers for.
+ * @caller_id: The caller ID of the caller (reserving writer).
+ *
+ * This is the public function available to writers to re-reserve and extend
+ * data.
+ *
+ * The writer specifies the text size to extend (not the new total size) by
+ * setting the @text_buf_size field of @r. Extending dictionaries is not
+ * supported, so @dict_buf_size of @r should be set to 0. To ensure proper
+ * initialization of @r, prb_rec_init_wr() should be used.
+ *
+ * This function will fail if @caller_id does not match the caller ID of the
+ * newest record. In that case the caller must reserve new data using
+ * prb_reserve().
+ *
+ * Context: Any context. Disables local interrupts on success.
+ * Return: true if text data could be extended, otherwise false.
+ *
+ * On success:
+ *
+ *   - @r->text_buf points to the beginning of the entire text buffer.
+ *
+ *   - @r->text_buf_size is set to the new total size of the buffer.
+ *
+ *   - @r->dict_buf and @r->dict_buf_size are cleared because extending
+ *     the dict buffer is not supported.
+ *
+ *   - @r->info is not touched so that @r->info->text_len could be used
+ *     to append the text.
+ *
+ *   - prb_record_text_space() can be used on @e to query the new
+ *     actually used space.
+ *
+ * Important: All @r->info fields will already be set with the current values
+ *            for the record. I.e. @r->info->text_len will be less than
+ *            @text_buf_size and @r->info->dict_len may be set, even though
+ *            @dict_buf_size is 0. Writers can use @r->info->text_len to know
+ *            where concatenation begins and writers should update
+ *            @r->info->text_len after concatenating.
+ */
+bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
+			 struct printk_record *r, u32 caller_id)
+{
+	unsigned int data_size;
+	struct prb_desc *d;
+	unsigned long id;
+
+	local_irq_save(e->irqflags);
+
+	/* Transition the newest descriptor back to the reserved state. */
+	d = desc_reopen_last(&rb->desc_ring, caller_id, &id);
+	if (!d) {
+		local_irq_restore(e->irqflags);
+		goto fail_reopen;
+	}
+
+	/* Now the writer has exclusive access: LMM(prb_reserve_in_last:A) */
+
+	/*
+	 * Set the @e fields here so that prb_commit() can be used if
+	 * anything fails from now on.
+	 */
+	e->rb = rb;
+	e->id = id;
+
+	/*
+	 * desc_reopen_last() checked the caller_id, but there was no
+	 * exclusive access at that point. The descriptor may have
+	 * changed since then.
+	 */
+	if (caller_id != d->info.caller_id)
+		goto fail;
+
+	if (BLK_DATALESS(&d->text_blk_lpos)) {
+		if (WARN_ON_ONCE(d->info.text_len != 0)) {
+			pr_warn_once("wrong text_len value (%hu, expecting 0)\n",
+				     d->info.text_len);
+			d->info.text_len = 0;
+		}
+
+		if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
+			goto fail;
+
+		r->text_buf = data_alloc(rb, &rb->text_data_ring, r->text_buf_size,
+					 &d->text_blk_lpos, id);
+	} else {
+		if (!get_data(&rb->text_data_ring, &d->text_blk_lpos, &data_size))
+			goto fail;
+
+		/*
+		 * Increase the buffer size to include the original size. If
+		 * the meta data (@text_len) is not sane, use the full data
+		 * block size.
+		 */
+		if (WARN_ON_ONCE(d->info.text_len > data_size)) {
+			pr_warn_once("wrong text_len value (%hu, expecting <=%u)\n",
+				     d->info.text_len, data_size);
+			d->info.text_len = data_size;
+		}
+		r->text_buf_size += d->info.text_len;
+
+		if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
+			goto fail;
+
+		r->text_buf = data_realloc(rb, &rb->text_data_ring, r->text_buf_size,
+					   &d->text_blk_lpos, id);
+	}
+	if (r->text_buf_size && !r->text_buf)
+		goto fail;
+
+	/* Although dictionary data may be in use, it cannot be extended. */
+	r->dict_buf = NULL;
+	r->dict_buf_size = 0;
+
+	r->info = &d->info;
+
+	e->text_space = space_used(&rb->text_data_ring, &d->text_blk_lpos);
+
+	return true;
+fail:
+	prb_commit(e);
+	/* prb_commit() re-enabled interrupts. */
+fail_reopen:
+	/* Make it clear to the caller that the re-reserve failed. */
+	memset(r, 0, sizeof(*r));
+	return false;
+}
+
+/*
+ * Attempt to finalize a specified descriptor. If this fails, the descriptor
+ * is either already final or it will finalize itself when the writer commits.
+ */
+static void desc_make_final(struct prb_desc_ring *desc_ring, unsigned long id)
+{
+	unsigned long prev_state_val = DESC_SV(id, desc_committed);
+	struct prb_desc *d = to_desc(desc_ring, id);
+
+	atomic_long_cmpxchg_relaxed(&d->state_var, prev_state_val,
+			DESC_SV(id, desc_finalized)); /* LMM(desc_make_final:A) */
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1180,6 +1527,15 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	else
 		d->info.seq = seq + DESCS_COUNT(desc_ring);
 
+	/*
+	 * New data is about to be reserved. Once that happens, previous
+	 * descriptors are no longer able to be extended. Finalize the
+	 * previous descriptor now so that it can be made available to
+	 * readers. (For seq==0 there is no previous descriptor.)
+	 */
+	if (d->info.seq > 0)
+		desc_make_final(desc_ring, DESC_ID(id - 1));
+
 	r->text_buf = data_alloc(rb, &rb->text_data_ring, r->text_buf_size,
 				 &d->text_blk_lpos, id);
 	/* If text data allocation fails, a data-less record is committed. */
@@ -1210,33 +1566,40 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	return false;
 }
 
-/**
- * prb_commit() - Commit (previously reserved) data to the ringbuffer.
- *
- * @e: The entry containing the reserved data information.
- *
- * This is the public function available to writers to commit data.
- *
- * Context: Any context. Enables local interrupts.
- */
-void prb_commit(struct prb_reserved_entry *e)
+/* Commit the data (possibly finalizing it) and restore interrupts. */
+static void _prb_commit(struct prb_reserved_entry *e, unsigned long state_val)
 {
 	struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
 	struct prb_desc *d = to_desc(desc_ring, e->id);
 	unsigned long prev_state_val = DESC_SV(e->id, desc_reserved);
 
-	/* Now the writer has finished all writing: LMM(prb_commit:A) */
+	/* Now the writer has finished all writing: LMM(_prb_commit:A) */
 
 	/*
 	 * Set the descriptor as committed. See "ABA Issues" about why
 	 * cmpxchg() instead of set() is used.
 	 *
-	 * Guarantee all record data is stored before the descriptor state
-	 * is stored as committed. A write memory barrier is sufficient for
-	 * this. This pairs with desc_read:B.
+	 * 1  Guarantee all record data is stored before the descriptor state
+	 *    is stored as committed. A write memory barrier is sufficient
+	 *    for this. This pairs with desc_read:B and desc_reopen_last:A.
+	 *
+	 * 2. Guarantee the descriptor state is stored as committed before
+	 *    re-checking the head ID in order to possibly finalize this
+	 *    descriptor. This pairs with desc_reserve:D.
+	 *
+	 *    Memory barrier involvement:
+	 *
+	 *    If prb_commit:A reads from desc_reserve:D, then
+	 *    desc_make_final:A reads from _prb_commit:B.
+	 *
+	 *    Relies on:
+	 *
+	 *    MB _prb_commit:B to prb_commit:A
+	 *       matching
+	 *    MB desc_reserve:D to desc_make_final:A
 	 */
 	if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
-			DESC_SV(e->id, desc_committed))) { /* LMM(prb_commit:B) */
+			DESC_SV(e->id, state_val))) { /* LMM(_prb_commit:B) */
 		WARN_ON_ONCE(1);
 	}
 
@@ -1244,6 +1607,59 @@ void prb_commit(struct prb_reserved_entry *e)
 	local_irq_restore(e->irqflags);
 }
 
+/**
+ * prb_commit() - Commit (previously reserved) data to the ringbuffer.
+ *
+ * @e: The entry containing the reserved data information.
+ *
+ * This is the public function available to writers to commit data.
+ *
+ * Note that the data is not yet available to readers until it is finalized.
+ * Finalizing happens automatically when space for the next record is
+ * reserved.
+ *
+ * See prb_final_commit() for a version of this function that finalizes
+ * immediately.
+ *
+ * Context: Any context. Enables local interrupts.
+ */
+void prb_commit(struct prb_reserved_entry *e)
+{
+	struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
+	unsigned long head_id;
+
+	_prb_commit(e, desc_committed);
+
+	/*
+	 * If this descriptor is no longer the head (i.e. a new record has
+	 * been allocated), extending the data for this record is no longer
+	 * allowed and therefore it must be finalized.
+	 */
+	head_id = atomic_long_read(&desc_ring->head_id); /* LMM(prb_commit:A) */
+	if (head_id != e->id)
+		desc_make_final(desc_ring, e->id);
+}
+
+/**
+ * prb_final_commit() - Commit and finalize (previously reserved) data to
+ *                      the ringbuffer.
+ *
+ * @e: The entry containing the reserved data information.
+ *
+ * This is the public function available to writers to commit+finalize data.
+ *
+ * By finalizing, the data is made immediately available to readers.
+ *
+ * This function should only be used if there are no intentions of extending
+ * this data using prb_reserve_in_last().
+ *
+ * Context: Any context. Enables local interrupts.
+ */
+void prb_final_commit(struct prb_reserved_entry *e)
+{
+	_prb_commit(e, desc_finalized);
+}
+
 /*
  * Count the number of lines in provided text. All text has at least 1 line
  * (even if @text_size is 0). Each '\n' processed is counted as an additional
@@ -1295,7 +1711,7 @@ static bool copy_data(struct prb_data_ring *data_ring,
 	 * because of the trailing alignment padding.
 	 */
 	if (WARN_ON_ONCE(data_size < (unsigned int)len)) {
-		pr_warn_once("wrong data size (%u, expecting %hu) for data: %.*s\n",
+		pr_warn_once("wrong data size (%u, expecting >=%hu) for data: %.*s\n",
 			     data_size, len, data_size, data);
 		return false;
 	}
@@ -1316,16 +1732,16 @@ static bool copy_data(struct prb_data_ring *data_ring,
 
 /*
  * This is an extended version of desc_read(). It gets a copy of a specified
- * descriptor. However, it also verifies that the record is committed and has
+ * descriptor. However, it also verifies that the record is finalized and has
  * the sequence number @seq. On success, 0 is returned.
  *
  * Error return values:
- * -EINVAL: A committed record with sequence number @seq does not exist.
- * -ENOENT: A committed record with sequence number @seq exists, but its data
+ * -EINVAL: A finalized record with sequence number @seq does not exist.
+ * -ENOENT: A finalized record with sequence number @seq exists, but its data
  *          is not available. This is a valid record, so readers should
  *          continue with the next record.
  */
-static int desc_read_committed_seq(struct prb_desc_ring *desc_ring,
+static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
 				   unsigned long id, u64 seq,
 				   struct prb_desc *desc_out)
 {
@@ -1336,11 +1752,12 @@ static int desc_read_committed_seq(struct prb_desc_ring *desc_ring,
 
 	/*
 	 * An unexpected @id (desc_miss) or @seq mismatch means the record
-	 * does not exist. A descriptor in the reserved state means the
-	 * record does not yet exist for the reader.
+	 * does not exist. A descriptor in the reserved or committed state
+	 * means the record does not yet exist for the reader.
 	 */
 	if (d_state == desc_miss ||
 	    d_state == desc_reserved ||
+	    d_state == desc_committed ||
 	    desc_out->info.seq != seq) {
 		return -EINVAL;
 	}
@@ -1362,7 +1779,7 @@ static int desc_read_committed_seq(struct prb_desc_ring *desc_ring,
  * Copy the ringbuffer data from the record with @seq to the provided
  * @r buffer. On success, 0 is returned.
  *
- * See desc_read_committed_seq() for error return values.
+ * See desc_read_finalized_seq() for error return values.
  */
 static int prb_read(struct printk_ringbuffer *rb, u64 seq,
 		    struct printk_record *r, unsigned int *line_count)
@@ -1378,7 +1795,7 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
 	id = DESC_ID(atomic_long_read(state_var));
 
 	/* Get a local copy of the correct descriptor (if available). */
-	err = desc_read_committed_seq(desc_ring, id, seq, &desc);
+	err = desc_read_finalized_seq(desc_ring, id, seq, &desc);
 
 	/*
 	 * If @r is NULL, the caller is only interested in the availability
@@ -1408,8 +1825,8 @@ static int prb_read(struct printk_ringbuffer *rb, u64 seq,
 			r->info->dict_len = 0;
 	}
 
-	/* Ensure the record is still committed and has the same @seq. */
-	return desc_read_committed_seq(desc_ring, id, seq, &desc);
+	/* Ensure the record is still finalized and has the same @seq. */
+	return desc_read_finalized_seq(desc_ring, id, seq, &desc);
 }
 
 /* Get the sequence number of the tail descriptor. */
@@ -1427,9 +1844,9 @@ static u64 prb_first_seq(struct printk_ringbuffer *rb)
 
 		/*
 		 * This loop will not be infinite because the tail is
-		 * _always_ in the committed or reusable state.
+		 * _always_ in the finalized or reusable state.
 		 */
-		if (d_state == desc_committed || d_state == desc_reusable)
+		if (d_state == desc_finalized || d_state == desc_reusable)
 			break;
 
 		/*
@@ -1456,8 +1873,8 @@ static u64 prb_first_seq(struct printk_ringbuffer *rb)
 }
 
 /*
- * Non-blocking read of a record. Updates @seq to the last committed record
- * (which may have no data).
+ * Non-blocking read of a record. Updates @seq to the last finalized record
+ * (which may have no data available).
  *
  * See the description of prb_read_valid() and prb_read_valid_info()
  * for details.
@@ -1483,7 +1900,7 @@ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
 			(*seq)++;
 
 		} else {
-			/* Non-existent/non-committed record. Must stop. */
+			/* Non-existent/non-finalized record. Must stop. */
 			return false;
 		}
 	}
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index a9d85a6727b1..853ea62dc5f2 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -116,7 +116,8 @@ struct prb_reserved_entry {
 enum desc_state {
 	desc_miss	=  -1,	/* ID mismatch (pseudo state) */
 	desc_reserved	= 0x0,	/* reserved, in use by writer */
-	desc_committed	= 0x1,	/* committed by writer */
+	desc_committed	= 0x1,	/* committed by writer, could get reopened */
+	desc_finalized	= 0x2,	/* committed, no further modification allowed */
 	desc_reusable	= 0x3,	/* free, not yet used by any writer */
 };
 
@@ -327,7 +328,10 @@ static inline void prb_rec_init_wr(struct printk_record *r,
 
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 		 struct printk_record *r);
+bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
+			 struct printk_record *r, u32 caller_id);
 void prb_commit(struct prb_reserved_entry *e);
+void prb_final_commit(struct prb_reserved_entry *e);
 
 void prb_init(struct printk_ringbuffer *rb,
 	      char *text_buf, unsigned int text_buf_size,
diff --git a/scripts/gdb/linux/dmesg.py b/scripts/gdb/linux/dmesg.py
index dd8c0b95063a..bce14de5f610 100644
--- a/scripts/gdb/linux/dmesg.py
+++ b/scripts/gdb/linux/dmesg.py
@@ -79,6 +79,7 @@ class LxDmesg(gdb.Command):
 
         # definitions from kernel/printk/printk_ringbuffer.h
         desc_committed = 1
+        desc_finalized = 2
         desc_sv_bits = utils.get_long_type().sizeof * 8
         desc_flags_shift = desc_sv_bits - 2
         desc_flags_mask = 3 << desc_flags_shift
@@ -98,7 +99,7 @@ class LxDmesg(gdb.Command):
             # skip non-committed record
             state = 3 & (utils.read_u64(descs, desc_off + sv_off +
                                         counter_off) >> desc_flags_shift)
-            if state != desc_committed:
+            if state != desc_committed and state != desc_finalized:
                 if did == head_id:
                     break
                 did = (did + 1) & desc_id_mask
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH printk v5 6/6] printk: reimplement log_cont using record extension
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
                   ` (4 preceding siblings ...)
  2020-09-14 12:33 ` [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support John Ogness
@ 2020-09-14 12:33 ` John Ogness
       [not found]   ` <CGME20200925190829eucas1p2f35866317389cd5e842e99b465632902@eucas1p2.samsung.com>
  2020-09-15  9:47 ` [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling Petr Mladek
  6 siblings, 1 reply; 12+ messages in thread
From: John Ogness @ 2020-09-14 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 98 +++++++++---------------------------------
 1 file changed, 20 insertions(+), 78 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 964b5701688f..9a2e23191576 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -535,7 +535,10 @@ static int log_store(u32 caller_id, int facility, int level,
 	r.info->caller_id = caller_id;
 
 	/* insert message */
-	prb_commit(&e);
+	if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+		prb_commit(&e);
+	else
+		prb_final_commit(&e);
 
 	return (text_len + trunc_msg_len);
 }
@@ -1084,7 +1087,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
 	dest_r.info->ts_nsec = r->info->ts_nsec;
 	dest_r.info->caller_id = r->info->caller_id;
 
-	prb_commit(&e);
+	prb_final_commit(&e);
 
 	return prb_record_text_space(&e);
 }
@@ -1884,87 +1887,26 @@ static inline u32 printk_caller_id(void)
 		0x80000000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-	char buf[LOG_LINE_MAX];
-	size_t len;			/* length == 0 means unused buffer */
-	u32 caller_id;			/* printk_caller_id() of first print */
-	u64 ts_nsec;			/* time of first print */
-	u8 level;			/* log level of first message */
-	u8 facility;			/* log facility of first message */
-	enum log_flags flags;		/* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-	if (cont.len == 0)
-		return;
-
-	log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
-		  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-	cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-		     enum log_flags flags, const char *text, size_t len)
-{
-	/* If the line gets too long, split it up in separate records. */
-	if (cont.len + len > sizeof(cont.buf)) {
-		cont_flush();
-		return false;
-	}
-
-	if (!cont.len) {
-		cont.facility = facility;
-		cont.level = level;
-		cont.caller_id = caller_id;
-		cont.ts_nsec = local_clock();
-		cont.flags = flags;
-	}
-
-	memcpy(cont.buf + cont.len, text, len);
-	cont.len += len;
-
-	// The original flags come from the first line,
-	// but later continuations can add a newline.
-	if (flags & LOG_NEWLINE) {
-		cont.flags |= LOG_NEWLINE;
-		cont_flush();
-	}
-
-	return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
 {
 	const u32 caller_id = printk_caller_id();
 
-	/*
-	 * If an earlier line was buffered, and we're a continuation
-	 * write from the same context, try to add it to the buffer.
-	 */
-	if (cont.len) {
-		if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-			if (cont_add(caller_id, facility, level, lflags, text, text_len))
-				return text_len;
-		}
-		/* Otherwise, make sure it's flushed */
-		cont_flush();
-	}
-
-	/* Skip empty continuation lines that couldn't be added - they just flush */
-	if (!text_len && (lflags & LOG_CONT))
-		return 0;
-
-	/* If it doesn't end in a newline, try to buffer the current line */
-	if (!(lflags & LOG_NEWLINE)) {
-		if (cont_add(caller_id, facility, level, lflags, text, text_len))
+	if (lflags & LOG_CONT) {
+		struct prb_reserved_entry e;
+		struct printk_record r;
+
+		prb_rec_init_wr(&r, text_len, 0);
+		if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+			memcpy(&r.text_buf[r.info->text_len], text, text_len);
+			r.info->text_len += text_len;
+			if (lflags & LOG_NEWLINE) {
+				r.info->flags |= LOG_NEWLINE;
+				prb_final_commit(&e);
+			} else {
+				prb_commit(&e);
+			}
 			return text_len;
+		}
 	}
 
 	/* Store it in the record log */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support
  2020-09-14 12:33 ` [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support John Ogness
@ 2020-09-15  9:30   ` Petr Mladek
  0 siblings, 0 replies; 12+ messages in thread
From: Petr Mladek @ 2020-09-15  9:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

On Mon 2020-09-14 14:39:53, John Ogness wrote:
> Add support for extending the newest data block. For this, introduce
> a new finalization state (desc_finalized) denoting a committed
> descriptor that cannot be extended.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

There seems to be possible a small clean up, see below. But I would do
it in a followup patch to avoid yet another respin.


> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 911fbe150e9a..4e526c79f89c 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> +/*
> + * Try to resize an existing data block associated with the descriptor
> + * specified by @id. If the resized data block should become wrapped, it
> + * copies the old data to the new data block. If @size yields a data block
> + * with the same or less size, the data block is left as is.
> + *
> + * Fail if this is not the last allocated data block or if there is not
> + * enough space or it is not possible make enough space.
> + *
> + * Return a pointer to the beginning of the entire data buffer or NULL on
> + * failure.
> + */
> +static char *data_realloc(struct printk_ringbuffer *rb,
> +			  struct prb_data_ring *data_ring, unsigned int size,
> +			  struct prb_data_blk_lpos *blk_lpos, unsigned long id)
> +{
> +	struct prb_data_block *blk;
> +	unsigned long head_lpos;
> +	unsigned long next_lpos;
> +	bool wrapped;
> +
> +	/* Reallocation only works if @blk_lpos is the newest data block. */
> +	head_lpos = atomic_long_read(&data_ring->head_lpos);
> +	if (head_lpos != blk_lpos->next)
> +		return NULL;
> +
> +	/* Keep track if @blk_lpos was a wrapping data block. */
> +	wrapped = (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, blk_lpos->next));
> +
> +	size = to_blk_size(size);
> +
> +	next_lpos = get_next_lpos(data_ring, blk_lpos->begin, size);
> +
> +	/* If the data block does not increase, there is nothing to do. */
> +	if (head_lpos - next_lpos < DATA_SIZE(data_ring)) {
> +		blk = to_block(data_ring, blk_lpos->begin);
> +		return &blk->data[0];
> +	}
> +
> +	if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring)))
> +		return NULL;
> +
> +	/* The memory barrier involvement is the same as data_alloc:A. */
> +	if (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &head_lpos,
> +				     next_lpos)) { /* LMM(data_realloc:A) */
> +		return NULL;
> +	}
> +
> +	blk = to_block(data_ring, blk_lpos->begin);
> +
> +	if (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, next_lpos)) {
> +		struct prb_data_block *old_blk = blk;
> +
> +		/* Wrapping data blocks store their data at the beginning. */
> +		blk = to_block(data_ring, 0);
> +
> +		/*
> +		 * Store the ID on the wrapped block for consistency.
> +		 * The printk_ringbuffer does not actually use it.
> +		 */
> +		blk->id = id;

Small cleanup: The "id" should already be there when the block has
already been wrapped before. By other words, even the above
need to be done only when (!wrapped).

> +
> +		if (!wrapped) {
> +			/*
> +			 * Since the allocated space is now in the newly
> +			 * created wrapping data block, copy the content
> +			 * from the old data block.
> +			 */
> +			memcpy(&blk->data[0], &old_blk->data[0],
> +			       (blk_lpos->next - blk_lpos->begin) - sizeof(blk->id));
> +		}
> +	}
> +
> +	blk_lpos->next = next_lpos;
> +
> +	return &blk->data[0];
> +}
> +
>  /* Return the number of bytes used by a data block. */
>  static unsigned int space_used(struct prb_data_ring *data_ring,
>  			       struct prb_data_blk_lpos *blk_lpos)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling
  2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
                   ` (5 preceding siblings ...)
  2020-09-14 12:33 ` [PATCH printk v5 6/6] printk: reimplement log_cont using record extension John Ogness
@ 2020-09-15  9:47 ` Petr Mladek
  6 siblings, 0 replies; 12+ messages in thread
From: Petr Mladek @ 2020-09-15  9:47 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

On Mon 2020-09-14 14:39:48, John Ogness wrote:
> Hello,
> 
> Here is v5 for the second series to rework the printk subsystem.
> (The v4 is here [0].) This series implements a new ringbuffer
> feature that allows the last record to be extended. Petr Mladek
> provided the initial proof of concept [1] for this.
> 
> Using the record extension feature, LOG_CONT is re-implemented
> in a way that exactly preserves its behavior, but avoids the
> need for an extra buffer. In particular, it avoids the need for
> any synchronization that such a buffer requires.

The series is ready for linux-next from my POV. I am going to push
it there tomorrow if nobody complains.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH printk v5 6/6] printk: reimplement log_cont using record extension
       [not found]   ` <CGME20200925190829eucas1p2f35866317389cd5e842e99b465632902@eucas1p2.samsung.com>
@ 2020-09-25 19:08     ` Marek Szyprowski
       [not found]       ` <CGME20200925193852eucas1p10c459f3f90192c1079f8a8f04b872015@eucas1p1.samsung.com>
  2020-09-25 22:59       ` John Ogness
  0 siblings, 2 replies; 12+ messages in thread
From: Marek Szyprowski @ 2020-09-25 19:08 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Hi John,

On 14.09.2020 14:33, John Ogness wrote:
> Use the record extending feature of the ringbuffer to implement
> continuous messages. This preserves the existing continuous message
> behavior.
>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Reviewed-by: Petr Mladek <pmladek@suse.com>

This patch landed recently in linux-next as commit f5f022e53b87 
("printk: reimplement log_cont using record extension"). I've noticed 
that it causes a regression on my test system (ARM 32bit Samsung Exynos 
4412-based Trats2 board). The messages are printed correctly on the 
serial console during boot, but then when I run 'dmesg' command, the log 
is truncated.

Here is are the last lines of the dmesg log after this patch:

[    6.649018] Waiting 2 sec before mounting root device...
[    6.766423] dwc2 12480000.hsotg: new device is high-speed
[    6.845290] dwc2 12480000.hsotg: new device is high-speed
[    6.914217] dwc2 12480000.hsotg: new address 51
[    8.710351] RAMDISK: squashfs filesystem found at block 0

The corresponding dmesg lines before applying this patch:

[    8.864320] RAMDISK: squashfs filesystem found at block 0
[    8.868410] RAMDISK: Loading 37692KiB [1 disk] into ram disk... /
[    9.071670] /
[    9.262498] /
[    9.540711] /
[    9.818031] done.
[   10.660074] VFS: Mounted root (squashfs filesystem) readonly on 
device 1:0.
[   10.739525] EXT4-fs (mmcblk0p1): INFO: recovery required on readonly 
filesystem
[   10.745347] EXT4-fs (mmcblk0p1): write access will be enabled during 
recovery
[   10.861129] EXT4-fs (mmcblk0p1): recovery complete
[   10.878150] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data 
mode. Opts: (null)
[   10.881811] VFS: Mounted root (ext4 filesystem) readonly on device 
179:49.
[   10.889858] Trying to move old root to /initrd ...
[   10.895192] okay
[   10.914411] devtmpfs: mounted
[   10.925087] Freeing unused kernel memory: 1024K
[   10.933222] Run /sbin/init as init process
[   10.941723]   with arguments:
[   10.949890]     /sbin/init
[   10.949900]   with environment:
[   10.949909]     HOME=/
[   10.949917]     TERM=linux
[   12.415991] random: systemd-udevd: uninitialized urandom read (16 
bytes read)
[   12.425361] random: systemd-udevd: uninitialized urandom read (16 
bytes read)
[   12.438578] random: systemd-udevd: uninitialized urandom read (16 
bytes read)

...

I can provide a complete logs if that helps.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH printk v5 6/6] printk: reimplement log_cont using record extension
       [not found]       ` <CGME20200925193852eucas1p10c459f3f90192c1079f8a8f04b872015@eucas1p1.samsung.com>
@ 2020-09-25 19:38         ` Marek Szyprowski
  0 siblings, 0 replies; 12+ messages in thread
From: Marek Szyprowski @ 2020-09-25 19:38 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

Hi again,

On 25.09.2020 21:08, Marek Szyprowski wrote:
> Hi John,
>
> On 14.09.2020 14:33, John Ogness wrote:
>> Use the record extending feature of the ringbuffer to implement
>> continuous messages. This preserves the existing continuous message
>> behavior.
>>
>> Signed-off-by: John Ogness <john.ogness@linutronix.de>
>> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> This patch landed recently in linux-next as commit f5f022e53b87 
> ("printk: reimplement log_cont using record extension"). I've noticed 
> that it causes a regression on my test system (ARM 32bit Samsung 
> Exynos 4412-based Trats2 board). The messages are printed correctly on 
> the serial console during boot, but then when I run 'dmesg' command, 
> the log is truncated.
>
> Here is are the last lines of the dmesg log after this patch:
>
> [    6.649018] Waiting 2 sec before mounting root device...
> [    6.766423] dwc2 12480000.hsotg: new device is high-speed
> [    6.845290] dwc2 12480000.hsotg: new device is high-speed
> [    6.914217] dwc2 12480000.hsotg: new address 51
> [    8.710351] RAMDISK: squashfs filesystem found at block 0
>
> The corresponding dmesg lines before applying this patch:
>
> [    8.864320] RAMDISK: squashfs filesystem found at block 0
> [    8.868410] RAMDISK: Loading 37692KiB [1 disk] into ram disk... /
> [    9.071670] /
> [    9.262498] /
> [    9.540711] /
> [    9.818031] done.
> [   10.660074] VFS: Mounted root (squashfs filesystem) readonly on 
> device 1:0.
> [   10.739525] EXT4-fs (mmcblk0p1): INFO: recovery required on 
> readonly filesystem
> [   10.745347] EXT4-fs (mmcblk0p1): write access will be enabled 
> during recovery
> [   10.861129] EXT4-fs (mmcblk0p1): recovery complete
> [   10.878150] EXT4-fs (mmcblk0p1): mounted filesystem with ordered 
> data mode. Opts: (null)
> [   10.881811] VFS: Mounted root (ext4 filesystem) readonly on device 
> 179:49.
> [   10.889858] Trying to move old root to /initrd ...
> [   10.895192] okay
> [   10.914411] devtmpfs: mounted
> [   10.925087] Freeing unused kernel memory: 1024K
> [   10.933222] Run /sbin/init as init process
> [   10.941723]   with arguments:
> [   10.949890]     /sbin/init
> [   10.949900]   with environment:
> [   10.949909]     HOME=/
> [   10.949917]     TERM=linux
> [   12.415991] random: systemd-udevd: uninitialized urandom read (16 
> bytes read)
> [   12.425361] random: systemd-udevd: uninitialized urandom read (16 
> bytes read)
> [   12.438578] random: systemd-udevd: uninitialized urandom read (16 
> bytes read)
>
> ...
>
> I can provide a complete logs if that helps.

One more information - this issue happens only if the kernel is compiled 
from exynos_defconfig. If use multi_v7_defconfig, the dmesg works fine 
on that board. exynos_defconfig has quite a lots of debugging options 
enabled...

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH printk v5 6/6] printk: reimplement log_cont using record extension
  2020-09-25 19:08     ` Marek Szyprowski
       [not found]       ` <CGME20200925193852eucas1p10c459f3f90192c1079f8a8f04b872015@eucas1p1.samsung.com>
@ 2020-09-25 22:59       ` John Ogness
  1 sibling, 0 replies; 12+ messages in thread
From: John Ogness @ 2020-09-25 22:59 UTC (permalink / raw)
  To: Marek Szyprowski, Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner,
	Peter Zijlstra, Andrea Parri, Paul McKenney, kexec, linux-kernel

On 2020-09-25, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> This patch landed recently in linux-next as commit f5f022e53b87 
> ("printk: reimplement log_cont using record extension"). I've noticed 
> that it causes a regression on my test system (ARM 32bit Samsung Exynos 
> 4412-based Trats2 board). The messages are printed correctly on the 
> serial console during boot, but then when I run 'dmesg' command, the log 
> is truncated.
>
> Here is are the last lines of the dmesg log after this patch:
>
> [    6.649018] Waiting 2 sec before mounting root device...
> [    6.766423] dwc2 12480000.hsotg: new device is high-speed
> [    6.845290] dwc2 12480000.hsotg: new device is high-speed
> [    6.914217] dwc2 12480000.hsotg: new address 51
> [    8.710351] RAMDISK: squashfs filesystem found at block 0
>
> The corresponding dmesg lines before applying this patch:
>
> [    8.864320] RAMDISK: squashfs filesystem found at block 0
> [    8.868410] RAMDISK: Loading 37692KiB [1 disk] into ram disk... /
> [    9.071670] /
> [    9.262498] /
> [    9.540711] /
> [    9.818031] done.

Ah. One of the more creative printk users...
init/do_mounts_rd.c:rd_load_image(). This is a set of LOG_CONT messages
that try to display a rotating line, complete with '\b' control
characters. The code is totally broken, but that is no excuse for printk
to break. It should be easy to reproduce on any architecture. I will
investigate it further. Thanks for reporting.

John Ogness

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-25 22:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-14 12:33 [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling John Ogness
2020-09-14 12:33 ` [PATCH printk v5 1/6] printk: ringbuffer: relocate get_data() John Ogness
2020-09-14 12:33 ` [PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
2020-09-14 12:33 ` [PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields John Ogness
2020-09-14 12:33 ` [PATCH printk v5 4/6] printk: ringbuffer: change representation of states John Ogness
2020-09-14 12:33 ` [PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support John Ogness
2020-09-15  9:30   ` Petr Mladek
2020-09-14 12:33 ` [PATCH printk v5 6/6] printk: reimplement log_cont using record extension John Ogness
     [not found]   ` <CGME20200925190829eucas1p2f35866317389cd5e842e99b465632902@eucas1p2.samsung.com>
2020-09-25 19:08     ` Marek Szyprowski
     [not found]       ` <CGME20200925193852eucas1p10c459f3f90192c1079f8a8f04b872015@eucas1p1.samsung.com>
2020-09-25 19:38         ` Marek Szyprowski
2020-09-25 22:59       ` John Ogness
2020-09-15  9:47 ` [PATCH printk v5 0/6] printk: reimplement LOG_CONT handling Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).