linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Petr Mladek <pmladek@suse.com>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <andrea.parri@amarulasolutions.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Brendan Higgins <brendanhiggins@google.com>,
	kexec@lists.infradead.org
Subject: Re: [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer)
Date: Thu, 05 Dec 2019 13:01:28 +0100	[thread overview]
Message-ID: <874kyf56cn.fsf@linutronix.de> (raw)
In-Reply-To: 20191203091843.678461e4@gandalf.local.home

On 2019-12-03, Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>> +/* Reserve a new descriptor, invalidating the oldest if necessary. */
>>>>> +static bool desc_reserve(struct printk_ringbuffer *rb, u32 *id_out)
>>>>> +{
>>>>> +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
>>>>> +	struct prb_desc *desc;
>>>>> +	u32 id_prev_wrap;
>>>>> +	u32 head_id;
>>>>> +	u32 id;
>>>>> +
>>>>> +	head_id = atomic_read(&desc_ring->head_id);
>>>>> +
>>>>> +	do {
>>>>> +		desc = to_desc(desc_ring, head_id);
>>>>> +
>>>>> +		id = DESC_ID(head_id + 1);
>>>>> +		id_prev_wrap = DESC_ID_PREV_WRAP(desc_ring, id);
>>>>> +
>>>>> +		if (id_prev_wrap == atomic_read(&desc_ring->tail_id)) {
>>>>> +			if (!desc_push_tail(rb, id_prev_wrap))
>>>>> +				return false;
>>>>> +		}
>>>>> +	} while (!atomic_try_cmpxchg(&desc_ring->head_id, &head_id, id));
>>>>
>>>> Hmm, in theory, ABA problem might cause that we successfully
>>>> move desc_ring->head_id when tail has not been pushed yet.
>>>>
>>>> As a result we would never call desc_push_tail() until
>>>> it overflows again.
>>>>
>>>> I am not sure if we need to take care of it. The code is called
>>>> with interrupts disabled. IMHO, only NMI could cause ABA problem in
>>>> reality. But the game (debugging) is lost anyway when NMI ovewrites
>>>> the buffer several times.
>>>>
>>>> Also it should not be a complete failure. We still could bail out
>>>> when the previous state was not reusable. We are on the safe side
>>>> when it was reusable.
>>>>
>>>> Also we could probably detect when desc_ring->tail_id is not
>>>> updated any longer and do something about it.
>>>>
>>>> BTW: If I am counting correctly. The ABA problem would happen when
>>>> exactly 2^30 (1G) messages is written in the mean time.
>>> 
>>> All the ringbuffer code assumes that the use of index numbers
>>> handles the ABA problem (i.e. there must not be 1 billion printk's
>>> within an NMI). If we want to support 1 billion+ printk's within an
>>> NMI, then perhaps the index number should be increased. For 64-bit
>>> systems it would be no problem to go to 62 bits. For 32-bit systems,
>>> I don't know how well the 64-bit atomic operations are supported.
>> 
>> ftrace dumps from NMI (DUMP_ALL type ftrace_dump_on_oops on a $BIG
>> machine)? 1G seems large enough, but who knows.
>
> ftrace dump from NMI is the most likely case to hit this, but when
> that happens, you are in debugging mode, and the system usually
> becomes unreliable at this moment. I agree with Petr, that we should
> not complicate the code more to handle this theoretical condition.

I will change the data block ID size to "unsigned long". Then it is
really only an issue for 32-bit systems.

AFAICT, the only real issue is that the head can be pushed to the
descriptor index that the tail is pointing to. From there, the head can
be advanced beyond and the tail may never move again. So writers just
need to make sure the tail gets pushed beyond the newly reserved head
before making any changes to that descriptor.

Aside from moving to "unsigned long" ID's, I believe this can be handled
with the following 4 changes when reserving a new descriptor:

- also push the tail forward if it falls behind

- after pushing the head, re-check if the tail is still ok

- verify the state of the reserved descriptor before changing it

- use cmpxchg() to change it

Below is the patch snippet I use for this. (Note that the patch is
against a version where ID's have already been changed to unsigned
longs.)

John Ogness


@@ -468,19 +470,53 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 		id = DESC_ID(head_id + 1);
 		id_prev_wrap = DESC_ID_PREV_WRAP(desc_ring, id);
 
-		if (id_prev_wrap == atomic_long_read(&desc_ring->tail_id)) {
-			if (!desc_push_tail(rb, id_prev_wrap))
+		/*
+		 * Make space for the new descriptor by pushing the tail
+		 * beyond the descriptor to be reserved. Normally this only
+		 * requires pushing the tail once. But due to possible ABA
+		 * problems with the ID, the tail could be too far behind.
+		 * Use a loop to push the tail where it needs to be.
+		 */
+		tail_id = atomic_long_read(&desc_ring->tail_id);
+		while (DESC_ID(id_prev_wrap - tail_id) <
+		       DESCS_COUNT(desc_ring)) {
+
+			if (!desc_push_tail(rb, tail_id))
 				return false;
+			tail_id = DESC_ID(tail_id + 1);
 		}
 	} while (!atomic_long_try_cmpxchg(&desc_ring->head_id, &head_id, id));
 
+	/*
+	 * Before moving the head, it was ensured that the tail was pushed to
+	 * where it needs to be. Due to possible ABA problems with the ID, the
+	 * reserved descriptor may not be what was expected. Re-check the tail
+	 * to see if it is still where it needs to be.
+	 */
+	tail_id = atomic_long_read(&desc_ring->tail_id);
+	if (DESC_ID(id_prev_wrap - tail_id) < DESCS_COUNT(desc_ring))
+		return false;
+
 	desc = to_desc(desc_ring, id);
 
+	/* If the descriptor has been recycled, verify the old state val. */
+	prev_state_val = atomic_long_read(&desc->state_var);
+	if (prev_state_val && prev_state_val != (id_prev_wrap |
+						 DESC_COMMITTED_MASK |
+						 DESC_REUSE_MASK)) {
+		/* Unexpected value. Hit ABA issue for ID? */
+		return false;
+	}
+
+	/*
+	 * Set the descriptor's ID and also set its state to reserved.
+	 * The new ID/state is stored before making any other changes.
+	 */
+	if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
+				id | 0)) { /* LMM_TAG(desc_reserve:A) */
+		/* Unexpected value. Hit ABA issue for ID? */
+		return false;
+	}
-	/* Set the descriptor's ID and also set its state to reserved. */
-	atomic_long_set(&desc->state_var, id | 0);
-
-	/* Store new ID/state before making any other changes. */
-	smp_wmb(); /* LMM_TAG(desc_reserve:A) */
 
 	*id_out = id;
 	return true;

  reply	other threads:[~2019-12-05 12:01 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-28  1:52 [RFC PATCH v5 0/3] printk: new ringbuffer implementation John Ogness
2019-11-28  1:52 ` [RFC PATCH v5 1/3] printk-rb: new printk ringbuffer implementation (writer) John Ogness
2019-12-02 15:48   ` Petr Mladek
2019-12-02 15:59     ` Petr Mladek
2019-12-02 16:37       ` John Ogness
2019-12-03  1:17         ` Sergey Senozhatsky
2019-12-03 14:18           ` Steven Rostedt
2019-12-05 12:01             ` John Ogness [this message]
2019-12-03  8:54         ` Petr Mladek
2019-12-03 14:13     ` John Ogness
2019-12-03 14:36       ` Petr Mladek
2019-12-09  9:19     ` Sergey Senozhatsky
2019-12-09  7:42   ` Sergey Senozhatsky
2019-12-09  9:00     ` John Ogness
2019-12-09  9:27   ` Sergey Senozhatsky
2019-12-09  9:34     ` John Ogness
2019-12-21 14:22   ` Andrea Parri
2019-12-23 16:01     ` John Ogness
2020-01-03 10:24       ` Petr Mladek
2020-01-04 14:33         ` Andrea Parri
2019-11-28  1:52 ` [RFC PATCH v5 2/3] printk-rb: new printk ringbuffer implementation (reader) John Ogness
2019-12-03 12:06   ` Petr Mladek
2019-12-03 13:46     ` John Ogness
2019-12-04 12:54       ` Petr Mladek
2019-12-04 13:28         ` John Ogness
2019-12-09  8:43   ` Sergey Senozhatsky
2019-12-09  9:03     ` Sergey Senozhatsky
2019-12-09  9:09     ` John Ogness
2019-11-28  1:52 ` [RFC PATCH v5 3/3] printk-rb: add test module John Ogness
2019-12-09  8:44   ` Sergey Senozhatsky
2019-12-05 13:46 ` [RFC PATCH v5 0/3] printk: new ringbuffer implementation Prarit Bhargava
2019-12-05 14:05   ` John Ogness
2019-12-06 14:16     ` Prarit Bhargava
2020-01-27 12:20 ` Eugeniu Rosca

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kyf56cn.fsf@linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=brendanhiggins@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).