linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: linux-kernel@vger.kernel.org,
	Andrea Parri <andrea.parri@amarulasolutions.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Brendan Higgins <brendanhiggins@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: numlist_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation
Date: Tue, 27 Aug 2019 00:36:18 +0200	[thread overview]
Message-ID: <878srfo8pp.fsf@linutronix.de> (raw)
In-Reply-To: <20190823092109.doduc36avylm5cds@pathway.suse.cz> (Petr Mladek's message of "Fri, 23 Aug 2019 11:21:09 +0200")

Hi Petr,

AndreaP responded with some explanation (and great links!) on the topic
of READ_ONCE. But I feel like your comments about the WRITE_ONCE were
not addressed. I address that (and your other comments) below...

On 2019-08-23, Petr Mladek <pmladek@suse.com> wrote:
>> --- /dev/null
>> +++ b/kernel/printk/numlist.c
>> +/**
>> + * numlist_push() - Add a node to the list and assign it a sequence number.
>> + *
>> + * @nl: The numbered list to push to.
>> + *
>> + * @n:  A node to push to the numbered list.
>> + *      The node must not already be part of a list.
>> + *
>> + * @id: The ID of the node.
>> + *
>> + * A node is added in two steps: The first step is to make this node the
>> + * head, which causes a following push to add to this node. The second step is
>> + * to update @next_id of the former head node to point to this one, which
>> + * makes this node visible to any task that sees the former head node.
>> + */
>> +void numlist_push(struct numlist *nl, struct nl_node *n, unsigned long id)
>> +{
>> +	unsigned long head_id;
>> +	unsigned long seq;
>> +	unsigned long r;
>> +
>> +	/*
>> +	 * bA:
>> +	 *
>> +	 * Setup the node to be a list terminator: next_id == id.
>> +	 */
>> +	WRITE_ONCE(n->next_id, id);
>
> Do we need WRITE_ONCE() here?
> Both "n" and "id" are given as parameters and do not change.
> The assigment must be done before "id" is set as nl->head_id.
> The ordering is enforced by cmpxchg_release().

The cmpxchg_release() ensures that if the node is visible to writers,
then the finalized assignment is also visible. And the store_release()
ensures that if the previous node is visible to any readers, then the
finalized assignment is also visible. In the reader case, if any readers
happen to be sitting on the node, numlist_read() will fail because the
ID was updated when the node was popped. So for all these cases any
compiler optimizations leading to that assigment (tearing, speculation,
etc) should be irrelevant. Therefore, IMO the WRITE_ONCE() is not
needed.

Since all of this is lockless, I used WRITE_ONCE() whenever touching
shared variables. I must admit the decision may be motivated primarily
by fear of compiler optimizations. Although "documenting lockless shared
variable access" did play a role as well.

I will replace the WRITE_ONCE with an assignment.

>> +
>> +	/* bB: #1 */
>> +	head_id = atomic_long_read(&nl->head_id);
>> +
>> +	for (;;) {
>> +		/* bC: */
>> +		while (!numlist_read(nl, head_id, &seq, NULL)) {
>> +			/*
>> +			 * @head_id is invalid. Try again with an
>> +			 * updated value.
>> +			 */
>> +
>> +			cpu_relax();
>
> I have got very confused by this. cpu_relax() suggests that this
> cycle is busy waiting until a particular node becomes valid.
> My first though was that it must cause deadlock in NMI when
> the interrupted code is supposed to make the node valid.
>
> But it is the other way. The head is always valid when it is
> added to the list. It might become invalid when another CPU
> moves the head and the old one gets reused.
>
> Anyway, I do not see any reason for cpu_relax() here.

You are correct. The cpu_relax() should not be there. But there is still
an issue that this could spin hard if the head was recycled and this CPU
does not yet see the new head value.

To handle that, and in preparation for my next version, I'm now using a
read_acquire() to load the ID in the node() callback (matching the
set_release() in assign_desc()). This ensures that if numlist_read()
fails, the new head will be visible.

> Also the entire cycle would deserve a comment to avoid this mistake.
> For example:
>
> 		/*
> 		 * bC: Read seq from current head. Repeat with new
> 		 * head when it has changed and the old one got reused.
> 		 */

Agreed.

>> +
>> +			/* bB: #2 */
>> +			head_id = atomic_long_read(&nl->head_id);
>> +		}
>> +
>> +		/*
>> +		 * bD:
>> +		 *
>> +		 * Set @seq to +1 of @seq from the previous head.
>> +		 *
>> +		 * Memory barrier involvement:
>> +		 *
>> +		 * If bB reads from bE, then bC->aA reads from bD.
>> +		 *
>> +		 * Relies on:
>> +		 *
>> +		 * RELEASE from bD to bE
>> +		 *    matching
>> +		 * ADDRESS DEP. from bB to bC->aA
>> +		 */
>> +		WRITE_ONCE(n->seq, seq + 1);
>
> Do we really need WRITE_ONCE() here? 
> It is the same problem as with setting n->next_id above.

For the same reasons as the other WRITE_ONCE, I will replace the
WRITE_ONCE with an assignment.

>> +
>> +		/*
>> +		 * bE:
>> +		 *
>> +		 * This store_release() guarantees that @seq and @next are
>> +		 * stored before the node with @id is visible to any popping
>> +		 * writers. It pairs with the address dependency between @id
>> +		 * and @seq/@next provided by numlist_read(). See bD and bF
>> +		 * for details.
>> +		 */
>> +		r = atomic_long_cmpxchg_release(&nl->head_id, head_id, id);
>> +		if (r == head_id)
>> +			break;
>> +
>> +		/* bB: #3 */
>> +		head_id = r;
>> +	}
>> +
>> +	n = nl->node(head_id, nl->node_arg);
>> +
>> +	/*
>> +	 * The old head (which is still the list terminator), cannot be
>> +	 * removed because the list will always have at least one node.
>> +	 * Therefore @n must be non-NULL.
>> +	 */
>
> Please, move this comment above the nl->node() call. Both locations
> makes sense. I just see it as an important note for the call and thus
> is should be above. Also it will be better separated from the below
> comments for the _release() barrier.

OK.

>> +	/*
>> +	 * bF: the STORE part for @next_id
>> +	 *
>> +	 * Set @next_id of the previous head to @id.
>> +	 *
>> +	 * Memory barrier involvement:
>> +	 *
>> +	 * If bB reads from bE, then bF overwrites bA.
>> +	 *
>> +	 * Relies on:
>> +	 *
>> +	 * RELEASE from bA to bE
>> +	 *    matching
>> +	 * ADDRESS DEP. from bB to bF
>> +	 */
>> +	/*
>> +	 * bG: the RELEASE part for @next_id
>> +	 *
>> +	 * This _release() guarantees that a reader will see the updates to
>> +	 * this node's @seq/@next_id if the reader saw the @next_id of the
>> +	 * previous node in the list. It pairs with the address dependency
>> +	 * between @id and @seq/@next provided by numlist_read().
>> +	 *
>> +	 * Memory barrier involvement:
>> +	 *
>> +	 * If aB reads from bG, then aA' reads from bD, where aA' is in
>> +	 * numlist_read() to read the node ID from bG.
>> +	 * If aB reads from bG, then aB' reads from bA, where aB' is in
>> +	 * numlist_read() to read the node ID from bG.
>> +	 *
>> +	 * Relies on:
>> +	 *
>> +	 * RELEASE from bG to bD
>> +	 *    matching
>> +	 * ADDRESS DEP. from aB to aA'
>> +	 *
>> +	 * RELEASE from bG to bA
>> +	 *    matching
>> +	 * ADDRESS DEP. from aB to aB'
>> +	 */
>> +	smp_store_release(&n->next_id, id);
>
> Sigh, I see this line one screen below the previous command thanks
> to the extensive comments. Well, bF comment looks redundant.

Yes, it is a lot, but bF and bG are commenting on different things. bF
is the explanation that the _writer_ will overwrite the previous node's
terminating value with the ID of the new node. bG is the explanation
that if the _reader_ reads a non-terminating @next_id value, then the
initialization of @seq and @next_id for that next node will be visible.

Both of these points are key to the numlist implementation because they
guarantee the complete and correct linking of the list for all nodes
pushed.

John Ogness

  parent reply	other threads:[~2019-08-26 22:36 UTC|newest]

Thread overview: 131+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-07 22:26 [RFC PATCH v4 0/9] printk: new ringbuffer implementation John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 1/9] printk-rb: add a new printk " John Ogness
2019-08-20  8:15   ` numlist_pop(): " Petr Mladek
2019-08-21  5:41     ` John Ogness
2019-09-04 12:19     ` Peter Zijlstra
2019-08-20  8:22   ` assign_desc() barriers: " Petr Mladek
2019-08-20 14:14     ` Petr Mladek
2019-08-21  5:52       ` John Ogness
2019-08-22 11:53         ` Petr Mladek
2019-08-25  2:06           ` John Ogness
2019-08-26  8:21             ` John Ogness
2019-08-20  8:55   ` comments style: " Petr Mladek
2019-08-20  9:27     ` Sergey Senozhatsky
2019-08-21  5:46       ` John Ogness
2019-08-22 13:50         ` Petr Mladek
2019-08-22 17:38           ` Andrea Parri
2019-08-23 10:47             ` Petr Mladek
2019-08-23 14:27               ` Andrea Parri
2019-08-23  9:49           ` Sergey Senozhatsky
2019-08-23  5:54         ` Sergey Senozhatsky
2019-08-23 10:29           ` Petr Mladek
2019-08-21  5:42     ` John Ogness
2019-08-22 12:44       ` Petr Mladek
2019-08-20 13:50   ` dataring_push() barriers " Petr Mladek
2019-08-25  2:42     ` John Ogness
2019-08-27 14:36       ` Petr Mladek
2019-08-28 13:43         ` John Ogness
2019-08-20 15:12   ` datablock reuse races " Petr Mladek
2019-08-23  9:21   ` numlist_push() barriers " Petr Mladek
2019-08-26  8:34     ` Andrea Parri
2019-08-26  8:43       ` Andrea Parri
2019-08-26 14:10       ` Petr Mladek
2019-08-26 16:01         ` Andrea Parri
2019-08-26 22:36     ` John Ogness [this message]
2019-08-27  7:40       ` Petr Mladek
2019-08-27 14:28         ` John Ogness
2019-08-27 15:07           ` Petr Mladek
2019-08-28 10:24             ` John Ogness
2019-08-23 17:18   ` numlist API " Petr Mladek
2019-08-26 23:57     ` John Ogness
2019-08-27 13:03       ` Petr Mladek
2019-08-28  7:13         ` John Ogness
2019-08-28  8:58           ` Petr Mladek
2019-08-28 14:03             ` John Ogness
2019-08-29 11:28               ` Petr Mladek
2019-09-03  7:58         ` Sergey Senozhatsky
2019-08-30 14:48   ` dataring " Petr Mladek
2019-08-07 22:26 ` [RFC PATCH v4 2/9] printk-rb: add test module John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 3/9] printk-rb: fix missing includes/exports John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 4/9] printk-rb: initialize new descriptors as invalid John Ogness
2019-08-20  9:23   ` Petr Mladek
2019-08-20 10:16     ` Sergey Senozhatsky
2019-08-21  5:56     ` John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 5/9] printk-rb: remove extra data buffer size allocation John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 6/9] printk-rb: adjust test module ringbuffer sizes John Ogness
2019-08-19 21:29   ` [PATCH] printk-rb: fix test module macro usage John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 7/9] printk-rb: increase size of seq and size variables John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 8/9] printk-rb: new functionality to support printk John Ogness
2019-08-20  9:59   ` Sergey Senozhatsky
2019-08-21  5:47     ` John Ogness
2019-08-07 22:26 ` [RFC PATCH v4 9/9] printk: use a new ringbuffer implementation John Ogness
2019-08-08 19:07   ` Linus Torvalds
2019-08-08 22:55     ` John Ogness
2019-08-08 23:33       ` Linus Torvalds
2019-08-08 23:45         ` Steven Rostedt
2019-08-09  0:21           ` Linus Torvalds
2019-08-09  0:48             ` Steven Rostedt
2019-08-09  1:15               ` Linus Torvalds
2019-08-09 11:15                 ` Thomas Gleixner
2019-08-09 16:00                   ` Linus Torvalds
2019-08-09 20:07                     ` Thomas Gleixner
2019-08-09 20:20                       ` Linus Torvalds
2019-08-09  6:14     ` Peter Zijlstra
2019-08-09  7:08       ` John Ogness
2019-08-09 15:57       ` Linus Torvalds
2019-08-10  5:53         ` Thomas Gleixner
2019-09-10  3:19           ` Sergey Senozhatsky
2019-08-12  9:54       ` Geert Uytterhoeven
2019-08-16  5:46   ` Dave Young
2019-08-16  5:54     ` Dave Young
2019-08-16  9:40     ` John Ogness
2019-09-04 12:35 ` [RFC PATCH v4 0/9] printk: " Peter Zijlstra
2019-09-05 13:05   ` Petr Mladek
2019-09-05 14:31     ` Peter Zijlstra
2019-09-05 15:38       ` Thomas Gleixner
2019-09-05 16:11         ` Steven Rostedt
2019-09-05 21:10           ` John Ogness
2019-09-06  9:39           ` Petr Mladek
2019-09-09 14:11           ` printk meeting at LPC Thomas Gleixner
2019-09-13 13:26             ` John Ogness
2019-09-13 14:48               ` Daniel Vetter
2019-09-15 13:47                 ` John Ogness
2019-09-16  8:44                   ` Daniel Vetter
2019-09-16  4:30               ` Tetsuo Handa
2019-09-16 10:46                 ` Petr Mladek
2019-09-16 13:43                   ` Steven Rostedt
2019-09-16 14:28                     ` John Ogness
2019-09-17  8:11                       ` Petr Mladek
2019-09-17  7:52                     ` Petr Mladek
2019-09-17 13:02                       ` Steven Rostedt
2019-09-17 13:12                         ` Greg Kroah-Hartman
2019-09-17 13:37                           ` Steven Rostedt
2019-09-17 14:08                             ` Tetsuo Handa
2019-09-17  7:51                   ` Sergey Senozhatsky
2019-09-18  1:25               ` Sergey Senozhatsky
2019-09-18  2:08                 ` Steven Rostedt
2019-09-18  2:36                   ` Sergey Senozhatsky
2019-09-18  5:19                     ` Sergey Senozhatsky
2019-09-18  7:42                       ` John Ogness
2019-09-18  8:10                         ` Sergey Senozhatsky
2019-09-18  9:05                           ` John Ogness
2019-09-18  9:11                             ` Sergey Senozhatsky
2019-09-18 16:41                             ` Petr Mladek
2019-09-18 16:48                               ` Steven Rostedt
2019-09-24 14:24                                 ` Petr Mladek
2019-09-19  8:06                         ` Daniel Vetter
2019-09-18  7:33                     ` John Ogness
2019-09-18  8:08                       ` Sergey Senozhatsky
2019-10-04 14:48               ` Tony Asleson
2019-10-07 12:01                 ` Petr Mladek
2019-09-06  9:06       ` [RFC PATCH v4 0/9] printk: new ringbuffer implementation Peter Zijlstra
2019-09-06 10:09         ` Sergey Senozhatsky
2019-09-06 10:49           ` Peter Zijlstra
2019-09-06 13:44             ` Sergey Senozhatsky
2019-09-06 12:42         ` Petr Mladek
2019-09-06 14:01           ` Peter Zijlstra
2019-09-06 14:22             ` Peter Zijlstra
2019-09-06 19:53             ` Sergey Senozhatsky
2019-09-06 22:47             ` John Ogness
2019-09-08 22:18             ` Peter Zijlstra
2019-09-10  3:22             ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878srfo8pp.fsf@linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=brendanhiggins@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).