All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Parri <andrea.parri@amarulasolutions.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [RFC PATCH v2 1/2] printk-rb: add a new printk ringbuffer implementation
Date: Sat, 29 Jun 2019 23:05:28 +0200	[thread overview]
Message-ID: <20190629210528.GA3922@andrea> (raw)
In-Reply-To: <87mui2ujh2.fsf@linutronix.de>

> /**
>  * add_descr_list() - Add a descriptor to the descriptor list.
>  *
>  * @e: An entry that has already reserved data.
>  *
>  * The provided entry contains a pointer to a descriptor that has already
>  * been reserved for this entry. However, the reserved descriptor is not
>  * yet on the list. Add this descriptor as the newest item.
>  *
>  * A descriptor is added in two steps. The first step is to make this
>  * descriptor the newest. The second step is to update @next of the former
>  * newest descriptor to point to this one (or set @oldest to this one if
>  * this will be the first descriptor on the list).
>  */
> static void add_descr_list(struct prb_reserved_entry *e)
> {
> 	struct printk_ringbuffer *rb = e->rb;
> 	struct prb_list *l = &rb->descr_list;
> 	struct prb_descr *d = e->descr;
> 	struct prb_descr *newest_d;
> 	unsigned long newest_id;
> 
> 	WRITE_ONCE(d->next, EOL);

/* C */


> 
> 	do {
> 		newest_id = READ_ONCE(l->newest);

/* A */


> 		newest_d = TO_DESC(rb, newest_id);
> 
> 		if (newest_id == EOL) {
> 			WRITE_ONCE(d->seq, 1);
> 		} else {
> 			/*
> 			 * MB5-read: synchronize setting newest descr
> 			 *
> 			 * context-pair: 2 writers adding a descriptor via
> 			 * add_descr_list().
> 			 *
> 			 * @newest will load before @seq due to a data
> 			 * dependency, therefore, the stores of @seq
> 			 * and @next from the pairing MB5-write context
> 			 * will be visible.
> 			 *
> 			 * Although @next is not loaded by this context,
> 			 * this context must overwrite the stored @next
> 			 * value of the pairing MB5-write context.
> 			 */
> 			WRITE_ONCE(d->seq, READ_ONCE(newest_d->seq) + 1);

/* B: this READ_ONCE() */

Hence you're claiming a data dependency from A to B. (FWIW, the LKMM
would call "A ->dep B" an "address dependency.)

This comment also claims that the "pairing MB5-write" orders "stores
of @seq and @next" (which are to different memory locations w.r.t. A
and B): I do not get why this access to @next (C above?, that's also
"unordered" w.r.t. A) can be relevant; can you elaborate?


> 		}
> 
> 		/*
> 		 * MB5-write: synchronize setting newest descr
> 		 *
> 		 * context-pair: 2 writers adding a descriptor via
> 		 * add_descr_list().
> 		 *
> 		 * Ensure that @next and @seq are stored before @d is
> 		 * visible via @newest. The pairing MB5-read context
> 		 * must load this @seq value and must overwrite this
> 		 * @next value.
> 		 */
> 	} while (cmpxchg_release(&l->newest, newest_id, e->id) != newest_id);
> 
> 	if (unlikely(newest_id == EOL)) {
> 		/*
> 		 * MB0-write: synchronize adding first descr
> 		 *
> 		 * context-pair: 1 writer adding the first descriptor via
> 		 * add_descr_list(), 1 reader getting the beginning of
> 		 * the list via iter_peek_next_id().
> 		 *
> 		 * This context recently assigned new values for @id,
> 		 * @next, @seq. Ensure these are stored before the first
> 		 * store to @oldest so that the new values are visible
> 		 * to the reader in the pairing MB0-read context.
> 		 *
> 		 * Note: Before this store, the value of @oldest is EOL.
> 		 */

My gmail-search foo is unable to locate MB0-read: what am I missing?
Also, can you maybe annotate the memory accesses to @id, @next, @seq
and @oldest (as I did above)? I find myself guessing their location.


> 		smp_store_release(&l->oldest, e->id);
> 	} else {
> 		/*
> 		 * MB6-write: synchronize linking new descr
> 		 *
> 		 * context-pair-1: 1 writer adding a descriptor via
> 		 * add_descr_list(), 1 writer removing a descriptor via
> 		 * remove_oldest_descr().
> 		 *
> 		 * If this is a recycled descriptor, this context
> 		 * recently stored a new @oldest value. Ensure that
> 		 * @oldest is stored before storing @next so that
> 		 * if the pairing MB6-read context sees a non-EOL
> 		 * @next value, it is ensured that it will also see
> 		 * an updated @oldest value.
> 		 *
> 		 * context-pair-2: 1 writer adding a descriptor via
> 		 * add_descr_list(), 1 reader iterating the list via
> 		 * prb_iter_next_valid_entry().
> 		 *
> 		 * This context recently assigned new values for @id,
> 		 * @next, @seq, @data, @data_next. Ensure these are
> 		 * stored before storing @next of the previously
> 		 * newest descriptor so that the new values are
> 		 * visible to the iterating reader in the pairing
> 		 * MB6-read context.
> 		 *
> 		 * Note: Before this store, the value of @next of the
> 		 * previously newest descriptor is EOL.
> 		 */

Same as above but for MB6-read and the accesses to @id, @next, @seq,
@data, @data_next.

In conclusion, I have been unable to produce litmus tests by reading
your comments (meaning I'm lost).

Thanks,
  Andrea


> 		smp_store_release(&newest_d->next, e->id);
> 	}
> }
> 
> The smp_rmb() calls in the reader functions are then commented and
> marked with the appropriate MB0-read and MB6-read labels.
> 
> > Afaict prb_list is a list head not a list node (calling it just _list
> > is confusing at best).
> 
> OK.
> 
> > You have a single linked list going from the tail to the head, while
> > adding to the head and removing from the tail. And that sounds like a
> > FIFO queue:
> 
> Yes, but with one important feature: the nodes in the FIFO queue are
> labeled with ordered sequence numbers. This is important for printk. I
> talk more about this below.
> 
> > 	struct lqueue_head {
> > 		struct lqueue_node *head, *tail;
> > 	};
> >
> > 	struct lqueue_node {
> > 		struct lqueue_node *next;
> > 	};
> >
> > 	void lqueue_push(struct lqueue_head *h, struct lqueue_node *n)
> > 	{
> > 		struct lqueue_node *prev;
> >
> > 		n->next = NULL;
> 
> Is this safe? Do all compilers understand that @next must be stored
> before the xchg() of @head? I would have chosen WRITE_ONCE().
> 
> > 		/*
> > 		 * xchg() implies RELEASE; and thereby ensures @n is
> > 		 * complete before getting published.
> > 		 */
> > 		prev = xchg(&h->head, n);
> 
> Unfortunately it is not that simple because of sequence numbers. A node
> must be assigned a sequence number that is +1 of the previous node. This
> must be done before exchanging the head because immediately after the
> xchg() on the head, another CPU could then add on to us and expects our
> sequence number to already be set.
> 
> This is why I need cmpxchg() here.
> 
> > 		/*
> > 		 * xchg() implies ACQUIRE; and thereby ensures @tail is
> > 		 * written after @head, see lqueue_pop()'s smp_rmb().
> > 		 */
> > 		if (prev)
> > 			WRITE_ONCE(prev->next, n);
> 
> This needs to be a store_release() so that a reader cannot read @n but
> the store to @next is not yet visible. The memory barriers of the above
> xchg() do not apply here because readers never read @head.
> 
> > 		else
> > 			WRITE_ONCE(h->tail, n);
> 
> Ditto, but for the tail node in particular.
> 
> > 	}
> >
> > 	struct lqueue_node *lqueue_pop(struct lqueue_head *h)
> > 	{
> > 		struct lqueue_node *head, *tail, *next;
> >
> > 		do {
> > 			tail = READ_ONCE(h->tail);
> > 			/* If the list is empty, nothing to remove. */
> > 			if (!tail)
> > 				return NULL;
> >
> > 			/*
> > 			 * If we see @tail, we must then also see @head.
> > 			 * Pairs with the xchg() in lqueue_push(),
> > 			 * ensure no false positive on the singleton
> > 			 * test below.
> > 			 */
> > 			smp_rmb();
> > 			head = READ_ONCE(h->head);
> >
> > 			/* If there is but one item; fail to remove. */
> > 			if (head == tail)
> > 				return NULL;
> >
> > 			next = smp_cond_load_relaxed(&tail->next, VAL);
> 
> What if a writer is adding a 2nd node to the queue and is interrupted by
> an NMI directly after the xchg() in lqueue_push()? Then we have:
> 
>     * head != tail
>     * tail->next == NULL
> 
> If that interrupting NMI calls lqueue_pop(), the NMI will spin
> forever. The following cmpxchg() is not allowed to happen as long as
> tail->next is NULL.
> 
> This is why I synchronize on @next instead, using (tail && !tail->next)
> for the singleton test.
> 
> > 		} while (cmpxchg(h->tail, tail, next) != tail);
> >
> > 		return tail;
> > 	}
> >
> > Now, you appear to be using desc_ids instead of pointers, but since
> > you're not using the actual wrap value; I don't see the benefit of
> > using those IDs over straight pointers.
> 
> The documentation mentions that descriptor ids are used to identify
> pointers to invalid descriptors. This is used by the readers, see
> iter_peek_next_id() and prb_iter_next_valid_entry().
> 
> IDs are used for:
> 
> - @next of descriptors on the list
> - @id, @id_next in the reader iterator
> - @id in the data blocks
> 
> If changed to pointers, iterators would need to additionally store @seq
> values to be able to identifiy if the entry they are pointing to is the
> entry they expect.
> 
> The only advantage I see with pointers is that the ringbuffer could be
> more useful generally, independent of whether the data is separate or
> within the nodes or if the nodes are statically or dynamically
> allocated. That is something worth having, even if it is not printk
> related.
> 
> Are you implicitly requesting me to split the prb_ringbuffer and instead
> base it on a new "lockless multi-writer multi-reader sequenced FIFO
> queue" data structure?
> 
> > That is, unless I've overlooked some subtle ABA issue, but then, your
> > code doesn't seem to mention that, and I think we're good because if
> > we re-use an entry, it can never get back in the same location, since
> > we never allow an empty list
> 
> I do not understand what you mean here. If a reader has a pointer to an
> entry, the entry behind that pointer can certainly change. But that
> isn't a problem. The reader will recognize that.
> 
> > (might also be fixable, haven't tought too hard on this).
> 
> :-)
> 
> > That said, the above has cmpxchg() vs WRITE_ONCE() and is therefore
> > not safe on a number of our architectures. We can either not care
> > about performance and use xchg() for the ->tail store, or use
> > atomic_long_t and suffer ugly casting.
> 
> cmpxchg_release() vs WRITE_ONCE() is not safe?! Can you point me to
> documentation about this?
> 
> > But the above is, IMO, a more useful and readable abstraction. Let me
> > continue in another email (probably tomorrow).
> 
> Thank you for taking the time for this.
> 
> John Ogness

  parent reply	other threads:[~2019-06-29 21:05 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 16:23 [RFC PATCH v2 0/2] printk: new ringbuffer implementation John Ogness
2019-06-07 16:23 ` [RFC PATCH v2 1/2] printk-rb: add a new printk " John Ogness
2019-06-18  4:51   ` Sergey Senozhatsky
2019-06-18 22:12     ` John Ogness
2019-06-25  6:45       ` Sergey Senozhatsky
2019-06-25  7:15         ` Sergey Senozhatsky
2019-06-25  8:44           ` John Ogness
2019-06-25  9:06             ` Petr Mladek
2019-06-25 10:03               ` Sergey Senozhatsky
2019-06-25 12:03                 ` John Ogness
2019-06-26  2:08                   ` Sergey Senozhatsky
2019-06-26  7:16                     ` John Ogness
2019-06-26  7:45                       ` Sergey Senozhatsky
2019-06-26  7:47                       ` Petr Mladek
2019-06-26  7:59                         ` Sergey Senozhatsky
2019-06-25  9:09             ` Sergey Senozhatsky
2019-06-18 11:12   ` Peter Zijlstra
2019-06-18 22:18     ` John Ogness
2019-06-18 11:22   ` Peter Zijlstra
2019-06-18 22:30     ` John Ogness
2019-06-19 10:46       ` Andrea Parri
2019-06-20 22:50         ` John Ogness
2019-06-21 12:16           ` Andrea Parri
2019-06-19 11:08       ` Peter Zijlstra
2019-06-18 11:47   ` Peter Zijlstra
2019-06-20 22:23     ` John Ogness
2019-06-26 22:40       ` Peter Zijlstra
2019-06-26 22:53         ` Peter Zijlstra
2019-06-28  9:50         ` John Ogness
2019-06-28 15:44           ` Peter Zijlstra
2019-06-28 16:07             ` Peter Zijlstra
2019-07-01 10:39             ` John Ogness
2019-07-01 14:10               ` Peter Zijlstra
2019-07-01 14:11               ` Peter Zijlstra
2019-06-29 21:05           ` Andrea Parri [this message]
2019-06-30  2:03             ` John Ogness
2019-06-30 14:08               ` Andrea Parri
2019-07-02 14:13                 ` John Ogness
2019-06-26 22:47       ` Peter Zijlstra
2019-06-21 14:05   ` Petr Mladek
2019-06-24  8:33     ` John Ogness
2019-06-24 14:09       ` Petr Mladek
2019-06-25 13:29         ` John Ogness
2019-06-26  8:29           ` Petr Mladek
2019-06-26  9:09             ` John Ogness
2019-06-26 21:16       ` Peter Zijlstra
2019-06-26 21:43         ` John Ogness
2019-06-27  8:28           ` Petr Mladek
2019-07-04 10:33     ` [PATCH POC] printk_ringbuffer: Alternative implementation of lockless printk ringbuffer Petr Mladek
2019-07-04 14:59       ` John Ogness
2019-07-08 15:23         ` Petr Mladek
2019-07-09  1:34           ` John Ogness
2019-07-09  9:06             ` Petr Mladek
2019-07-09 10:21               ` John Ogness
2019-07-09 11:58                 ` Petr Mladek
2019-08-14  3:46                   ` John Ogness
2019-06-24 13:55   ` [RFC PATCH v2 1/2] printk-rb: add a new printk ringbuffer implementation John Ogness
2019-06-25  8:55   ` Sergey Senozhatsky
2019-06-25  9:19     ` John Ogness
2019-06-07 16:23 ` [RFC PATCH v2 2/2] printk-rb: add test module John Ogness
2019-06-17 21:09 ` [RFC PATCH v2 0/2] printk: new ringbuffer implementation Thomas Gleixner
2019-06-18  7:15   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190629210528.GA3922@andrea \
    --to=andrea.parri@amarulasolutions.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.