linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH POC] printk_ringbuffer: Alternative implementation of lockless printk ringbuffer
Date: Tue, 9 Jul 2019 11:06:09 +0200	[thread overview]
Message-ID: <20190709090609.shx7j2mst7wlkbqm@pathway.suse.cz> (raw)
In-Reply-To: <874l3wng7g.fsf@linutronix.de>

On Tue 2019-07-09 03:34:43, John Ogness wrote:
> On 2019-07-08, Petr Mladek <pmladek@suse.com> wrote:
> >> 1. The code claims that the cmpxchg(seq_newest) in prb_reserve_desc()
> >> guarantees that "The descriptor is ours until the COMMITTED bit is
> >> set."  This is not true if in that wind seq_newest wraps, allowing
> >> another writer to gain ownership of the same descriptor. For small
> >> descriptor arrays (such as in my test module), this situation is
> >> quite easy to reproduce.
> >
> Let me inline the function are talking about and add commentary to
> illustrate what I am saying:
> 
> static int prb_reserve_desc(struct prb_reserved_entry *entry)
> {
> 	unsigned long seq, seq_newest, seq_prev_wrap;
> 	struct printk_ringbuffer *rb = entry->rb;
> 	struct prb_desc *desc;
> 	int err;
> 
> 	/* Get descriptor for the next sequence number. */
> 	do {
> 		seq_newest = READ_ONCE(rb->seq_newest);
> 		seq = (seq_newest + 1) & PRB_SEQ_MASK;
> 		seq_prev_wrap = (seq - PRB_DESC_SIZE(rb)) & PRB_SEQ_MASK;
> 
> 		/*
> 		 * Remove conflicting descriptor from the previous wrap
> 		 * if ever used. It might fail when the related data
> 		 * have not been committed yet.
> 		 */
> 		if (seq_prev_wrap == READ_ONCE(rb->seq_oldest)) {
> 			err = prb_remove_desc_oldest(rb, seq_prev_wrap);
> 			if (err)
> 				return err;
> 		}
> 	} while (cmpxchg(&rb->seq_newest, seq_newest, seq) != seq_newest);
> 
> I am referring to this point in the code, after the
> cmpxchg(). seq_newest has been incremented but the descriptor is still
> in the unused state and seq is still 1 wrap behind. If an NMI occurs
> here and the NMI (or some other CPU) inserts enough entries to wrap the
> descriptor array, this descriptor will be reserved again, even though it
> has already been reserved.

Not really, the NMI will not reach the cmpxchg() in this case.
prb_remove_desc_oldest() will return error. It will not
be able to remove the conflicting descriptor because it will
still be occupied by a two-wraps-old descriptor.

BTW: I did meet these problems in some early variatns. But everything
started working at some point. I always looked how you solved
a particular situation in the link-based approach. Then I
somehow translated it into the pure-array variant.


> > BTW: There is one potential problem with my alternative approach.
> >
> >      The descriptors and the related data blocks might get reserved
> >      in different order. Now, the descriptor might get reused only
> >      when the related datablock is moved outside the valid range.
> >      This operation might move also other data blocks outside
> >      the range and invalidate descriptors that were reserved later.
> >      As a result we might need to invalidate more messages in
> >      the log buffer then would be really necessary.
> >
> >      If I get it properly, this problem does not exist with the
> >      implementation using links. It is because the descriptors are
> >      linked in the same order as the reserved data blocks.
> 
> Descriptors in the committed list are ordered in commit order (not the
> reserve order). However, if there are not enough descriptors
> (i.e. avgdatabits is higher than the true average) this problem exists
> with the list approach as well.

Thanks for the explanation.

> >      I am not sure how big the problem, with more invalidated messages,
> >      would be in reality. I am not sure if it would be worth
> >      a more complicated implementation.
> 
> I am also not sure how big the problem is in a practical sense. However
> to help avoid this issue, I will increase the descbits:avgdatabits ratio
> for v3.

Yup, it sounds reasonable. The number of reserved but not committed
descriptors is basically limited by the number of CPUs.

The only unknown variable is the length of the messages. Which brings
another problem. We might need a solution for continuous lines.
People would want it. Also storing one line in many entries would
be quite inefficient. But let's discuss this later when
printk() gets converted into the lockless ring buffer.


> >      Anyway, I still need to fully understand the linked approach
> >      first.
> 
> You may want to wait for v3. I've now split the ringbuffer into multiple
> generic data structures (as unintentionally suggested[0] by PeterZ),
> which helps to clarify the role of each data structure and also isolates
> the memory barriers so that it is clear which data structure requires
> which memory barriers.

Sure, I am interested to see v3.

Best Regards,
Petr

  reply	other threads:[~2019-07-09  9:06 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 16:23 [RFC PATCH v2 0/2] printk: new ringbuffer implementation John Ogness
2019-06-07 16:23 ` [RFC PATCH v2 1/2] printk-rb: add a new printk " John Ogness
2019-06-18  4:51   ` Sergey Senozhatsky
2019-06-18 22:12     ` John Ogness
2019-06-25  6:45       ` Sergey Senozhatsky
2019-06-25  7:15         ` Sergey Senozhatsky
2019-06-25  8:44           ` John Ogness
2019-06-25  9:06             ` Petr Mladek
2019-06-25 10:03               ` Sergey Senozhatsky
2019-06-25 12:03                 ` John Ogness
2019-06-26  2:08                   ` Sergey Senozhatsky
2019-06-26  7:16                     ` John Ogness
2019-06-26  7:45                       ` Sergey Senozhatsky
2019-06-26  7:47                       ` Petr Mladek
2019-06-26  7:59                         ` Sergey Senozhatsky
2019-06-25  9:09             ` Sergey Senozhatsky
2019-06-18 11:12   ` Peter Zijlstra
2019-06-18 22:18     ` John Ogness
2019-06-18 11:22   ` Peter Zijlstra
2019-06-18 22:30     ` John Ogness
2019-06-19 10:46       ` Andrea Parri
2019-06-20 22:50         ` John Ogness
2019-06-21 12:16           ` Andrea Parri
2019-06-19 11:08       ` Peter Zijlstra
2019-06-18 11:47   ` Peter Zijlstra
2019-06-20 22:23     ` John Ogness
2019-06-26 22:40       ` Peter Zijlstra
2019-06-26 22:53         ` Peter Zijlstra
2019-06-28  9:50         ` John Ogness
2019-06-28 15:44           ` Peter Zijlstra
2019-06-28 16:07             ` Peter Zijlstra
2019-07-01 10:39             ` John Ogness
2019-07-01 14:10               ` Peter Zijlstra
2019-07-01 14:11               ` Peter Zijlstra
2019-06-29 21:05           ` Andrea Parri
2019-06-30  2:03             ` John Ogness
2019-06-30 14:08               ` Andrea Parri
2019-07-02 14:13                 ` John Ogness
2019-06-26 22:47       ` Peter Zijlstra
2019-06-21 14:05   ` Petr Mladek
2019-06-24  8:33     ` John Ogness
2019-06-24 14:09       ` Petr Mladek
2019-06-25 13:29         ` John Ogness
2019-06-26  8:29           ` Petr Mladek
2019-06-26  9:09             ` John Ogness
2019-06-26 21:16       ` Peter Zijlstra
2019-06-26 21:43         ` John Ogness
2019-06-27  8:28           ` Petr Mladek
2019-07-04 10:33     ` [PATCH POC] printk_ringbuffer: Alternative implementation of lockless printk ringbuffer Petr Mladek
2019-07-04 14:59       ` John Ogness
2019-07-08 15:23         ` Petr Mladek
2019-07-09  1:34           ` John Ogness
2019-07-09  9:06             ` Petr Mladek [this message]
2019-07-09 10:21               ` John Ogness
2019-07-09 11:58                 ` Petr Mladek
2019-08-14  3:46                   ` John Ogness
2019-06-24 13:55   ` [RFC PATCH v2 1/2] printk-rb: add a new printk ringbuffer implementation John Ogness
2019-06-25  8:55   ` Sergey Senozhatsky
2019-06-25  9:19     ` John Ogness
2019-06-07 16:23 ` [RFC PATCH v2 2/2] printk-rb: add test module John Ogness
2019-06-17 21:09 ` [RFC PATCH v2 0/2] printk: new ringbuffer implementation Thomas Gleixner
2019-06-18  7:15   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190709090609.shx7j2mst7wlkbqm@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).