All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Full barrier in data_push_tail(): was [PATCH v2 2/3] printk: add lockless buffer
Date: Tue, 9 Jun 2020 11:48:20 +0200	[thread overview]
Message-ID: <20200609094820.GC23752@linux-b0ei> (raw)
In-Reply-To: <20200501094010.17694-3-john.ogness@linutronix.de>

On Fri 2020-05-01 11:46:09, John Ogness wrote:
> Introduce a multi-reader multi-writer lockless ringbuffer for storing
> the kernel log messages. Readers and writers may use their API from
> any context (including scheduler and NMI). This ringbuffer will make
> it possible to decouple printk() callers from any context, locking,
> or console constraints. It also makes it possible for readers to have
> full access to the ringbuffer contents at any time and context (for
> example from any panic situation).
> 
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> new file mode 100644
> index 000000000000..e0a66468d4f3
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> +/*
> + * Advance the data ring tail to at least @lpos. This function puts
> + * descriptors into the reusable state if the tail is pushed beyond
> + * their associated data block.
> + */
> +static bool data_push_tail(struct printk_ringbuffer *rb,
> +			   struct prb_data_ring *data_ring,
> +			   unsigned long lpos)
> +{
> +	unsigned long tail_lpos;
> +	unsigned long next_lpos;
> +
> +	/* If @lpos is not valid, there is nothing to do. */
> +	if (lpos == INVALID_LPOS)
> +		return true;
> +
> +	tail_lpos = atomic_long_read(&data_ring->tail_lpos);
> +
> +	do {
> +		/* Done, if the tail lpos is already at or beyond @lpos. */
> +		if ((lpos - tail_lpos) - 1 >= DATA_SIZE(data_ring))
> +			break;
> +
> +		/*
> +		 * Make all descriptors reusable that are associated with
> +		 * data blocks before @lpos.
> +		 */
> +		if (!data_make_reusable(rb, data_ring, tail_lpos, lpos,
> +					&next_lpos)) {
> +			/*
> +			 * Guarantee the descriptor state loaded in
> +			 * data_make_reusable() is performed before reloading
> +			 * the tail lpos. The failed data_make_reusable() may
> +			 * be due to a newly recycled descriptor causing
> +			 * the tail lpos to have been previously pushed. This
> +			 * pairs with desc_reserve:D.
> +			 *
> +			 * Memory barrier involvement:
> +			 *
> +			 * If data_make_reusable:D reads from desc_reserve:G,
> +			 * then data_push_tail:B reads from data_push_tail:D.
> +			 *
> +			 * Relies on:
> +			 *
> +			 * MB from data_push_tail:D to desc_reserve:G
> +			 *    matching
> +			 * RMB from data_make_reusable:D to data_push_tail:B
> +			 *
> +			 * Note: data_push_tail:D and desc_reserve:G can be
> +			 *       different CPUs. However, the desc_reserve:G
> +			 *       CPU (which performs the full memory barrier)
> +			 *       must have previously seen data_push_tail:D.
> +			 */
> +			smp_rmb(); /* LMM(data_push_tail:A) */
> +
> +			next_lpos = atomic_long_read(&data_ring->tail_lpos
> +						); /* LMM(data_push_tail:B) */
> +			if (next_lpos == tail_lpos)
> +				return false;
> +
> +			/* Another task pushed the tail. Try again. */
> +			tail_lpos = next_lpos;
> +			continue;
> +		}
> +
> +		/*
> +		 * Guarantee any descriptor states that have transitioned to
> +		 * reusable are stored before pushing the tail lpos. This
> +		 * allows readers to identify if data has expired while
> +		 * reading the descriptor. This pairs with desc_read:D.
> +		 */
> +		smp_mb(); /* LMM(data_push_tail:C) */

The comment does not explain why we need a full barrier here. It talks
about writing descriptor states. It suggests that write barrier should
be enough.

I guess that this is related to the discussion that we had last time,
and the litmus test mentioned in
see https://lore.kernel.org/r/87h7zcjkxy.fsf@linutronix.de

I would add something like:

		* Full barrier is necessary because the descriptors
		* might have been made reusable also by other CPUs.

For people like me, it would be great to add also link to a more
detailed explanation, for example, the litmus tests, or something
even more human readable ;-) I think that it is a "rather" common
problem. I wonder whether it is already documented somewhere.

> +	} while (!atomic_long_try_cmpxchg_relaxed(&data_ring->tail_lpos,
> +			&tail_lpos, next_lpos)); /* LMM(data_push_tail:D) */
> +
> +	return true;
> +}
> +

Best Regards,
Petr

WARNING: multiple messages have this Message-ID (diff)
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Andrea Parri <parri.andrea@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Full barrier in data_push_tail(): was [PATCH v2 2/3] printk: add lockless buffer
Date: Tue, 9 Jun 2020 11:48:20 +0200	[thread overview]
Message-ID: <20200609094820.GC23752@linux-b0ei> (raw)
In-Reply-To: <20200501094010.17694-3-john.ogness@linutronix.de>

On Fri 2020-05-01 11:46:09, John Ogness wrote:
> Introduce a multi-reader multi-writer lockless ringbuffer for storing
> the kernel log messages. Readers and writers may use their API from
> any context (including scheduler and NMI). This ringbuffer will make
> it possible to decouple printk() callers from any context, locking,
> or console constraints. It also makes it possible for readers to have
> full access to the ringbuffer contents at any time and context (for
> example from any panic situation).
> 
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> new file mode 100644
> index 000000000000..e0a66468d4f3
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> +/*
> + * Advance the data ring tail to at least @lpos. This function puts
> + * descriptors into the reusable state if the tail is pushed beyond
> + * their associated data block.
> + */
> +static bool data_push_tail(struct printk_ringbuffer *rb,
> +			   struct prb_data_ring *data_ring,
> +			   unsigned long lpos)
> +{
> +	unsigned long tail_lpos;
> +	unsigned long next_lpos;
> +
> +	/* If @lpos is not valid, there is nothing to do. */
> +	if (lpos == INVALID_LPOS)
> +		return true;
> +
> +	tail_lpos = atomic_long_read(&data_ring->tail_lpos);
> +
> +	do {
> +		/* Done, if the tail lpos is already at or beyond @lpos. */
> +		if ((lpos - tail_lpos) - 1 >= DATA_SIZE(data_ring))
> +			break;
> +
> +		/*
> +		 * Make all descriptors reusable that are associated with
> +		 * data blocks before @lpos.
> +		 */
> +		if (!data_make_reusable(rb, data_ring, tail_lpos, lpos,
> +					&next_lpos)) {
> +			/*
> +			 * Guarantee the descriptor state loaded in
> +			 * data_make_reusable() is performed before reloading
> +			 * the tail lpos. The failed data_make_reusable() may
> +			 * be due to a newly recycled descriptor causing
> +			 * the tail lpos to have been previously pushed. This
> +			 * pairs with desc_reserve:D.
> +			 *
> +			 * Memory barrier involvement:
> +			 *
> +			 * If data_make_reusable:D reads from desc_reserve:G,
> +			 * then data_push_tail:B reads from data_push_tail:D.
> +			 *
> +			 * Relies on:
> +			 *
> +			 * MB from data_push_tail:D to desc_reserve:G
> +			 *    matching
> +			 * RMB from data_make_reusable:D to data_push_tail:B
> +			 *
> +			 * Note: data_push_tail:D and desc_reserve:G can be
> +			 *       different CPUs. However, the desc_reserve:G
> +			 *       CPU (which performs the full memory barrier)
> +			 *       must have previously seen data_push_tail:D.
> +			 */
> +			smp_rmb(); /* LMM(data_push_tail:A) */
> +
> +			next_lpos = atomic_long_read(&data_ring->tail_lpos
> +						); /* LMM(data_push_tail:B) */
> +			if (next_lpos == tail_lpos)
> +				return false;
> +
> +			/* Another task pushed the tail. Try again. */
> +			tail_lpos = next_lpos;
> +			continue;
> +		}
> +
> +		/*
> +		 * Guarantee any descriptor states that have transitioned to
> +		 * reusable are stored before pushing the tail lpos. This
> +		 * allows readers to identify if data has expired while
> +		 * reading the descriptor. This pairs with desc_read:D.
> +		 */
> +		smp_mb(); /* LMM(data_push_tail:C) */

The comment does not explain why we need a full barrier here. It talks
about writing descriptor states. It suggests that write barrier should
be enough.

I guess that this is related to the discussion that we had last time,
and the litmus test mentioned in
see https://lore.kernel.org/r/87h7zcjkxy.fsf@linutronix.de

I would add something like:

		* Full barrier is necessary because the descriptors
		* might have been made reusable also by other CPUs.

For people like me, it would be great to add also link to a more
detailed explanation, for example, the litmus tests, or something
even more human readable ;-) I think that it is a "rather" common
problem. I wonder whether it is already documented somewhere.

> +	} while (!atomic_long_try_cmpxchg_relaxed(&data_ring->tail_lpos,
> +			&tail_lpos, next_lpos)); /* LMM(data_push_tail:D) */
> +
> +	return true;
> +}
> +

Best Regards,
Petr

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  parent reply	other threads:[~2020-06-09  9:48 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-01  9:40 [PATCH v2 0/3] printk: replace ringbuffer John Ogness
2020-05-01  9:40 ` John Ogness
2020-05-01  9:40 ` [PATCH v2 1/3] crash: add VMCOREINFO macro for anonymous structs John Ogness
2020-05-01  9:40   ` John Ogness
2020-06-03 10:16   ` Petr Mladek
2020-06-03 10:16     ` Petr Mladek
2020-05-01  9:40 ` [PATCH v2 2/3] printk: add lockless buffer John Ogness
2020-05-01  9:40   ` John Ogness
2020-05-18 13:03   ` John Ogness
2020-05-18 17:22     ` Linus Torvalds
2020-05-18 17:22       ` Linus Torvalds
2020-05-19 20:34       ` John Ogness
2020-05-19 20:34         ` John Ogness
2020-06-09  7:10   ` blk->id read race: was: " Petr Mladek
2020-06-09  7:10     ` Petr Mladek
2020-06-09 14:18     ` John Ogness
2020-06-09 14:18       ` John Ogness
2020-06-10  8:42       ` Petr Mladek
2020-06-10  8:42         ` Petr Mladek
2020-06-10 13:55         ` John Ogness
2020-06-10 13:55           ` John Ogness
2020-06-09  9:31   ` redundant check in make_data_reusable(): was " Petr Mladek
2020-06-09  9:31     ` Petr Mladek
2020-06-09 14:48     ` John Ogness
2020-06-09 14:48       ` John Ogness
2020-06-10  9:38       ` Petr Mladek
2020-06-10  9:38         ` Petr Mladek
2020-06-10 10:24         ` John Ogness
2020-06-10 10:24           ` John Ogness
2020-06-10 14:56           ` John Ogness
2020-06-10 14:56             ` John Ogness
2020-06-11 19:51             ` John Ogness
2020-06-11 19:51               ` John Ogness
2020-06-11 13:55           ` Petr Mladek
2020-06-11 13:55             ` Petr Mladek
2020-06-11 20:25             ` John Ogness
2020-06-11 20:25               ` John Ogness
2020-06-09  9:48   ` Petr Mladek [this message]
2020-06-09  9:48     ` Full barrier in data_push_tail(): " Petr Mladek
2020-06-09 15:03     ` John Ogness
2020-06-09 15:03       ` John Ogness
2020-06-09 11:37   ` Barrier before pushing desc_ring tail: " Petr Mladek
2020-06-09 11:37     ` Petr Mladek
2020-06-09 15:56     ` John Ogness
2020-06-09 15:56       ` John Ogness
2020-06-11 12:01       ` Petr Mladek
2020-06-11 12:01         ` Petr Mladek
2020-06-11 23:06         ` John Ogness
2020-06-11 23:06           ` John Ogness
2020-06-09 14:38   ` data_ring head_lpos and tail_lpos synchronization: " Petr Mladek
2020-06-09 14:38     ` Petr Mladek
2020-06-10  7:53     ` John Ogness
2020-06-10  7:53       ` John Ogness
2020-05-01  9:40 ` [PATCH v2 3/3] printk: use the lockless ringbuffer John Ogness
2020-05-01  9:40   ` John Ogness
2020-05-06 14:50   ` John Ogness
2020-05-06 14:50     ` John Ogness
2020-05-13 12:05 ` [PATCH v2 0/3] printk: replace ringbuffer Prarit Bhargava
2020-05-13 12:05   ` Prarit Bhargava
2020-05-15 10:24 ` Sergey Senozhatsky
2020-05-15 10:24   ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200609094820.GC23752@linux-b0ei \
    --to=pmladek@suse.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.