linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] printk: use the lockless ringbuffer
Date: Thu, 13 Feb 2020 23:36:25 +0100	[thread overview]
Message-ID: <87pneiyv12.fsf@linutronix.de> (raw)
In-Reply-To: <20200213115957.GC36551@google.com> (Sergey Senozhatsky's message of "Thu, 13 Feb 2020 20:59:57 +0900")

On 2020-02-13, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>>> @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
>>>>  
>>>>  		logbuf_unlock_irq();
>>>>  		ret = wait_event_interruptible(log_wait,
>>>> -					       user->seq != log_next_seq);
>>>> +					prb_read_valid(prb, user->seq, r));
>>>>  		if (ret)
>>>>  			goto out;
>>>>  		logbuf_lock_irq();
>>>>  	}
>>>>  
>>>> -	if (user->seq < log_first_seq) {
>>>> -		/* our last seen message is gone, return error and reset */
>>>> -		user->idx = log_first_idx;
>>>> -		user->seq = log_first_seq;
>>>> +	if (user->seq < r->info->seq) {
>>>> +		/* the expected message is gone, return error and reset */
>>>> +		user->seq = r->info->seq;
>>>>  		ret = -EPIPE;
>>>>  		logbuf_unlock_irq();
>>>>  		goto out;
>>>>  	}
>>>
>>> Sorry, why doesn't this do something like
>>>
>>> 	if (user->seq < prb_first_seq(prb)) {
>>> 		/* the expected message is gone, return error and reset */
>>> 		user->seq = prb_first_seq(prb);
>>> 		ret = -EPIPE;
>>> 		...
>>> 	}
>> 
>> Here prb_read_valid() was successful, so a record _was_ read. The
>> kerneldoc for the prb_read_valid() says:
>
> Hmm, yeah. That's true.
>
> OK, something weird...
>
> I ran some random printk-pressure test (mostly printks from IRQs;
> + some NMI printk-s, but they are routed through nmi printk-safe
> buffers; + some limited number of printk-safe printk-s, routed
> via printk-safe buffer (so, once again, IRQ); + user-space
> journalctl -f syslog reader), and after the test 'cat /dev/kmsg'
> is terminally broken
>
> [..]
> cat /dev/kmsg
> cat: /dev/kmsg: Broken pipe

In mainline you can have this "problem" as well. Once the ringbuffer has
wrapped, any read to a newly opened /dev/kmsg when a new message arrived
will result in an EPIPE. This happens quite easily once the ringbuffer
has wrapped because each new message is overwriting the oldest message.

Although it can be convenient, cat(1) is actually a poor tool for
viewing the ringbuffer for this reason. Unfortunately dmesg(1) is
sub-optimal as well because it does not show the sequence numbers. So
with dmesg(1) you cannot see if a message was dropped. :-/

> So I printed seq numbers from devksmg read to a seq buffer and dumped
> it via procfs, just seq numbers before we adjust user->seq (set to
> r->seq) and after
>
> +                       offt += snprintf(BUF + offt,
> +                                       sizeof(BUF) - offt,
> +                                       "%s: devkmsg_read() error %llu %llu %llu\n",
> +                                       current->comm,
> +                                       user->seq,
> +                                       r->info->seq,
> +                                       prb_first_seq(prb));
>
>
> ...
> systemd-journal: devkmsg_read() error 1979281 1982465 1980933
> systemd-journal: corrected seq 1982465 1982465
> cat: devkmsg_read() error 1980987 1982531 1980987
> cat: corrected seq 1982531 1982531
> cat: devkmsg_read() error 1981015 1982563 1981015
> cat: corrected seq 1982563 1982563

The situation with a data-less record is the same as when the ringbuffer
wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
not going to help because it will not be able to get past that data-less
record.

We could implement it such that devkmsg_read() will skip over data-less
records instead of issuing an EPIPE. (That is what dmesg does.) But then
do we need EPIPE at all? The reader can see that is has missed records
by tracking the sequence number, so could we just get rid of EPIPE? Then
cat(1) would be a great tool to view the raw ringbuffer. Please share
your thoughts on this.


On a side note (but related to data-less records): I hacked the
ringbuffer code to inject data-less records at various times in order to
verify your report. And I stumbled upon a bug in the ringbuffer, which
can lead to an infinite loop in console_unlock(). The problem occurs at:

    retry = prb_read_valid(prb, console_seq, NULL);

which will erroneously return true if console_seq is pointing to a
data-less record but there are no valid records after it. The following
patch fixes the bug. And yes, for v2 I have added comments to the
desc_read_committed() code.

I now have 2 bugfixes queued up for v2. The first one is here[0].

[0] https://lkml.kernel.org/r/87wo919grz.fsf@linutronix.de

John Ogness


diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 796257f226ee..31893051ad6b 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1074,6 +1071,7 @@ static int desc_read_committed(struct prb_desc_ring *desc_ring,
 			       unsigned long id, u64 seq,
 			       struct prb_desc *desc)
 {
+	struct prb_data_blk_lpos *blk_lpos = &desc->text_blk_lpos;
 	enum desc_state d_state;
 
 	d_state = desc_read(desc_ring, id, desc);
@@ -1084,6 +1082,11 @@ static int desc_read_committed(struct prb_desc_ring *desc_ring,
 	else if (d_state != desc_committed)
 		return -EINVAL;
 
+	if (blk_lpos->begin == INVALID_LPOS &&
+	    blk_lpos->next == INVALID_LPOS) {
+		return -ENOENT;
+	}
+
 	return 0;
 }
 

  reply	other threads:[~2020-02-13 22:36 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28 16:19 [PATCH 0/2] printk: replace ringbuffer John Ogness
2020-01-28 16:19 ` [PATCH 1/2] printk: add lockless buffer John Ogness
2020-01-29  3:53   ` Steven Rostedt
2020-02-21 11:54   ` more barriers: " Petr Mladek
2020-02-27 12:04     ` John Ogness
2020-03-04 15:08       ` Petr Mladek
2020-03-13 10:13         ` John Ogness
2020-02-21 12:05   ` misc nits " Petr Mladek
2020-03-02 10:38     ` John Ogness
2020-03-02 12:17       ` Joe Perches
2020-03-02 12:32       ` Petr Mladek
2020-03-02 13:43         ` John Ogness
2020-03-03  9:47           ` Petr Mladek
2020-03-03 15:42             ` John Ogness
2020-03-04 10:09               ` Petr Mladek
2020-03-04  9:40           ` Petr Mladek
2020-01-28 16:19 ` [PATCH 2/2] printk: use the lockless ringbuffer John Ogness
2020-02-13  9:07   ` Sergey Senozhatsky
2020-02-13  9:42     ` John Ogness
2020-02-13 11:59       ` Sergey Senozhatsky
2020-02-13 22:36         ` John Ogness [this message]
2020-02-14  1:41           ` Sergey Senozhatsky
2020-02-14  2:09             ` Sergey Senozhatsky
2020-02-14  9:48             ` John Ogness
2020-02-14 13:29   ` lijiang
2020-02-14 13:50     ` John Ogness
2020-02-15  4:15       ` lijiang
2020-02-17 15:40       ` crashdump: " Petr Mladek
2020-02-17 16:14         ` John Ogness
2020-02-17 14:41   ` misc details: " Petr Mladek
2020-02-25 20:11     ` John Ogness
2020-02-26  9:54       ` Petr Mladek
2020-02-05  4:25 ` [PATCH 0/2] printk: replace ringbuffer lijiang
2020-02-05  4:42   ` Sergey Senozhatsky
2020-02-05  4:48   ` Sergey Senozhatsky
2020-02-05  5:02     ` Sergey Senozhatsky
2020-02-05  5:38       ` lijiang
2020-02-05  6:36         ` Sergey Senozhatsky
2020-02-05  9:00           ` John Ogness
2020-02-05  9:28             ` Sergey Senozhatsky
2020-02-05 10:19             ` lijiang
2020-02-05 16:12               ` John Ogness
2020-02-06  9:12                 ` lijiang
2020-02-13 13:07                 ` Petr Mladek
2020-02-14  1:07                   ` Sergey Senozhatsky
2020-02-05 11:07             ` Sergey Senozhatsky
2020-02-05 15:48               ` John Ogness
2020-02-05 19:29                 ` Joe Perches
2020-02-06  6:31                 ` Sergey Senozhatsky
2020-02-06  7:30                 ` lijiang
2020-02-07  1:40                 ` Steven Rostedt
2020-02-07  7:43                   ` John Ogness
2020-02-14 15:56                 ` Petr Mladek
2020-02-17 11:13                   ` John Ogness
2020-02-17 14:50                     ` Petr Mladek
2020-02-25 19:27                       ` John Ogness
2020-02-05  9:36           ` lijiang
2020-02-06  9:21 ` lijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pneiyv12.fsf@linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).