From: John Ogness <john.ogness@linutronix.de>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>,
Peter Zijlstra <peterz@infradead.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Andrea Parri <parri.andrea@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] printk: use the lockless ringbuffer
Date: Fri, 14 Feb 2020 10:48:51 +0100 [thread overview]
Message-ID: <87r1yxh530.fsf@linutronix.de> (raw)
In-Reply-To: <20200214014113.GE36551@google.com> (Sergey Senozhatsky's message of "Fri, 14 Feb 2020 10:41:13 +0900")
On 2020-02-14, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>> cat /dev/kmsg
>>> cat: /dev/kmsg: Broken pipe
>>
>> In mainline you can have this "problem" as well. Once the ringbuffer
>> has wrapped, any read to a newly opened /dev/kmsg when a new message
>> arrived will result in an EPIPE. This happens quite easily once the
>> ringbuffer has wrapped because each new message is overwriting the
>> oldest message.
>
> Hmm. Something doesn't add up.
>
> Looking at the numbers, both r->info->seq and prb_first_seq(prb)
> do increase, so there are new messages in the ring buffer
>
> u->seq r->seq prb_first_seq
> [..]
> cat: devkmsg_read() error 1981080 1982633 1981080
> cat: devkmsg_read() error 1981080 1982633 1981080
> cat: devkmsg_read() error 1981095 1982652 1981095
> cat: devkmsg_read() error 1981095 1982652 1981095
> cat: devkmsg_read() error 1981095 1982652 1981095
> [..]
>
> but 'cat' still wouldn't read anything from the logbuf - EPIPE.
>
> NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test
> first, then I run 'cat /dev/kmsg', after the test, when
> printk-pressure is gone.
Sure. The problem is not the printk-pressure. The problem is you have
data-less records in your ringbuffer (from your previous
printk-pressure). If you used your own program that continued to read
after EPIPE, then you would see the sequence numbers jumping over the
data-less records.
> I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg
> after heavy printk-pressure test. So chances are some loggers can also
> experience problems. This might be a regression.
Mainline doesn't have data-less records. In mainline such failed
printk's are silently ignored (after attepting truncation). So for
mainline you can only reproduce the overflow case.
1. Boot 5.6.0-rc1 (without any console= slowing down printk)
2. Fill the ringbuffer and let it overflow with:
$ while true; do echo filling buffer > /dev/kmsg; done &
3. Once you can see the ringbuffer has overflowed (and continues to
overflow), try to read from /dev/kmsg
$ strace head /dev/kmsg
In most cases you will see:
read(3, 0x7f7307ac1000, 4096) = -1 EPIPE (Broken pipe)
Current readers need to be able to handle EPIPE. cat(1) does not and so
(unfortunately) is not a good candidate for reading the ringbuffer.
>>> ...
>>> systemd-journal: devkmsg_read() error 1979281 1982465 1980933
>>> systemd-journal: corrected seq 1982465 1982465
>>> cat: devkmsg_read() error 1980987 1982531 1980987
>>> cat: corrected seq 1982531 1982531
>>> cat: devkmsg_read() error 1981015 1982563 1981015
>>> cat: corrected seq 1982563 1982563
>>
>> The situation with a data-less record is the same as when the ringbuffer
>> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
>> not going to help because it will not be able to get past that data-less
>> record.
>
> So maybe this is the case with broken 'cat' on my system?
I think it is appropriate for an application to close the descriptor
after an EPIPE. /dev/kmsg is special because the reader should continue
reading anyway.
>> We could implement it such that devkmsg_read() will skip over data-less
>> records instead of issuing an EPIPE. (That is what dmesg does.) But then
>> do we need EPIPE at all? The reader can see that is has missed records
>> by tracking the sequence number, so could we just get rid of EPIPE? Then
>> cat(1) would be a great tool to view the raw ringbuffer. Please share
>> your thoughts on this.
>
> Looking at systemd/src/journal/journald-kmsg.c :
> server_read_dev_kmsg() -EPIPE is just one of the erronos they handle,
> nothing special.
Yes, but what does systemd-journald do when the EPIPE is _not_ returned
and instead there is a jump in the sequence number? Looking at
dev_kmsg_record(), systemd actually does it the way I would hope. It
tracks the sequence number correctly.
/* Did we lose any? */
if (serial > *s->kernel_seqnum)
server_driver_message(s, 0,
"MESSAGE_ID="
SD_MESSAGE_JOURNAL_MISSED_STR,
LOG_MESSAGE("Missed %"PRIu64" kernel messages",
serial - *s->kernel_seqnum),
NULL);
> Could it be the case that some other loggers would have special
> handling for EPIPE? I'm not sure, let's look around.
>
> I'd say that EPIPE removal looks OK to me. But before we do that, I'm
> not sure that we have clear understanding of 'cat /dev/kmsg' behaviour
> change.
In mainline, with regard to /dev/kmsg, sequence numbers will never
jump. If there would be a jump (due to lost messages) then EPIPE is
issued. The reader can either:
1. continue reading and see the jump
2. reopen the file descriptor, possibly having missed a ton more
messages due to reopening, and then start from the oldest available
message
With my series, #2 is no longer an option because the lost messages
could exist in a part of the ringbuffer not yet overwritten.
If we remove EPIPE, then readers will need to track the sequence number
to identify jumps. systemd-journald does this already. And tools like
cat(1) would "just work" because cat does not care if messages were
lost.
John Ogness
next prev parent reply other threads:[~2020-02-14 9:48 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-28 16:19 [PATCH 0/2] printk: replace ringbuffer John Ogness
2020-01-28 16:19 ` [PATCH 1/2] printk: add lockless buffer John Ogness
2020-01-29 3:53 ` Steven Rostedt
2020-02-21 11:54 ` more barriers: " Petr Mladek
2020-02-27 12:04 ` John Ogness
2020-03-04 15:08 ` Petr Mladek
2020-03-13 10:13 ` John Ogness
2020-02-21 12:05 ` misc nits " Petr Mladek
2020-03-02 10:38 ` John Ogness
2020-03-02 12:17 ` Joe Perches
2020-03-02 12:32 ` Petr Mladek
2020-03-02 13:43 ` John Ogness
2020-03-03 9:47 ` Petr Mladek
2020-03-03 15:42 ` John Ogness
2020-03-04 10:09 ` Petr Mladek
2020-03-04 9:40 ` Petr Mladek
2020-01-28 16:19 ` [PATCH 2/2] printk: use the lockless ringbuffer John Ogness
2020-02-13 9:07 ` Sergey Senozhatsky
2020-02-13 9:42 ` John Ogness
2020-02-13 11:59 ` Sergey Senozhatsky
2020-02-13 22:36 ` John Ogness
2020-02-14 1:41 ` Sergey Senozhatsky
2020-02-14 2:09 ` Sergey Senozhatsky
2020-02-14 9:48 ` John Ogness [this message]
2020-02-14 13:29 ` lijiang
2020-02-14 13:50 ` John Ogness
2020-02-15 4:15 ` lijiang
2020-02-17 15:40 ` crashdump: " Petr Mladek
2020-02-17 16:14 ` John Ogness
2020-02-17 14:41 ` misc details: " Petr Mladek
2020-02-25 20:11 ` John Ogness
2020-02-26 9:54 ` Petr Mladek
2020-02-05 4:25 ` [PATCH 0/2] printk: replace ringbuffer lijiang
2020-02-05 4:42 ` Sergey Senozhatsky
2020-02-05 4:48 ` Sergey Senozhatsky
2020-02-05 5:02 ` Sergey Senozhatsky
2020-02-05 5:38 ` lijiang
2020-02-05 6:36 ` Sergey Senozhatsky
2020-02-05 9:00 ` John Ogness
2020-02-05 9:28 ` Sergey Senozhatsky
2020-02-05 10:19 ` lijiang
2020-02-05 16:12 ` John Ogness
2020-02-06 9:12 ` lijiang
2020-02-13 13:07 ` Petr Mladek
2020-02-14 1:07 ` Sergey Senozhatsky
2020-02-05 11:07 ` Sergey Senozhatsky
2020-02-05 15:48 ` John Ogness
2020-02-05 19:29 ` Joe Perches
2020-02-06 6:31 ` Sergey Senozhatsky
2020-02-06 7:30 ` lijiang
2020-02-07 1:40 ` Steven Rostedt
2020-02-07 7:43 ` John Ogness
2020-02-14 15:56 ` Petr Mladek
2020-02-17 11:13 ` John Ogness
2020-02-17 14:50 ` Petr Mladek
2020-02-25 19:27 ` John Ogness
2020-02-05 9:36 ` lijiang
2020-02-06 9:21 ` lijiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1yxh530.fsf@linutronix.de \
--to=john.ogness@linutronix.de \
--cc=gregkh@linuxfoundation.org \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=parri.andrea@gmail.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=sergey.senozhatsky.work@gmail.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).