linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] printk: use the lockless ringbuffer
Date: Fri, 14 Feb 2020 10:48:51 +0100	[thread overview]
Message-ID: <87r1yxh530.fsf@linutronix.de> (raw)
In-Reply-To: <20200214014113.GE36551@google.com> (Sergey Senozhatsky's message of "Fri, 14 Feb 2020 10:41:13 +0900")

On 2020-02-14, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>> cat /dev/kmsg
>>> cat: /dev/kmsg: Broken pipe
>>
>> In mainline you can have this "problem" as well. Once the ringbuffer
>> has wrapped, any read to a newly opened /dev/kmsg when a new message
>> arrived will result in an EPIPE. This happens quite easily once the
>> ringbuffer has wrapped because each new message is overwriting the
>> oldest message.
>
> Hmm. Something doesn't add up.
>
> Looking at the numbers, both r->info->seq and prb_first_seq(prb)
> do increase, so there are new messages in the ring buffer
>
>                            u->seq    r->seq    prb_first_seq
> [..]
> cat: devkmsg_read() error 1981080   1982633   1981080
> cat: devkmsg_read() error 1981080   1982633   1981080
> cat: devkmsg_read() error 1981095   1982652   1981095
> cat: devkmsg_read() error 1981095   1982652   1981095
> cat: devkmsg_read() error 1981095   1982652   1981095
> [..]
>
> but 'cat' still wouldn't read anything from the logbuf - EPIPE.
>
> NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test
> first, then I run 'cat /dev/kmsg', after the test, when
> printk-pressure is gone.

Sure. The problem is not the printk-pressure. The problem is you have
data-less records in your ringbuffer (from your previous
printk-pressure). If you used your own program that continued to read
after EPIPE, then you would see the sequence numbers jumping over the
data-less records.

> I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg
> after heavy printk-pressure test. So chances are some loggers can also
> experience problems. This might be a regression.

Mainline doesn't have data-less records. In mainline such failed
printk's are silently ignored (after attepting truncation). So for
mainline you can only reproduce the overflow case.

1. Boot 5.6.0-rc1 (without any console= slowing down printk)

2. Fill the ringbuffer and let it overflow with:

   $ while true; do echo filling buffer > /dev/kmsg; done &

3. Once you can see the ringbuffer has overflowed (and continues to
   overflow), try to read from /dev/kmsg

   $ strace head /dev/kmsg

In most cases you will see:

read(3, 0x7f7307ac1000, 4096)           = -1 EPIPE (Broken pipe)

Current readers need to be able to handle EPIPE. cat(1) does not and so
(unfortunately) is not a good candidate for reading the ringbuffer.

>>> ...
>>> systemd-journal: devkmsg_read() error 1979281 1982465 1980933
>>> systemd-journal: corrected seq 1982465 1982465
>>> cat: devkmsg_read() error 1980987 1982531 1980987
>>> cat: corrected seq 1982531 1982531
>>> cat: devkmsg_read() error 1981015 1982563 1981015
>>> cat: corrected seq 1982563 1982563
>>
>> The situation with a data-less record is the same as when the ringbuffer
>> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
>> not going to help because it will not be able to get past that data-less
>> record.
>
> So maybe this is the case with broken 'cat' on my system?

I think it is appropriate for an application to close the descriptor
after an EPIPE. /dev/kmsg is special because the reader should continue
reading anyway.

>> We could implement it such that devkmsg_read() will skip over data-less
>> records instead of issuing an EPIPE. (That is what dmesg does.) But then
>> do we need EPIPE at all? The reader can see that is has missed records
>> by tracking the sequence number, so could we just get rid of EPIPE? Then
>> cat(1) would be a great tool to view the raw ringbuffer. Please share
>> your thoughts on this.
>
> Looking at systemd/src/journal/journald-kmsg.c :
> server_read_dev_kmsg() -EPIPE is just one of the erronos they handle,
> nothing special.

Yes, but what does systemd-journald do when the EPIPE is _not_ returned
and instead there is a jump in the sequence number? Looking at
dev_kmsg_record(), systemd actually does it the way I would hope. It
tracks the sequence number correctly.

    /* Did we lose any? */
    if (serial > *s->kernel_seqnum)
         server_driver_message(s, 0,
                               "MESSAGE_ID="
                               SD_MESSAGE_JOURNAL_MISSED_STR,
                               LOG_MESSAGE("Missed %"PRIu64" kernel messages",
                               serial - *s->kernel_seqnum),
                               NULL);

> Could it be the case that some other loggers would have special
> handling for EPIPE?  I'm not sure, let's look around.
>
> I'd say that EPIPE removal looks OK to me. But before we do that, I'm
> not sure that we have clear understanding of 'cat /dev/kmsg' behaviour
> change.

In mainline, with regard to /dev/kmsg, sequence numbers will never
jump. If there would be a jump (due to lost messages) then EPIPE is
issued. The reader can either:

1. continue reading and see the jump

2. reopen the file descriptor, possibly having missed a ton more
   messages due to reopening, and then start from the oldest available
   message

With my series, #2 is no longer an option because the lost messages
could exist in a part of the ringbuffer not yet overwritten.

If we remove EPIPE, then readers will need to track the sequence number
to identify jumps. systemd-journald does this already. And tools like
cat(1) would "just work" because cat does not care if messages were
lost.

John Ogness

  parent reply	other threads:[~2020-02-14  9:48 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28 16:19 [PATCH 0/2] printk: replace ringbuffer John Ogness
2020-01-28 16:19 ` [PATCH 1/2] printk: add lockless buffer John Ogness
2020-01-29  3:53   ` Steven Rostedt
2020-02-21 11:54   ` more barriers: " Petr Mladek
2020-02-27 12:04     ` John Ogness
2020-03-04 15:08       ` Petr Mladek
2020-03-13 10:13         ` John Ogness
2020-02-21 12:05   ` misc nits " Petr Mladek
2020-03-02 10:38     ` John Ogness
2020-03-02 12:17       ` Joe Perches
2020-03-02 12:32       ` Petr Mladek
2020-03-02 13:43         ` John Ogness
2020-03-03  9:47           ` Petr Mladek
2020-03-03 15:42             ` John Ogness
2020-03-04 10:09               ` Petr Mladek
2020-03-04  9:40           ` Petr Mladek
2020-01-28 16:19 ` [PATCH 2/2] printk: use the lockless ringbuffer John Ogness
2020-02-13  9:07   ` Sergey Senozhatsky
2020-02-13  9:42     ` John Ogness
2020-02-13 11:59       ` Sergey Senozhatsky
2020-02-13 22:36         ` John Ogness
2020-02-14  1:41           ` Sergey Senozhatsky
2020-02-14  2:09             ` Sergey Senozhatsky
2020-02-14  9:48             ` John Ogness [this message]
2020-02-14 13:29   ` lijiang
2020-02-14 13:50     ` John Ogness
2020-02-15  4:15       ` lijiang
2020-02-17 15:40       ` crashdump: " Petr Mladek
2020-02-17 16:14         ` John Ogness
2020-02-17 14:41   ` misc details: " Petr Mladek
2020-02-25 20:11     ` John Ogness
2020-02-26  9:54       ` Petr Mladek
2020-02-05  4:25 ` [PATCH 0/2] printk: replace ringbuffer lijiang
2020-02-05  4:42   ` Sergey Senozhatsky
2020-02-05  4:48   ` Sergey Senozhatsky
2020-02-05  5:02     ` Sergey Senozhatsky
2020-02-05  5:38       ` lijiang
2020-02-05  6:36         ` Sergey Senozhatsky
2020-02-05  9:00           ` John Ogness
2020-02-05  9:28             ` Sergey Senozhatsky
2020-02-05 10:19             ` lijiang
2020-02-05 16:12               ` John Ogness
2020-02-06  9:12                 ` lijiang
2020-02-13 13:07                 ` Petr Mladek
2020-02-14  1:07                   ` Sergey Senozhatsky
2020-02-05 11:07             ` Sergey Senozhatsky
2020-02-05 15:48               ` John Ogness
2020-02-05 19:29                 ` Joe Perches
2020-02-06  6:31                 ` Sergey Senozhatsky
2020-02-06  7:30                 ` lijiang
2020-02-07  1:40                 ` Steven Rostedt
2020-02-07  7:43                   ` John Ogness
2020-02-14 15:56                 ` Petr Mladek
2020-02-17 11:13                   ` John Ogness
2020-02-17 14:50                     ` Petr Mladek
2020-02-25 19:27                       ` John Ogness
2020-02-05  9:36           ` lijiang
2020-02-06  9:21 ` lijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1yxh530.fsf@linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).