* [RFC 0/2] printk: Add more metadata for each record @ 2020-09-23 13:56 Petr Mladek 2020-09-23 13:56 ` [PATCH 1/2] printk: Store all three timestamps Petr Mladek 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek 0 siblings, 2 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-23 13:56 UTC (permalink / raw) To: Sergey Senozhatsky, Steven Rostedt, John Ogness Cc: Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel, Petr Mladek There have been repeated requests to store and print more metadata for each line, see the two patches for more details. This patchset handles the storing part so that we could do it together with the other changes of the ring buffer structure. It would be nice to get it into the same kernel release to reduce pain on the crash tools side. My proposal is to make this information available via /dev/kmsg as dictionary values. The console output would probably need to be configurable. I would ignore syslog syscall that is hard to extend and is basically obsoleted by /dev/kmsg. The patchset applies on top of the current printk/linux.git, branch printk-rework (after the dict ring removal). You also need to merge "timekeeping-for-printk-2020-08-23" tag from tip.git. It provides "struct ktime_timestamps" and ktime_get_fast_timestamps(). I still need to add the gdb-related changes if there is not a strong opposition to these changes. I just wanted to send what I had before starting review of the logbuf_lock removal patchset. Petr Mladek (2): printk: Store all three timestamps printk: Add more information about the printk caller kernel/printk/printk.c | 129 ++++++++++++++++++++---------- kernel/printk/printk_ringbuffer.c | 43 +++++----- kernel/printk/printk_ringbuffer.h | 42 +++++++++- 3 files changed, 148 insertions(+), 66 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] printk: Store all three timestamps 2020-09-23 13:56 [RFC 0/2] printk: Add more metadata for each record Petr Mladek @ 2020-09-23 13:56 ` Petr Mladek 2020-09-23 22:12 ` John Ogness 2020-09-24 0:00 ` John Ogness 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek 1 sibling, 2 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-23 13:56 UTC (permalink / raw) To: Sergey Senozhatsky, Steven Rostedt, John Ogness Cc: Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel, Petr Mladek printk() historically shows the timestamps from the monotonic clock. It is fast, available early during boot, in any context, and even without using any lock. There are repeated requests [1][2][3] to show the timestamps from other clocks. The main motivation is to make it easier to correlate the kernel and userspace logs. Where userspace logs usually use the real time clock. Unfortunately, it is not possible to simply replace the default clock. Userspace tools, like journalctl, dmesg, expect to get the timestamps from the mono via /dev/kmsg interface or syslog syscall [4]. Also administrators would be confused when logs from different systems use different clocks depending on kernel version or build option [5]. As a result, the mono clock has to stay as the default clock and has to be used in the current user interfaces. The problem is solved by storing timestamps from all three clock sources in struct printk_info for each message. The other timestamps can be later passed to userspace by extending the existing formats. One might ask if the timestamp from the boot clock is really needed. The values are mostly redundant with the mono clock. And there really does not seems to be any strong reason. Anyway, the boot clock is useful when suspend and resume is part of the debugged/monitored systems. And it is still easier to handle than a full real time format. [1] https://lore.kernel.org/r/1500985047-23038-1-git-send-email-prarit@redhat.com [2] https://lore.kernel.org/r/20200729114423.30606-1-zhang.lyra@gmail.com [3] https://lore.kernel.org/r/1597120822-11999-1-git-send-email-orsonzhai@gmail.com [4] https://lore.kernel.org/r/CA+55aFzzvBD4_WYm-5Bm7TeRGR8nNu1oz0oWNcRNmTNJm=M4Lw@mail.gmail.com [5] https://lore.kernel.org/r/20171115080402.gz5k3qzfaexucc3p@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 30 +++++++++++++++++++----------- kernel/printk/printk_ringbuffer.h | 3 ++- 2 files changed, 21 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1560649cbd35..0ed8901916f4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -326,7 +326,9 @@ static int console_msg_format = MSG_FORMAT_DEFAULT; * Example of record values: * record.text_buf = "it's a line" (unterminated) * record.info.seq = 56 - * record.info.ts_nsec = 36863 + * record.info.ts.mono = 363863 + * record.info.ts.boot = 5122562 + * record.info.ts.real = 1599211091487260162 * record.info.text_len = 11 * record.info.facility = 0 (LOG_KERN) * record.info.flags = 0 @@ -493,7 +495,7 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len) /* insert record into the buffer, discard old ones, update heads */ static int log_store(u32 caller_id, int facility, int level, - enum log_flags flags, u64 ts_nsec, + enum log_flags flags, struct ktime_timestamps *ts, const struct dev_printk_info *dev_info, const char *text, u16 text_len) { @@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level, r.info->facility = facility; r.info->level = level & 7; r.info->flags = flags & 0x1f; - if (ts_nsec > 0) - r.info->ts_nsec = ts_nsec; + if (ts) + r.info->ts = *ts; else - r.info->ts_nsec = local_clock(); + ktime_get_fast_timestamps(&r.info->ts); r.info->caller_id = caller_id; if (dev_info) memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info)); @@ -589,7 +591,7 @@ static void append_char(char **pp, char *e, char c) static ssize_t info_print_ext_header(char *buf, size_t size, struct printk_info *info) { - u64 ts_usec = info->ts_nsec; + u64 ts_mono_usec = info->ts.mono; char caller[20]; #ifdef CONFIG_PRINTK_CALLER u32 id = info->caller_id; @@ -600,11 +602,12 @@ static ssize_t info_print_ext_header(char *buf, size_t size, caller[0] = '\0'; #endif - do_div(ts_usec, 1000); + do_div(ts_mono_usec, 1000); return scnprintf(buf, size, "%u,%llu,%llu,%c%s;", (info->facility << 3) | info->level, info->seq, - ts_usec, info->flags & LOG_CONT ? 'c' : '-', caller); + ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-', + caller); } static ssize_t msg_add_ext_text(char *buf, size_t size, @@ -982,11 +985,16 @@ void log_buf_vmcoreinfo_setup(void) VMCOREINFO_STRUCT_SIZE(printk_info); VMCOREINFO_OFFSET(printk_info, seq); - VMCOREINFO_OFFSET(printk_info, ts_nsec); + VMCOREINFO_OFFSET(printk_info, ts); VMCOREINFO_OFFSET(printk_info, text_len); VMCOREINFO_OFFSET(printk_info, caller_id); VMCOREINFO_OFFSET(printk_info, dev_info); + VMCOREINFO_STRUCT_SIZE(ktime_timestamps); + VMCOREINFO_OFFSET(ktime_timestamps, mono); + VMCOREINFO_OFFSET(ktime_timestamps, boot); + VMCOREINFO_OFFSET(ktime_timestamps, real); + VMCOREINFO_STRUCT_SIZE(dev_printk_info); VMCOREINFO_OFFSET(dev_printk_info, subsystem); VMCOREINFO_LENGTH(printk_info_subsystem, sizeof(dev_info->subsystem)); @@ -1094,7 +1102,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb, dest_r.info->facility = r->info->facility; dest_r.info->level = r->info->level; dest_r.info->flags = r->info->flags; - dest_r.info->ts_nsec = r->info->ts_nsec; + dest_r.info->ts = r->info->ts; dest_r.info->caller_id = r->info->caller_id; memcpy(&dest_r.info->dev_info, &r->info->dev_info, sizeof(dest_r.info->dev_info)); @@ -1320,7 +1328,7 @@ static size_t info_print_prefix(const struct printk_info *info, bool syslog, len = print_syslog((info->facility << 3) | info->level, buf); if (time) - len += print_time(info->ts_nsec, buf + len); + len += print_time(info->ts.mono, buf + len); len += print_caller(info->caller_id, buf + len); diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h index 0adaa685d1ca..09082c8472d3 100644 --- a/kernel/printk/printk_ringbuffer.h +++ b/kernel/printk/printk_ringbuffer.h @@ -5,6 +5,7 @@ #include <linux/atomic.h> #include <linux/dev_printk.h> +#include <linux/timekeeping.h> /* * Meta information about each stored message. @@ -14,7 +15,7 @@ */ struct printk_info { u64 seq; /* sequence number */ - u64 ts_nsec; /* timestamp in nanoseconds */ + struct ktime_timestamps ts; /* timestamps */ u16 text_len; /* length of text message */ u8 facility; /* syslog facility */ u8 flags:5; /* internal record flags */ -- 2.26.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] printk: Store all three timestamps 2020-09-23 13:56 ` [PATCH 1/2] printk: Store all three timestamps Petr Mladek @ 2020-09-23 22:12 ` John Ogness 2020-09-24 1:45 ` Sergey Senozhatsky 2020-09-24 11:49 ` Petr Mladek 2020-09-24 0:00 ` John Ogness 1 sibling, 2 replies; 21+ messages in thread From: John Ogness @ 2020-09-23 22:12 UTC (permalink / raw) To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt Cc: Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel, Petr Mladek On 2020-09-23, Petr Mladek <pmladek@suse.com> wrote: > printk() historically shows the timestamps from the monotonic clock. printk() uses the local clock, not the monotonic clock. > It is fast, available early during boot, in any context, and even > without using any lock. > > There are repeated requests [1][2][3] to show the timestamps from other > clocks. The main motivation is to make it easier to correlate the kernel > and userspace logs. Where userspace logs usually use the real time > clock. > > Unfortunately, it is not possible to simply replace the default clock. > Userspace tools, like journalctl, dmesg, expect to get the timestamps > from the mono via /dev/kmsg interface or syslog syscall [4]. > Also administrators would be confused when logs from different > systems use different clocks depending on kernel version or > build option [5]. > > As a result, the mono clock has to stay as the default clock > and has to be used in the current user interfaces. Actually this series is changing the default clock from local to monotonic. I for one welcome this change (and wish ftrace would do it as well), but it is a change. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 1560649cbd35..0ed8901916f4 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level, > r.info->facility = facility; > r.info->level = level & 7; > r.info->flags = flags & 0x1f; > - if (ts_nsec > 0) > - r.info->ts_nsec = ts_nsec; > + if (ts) > + r.info->ts = *ts; > else > - r.info->ts_nsec = local_clock(); > + ktime_get_fast_timestamps(&r.info->ts); I am wondering if we still want to keep the local_clock() as well (and as the default). ftrace also uses it by default, which means traces and printk logs could be coordinated by default until now. The two clocks can vary quite a bit. I have a laptop where the local clock drifts away from monotonic at about 50us per second. > diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h > index 0adaa685d1ca..09082c8472d3 100644 > --- a/kernel/printk/printk_ringbuffer.h > +++ b/kernel/printk/printk_ringbuffer.h > @@ -14,7 +15,7 @@ > */ > struct printk_info { > u64 seq; /* sequence number */ > - u64 ts_nsec; /* timestamp in nanoseconds */ > + struct ktime_timestamps ts; /* timestamps */ > u16 text_len; /* length of text message */ > u8 facility; /* syslog facility */ > u8 flags:5; /* internal record flags */ If we wanted to keep the local clock, should the local clock be a part of struct ktime_timestamps? Or should struct printk_info maintain that separately (either as @ts_nsec or @ts_local or whatever). John Ogness ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] printk: Store all three timestamps 2020-09-23 22:12 ` John Ogness @ 2020-09-24 1:45 ` Sergey Senozhatsky 2020-09-24 11:49 ` Petr Mladek 1 sibling, 0 replies; 21+ messages in thread From: Sergey Senozhatsky @ 2020-09-24 1:45 UTC (permalink / raw) To: John Ogness Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On (20/09/24 00:18), John Ogness wrote: > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 1560649cbd35..0ed8901916f4 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -520,10 +522,10 @@ static int log_store(u32 caller_id, int facility, int level, > > r.info->facility = facility; > > r.info->level = level & 7; > > r.info->flags = flags & 0x1f; > > - if (ts_nsec > 0) > > - r.info->ts_nsec = ts_nsec; > > + if (ts) > > + r.info->ts = *ts; > > else > > - r.info->ts_nsec = local_clock(); > > + ktime_get_fast_timestamps(&r.info->ts); > > I am wondering if we still want to keep the local_clock() as well (and > as the default). ftrace also uses it by default, which means traces and > printk logs could be coordinated by default until now. Good point. > If we wanted to keep the local clock, should the local clock be a part > of struct ktime_timestamps? Or should struct printk_info maintain that > separately (either as @ts_nsec or @ts_local or whatever). I like the idea of having a dedicated printk_timestamps structure with the timestamps. -ss ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] printk: Store all three timestamps 2020-09-23 22:12 ` John Ogness 2020-09-24 1:45 ` Sergey Senozhatsky @ 2020-09-24 11:49 ` Petr Mladek [not found] ` <CAJ-C09hqwOJhSXx1h40q96xhNZFXxP6dUVfjUQZpO4ZhOMZLbQ@mail.gmail.com> 1 sibling, 1 reply; 21+ messages in thread From: Petr Mladek @ 2020-09-24 11:49 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 00:18:53, John Ogness wrote: > On 2020-09-23, Petr Mladek <pmladek@suse.com> wrote: > > printk() historically shows the timestamps from the monotonic clock. > > printk() uses the local clock, not the monotonic clock. > > > It is fast, available early during boot, in any context, and even > > without using any lock. > > > > There are repeated requests [1][2][3] to show the timestamps from other > > clocks. The main motivation is to make it easier to correlate the kernel > > and userspace logs. Where userspace logs usually use the real time > > clock. > > > > Unfortunately, it is not possible to simply replace the default clock. > > Userspace tools, like journalctl, dmesg, expect to get the timestamps > > from the mono via /dev/kmsg interface or syslog syscall [4]. > > Also administrators would be confused when logs from different > > systems use different clocks depending on kernel version or > > build option [5]. > > > > As a result, the mono clock has to stay as the default clock > > and has to be used in the current user interfaces. > > Actually this series is changing the default clock from local to > monotonic. I for one welcome this change (and wish ftrace would do it as > well), but it is a change. Good point! I expected that it is enough when boot timestamps are monotonic. IMHO, it is really important to use the same clock by printk() and ftrace! Otherwise, it would be hard to match them. Honestly, I do not know enough about the different clocks and their users. My view is: 1. printk() needs a clock that is: + monotonic (always increasing) + lockless access + updated fast enough (precision) + available during early boot + match clock used by other kernel logs (ftrace) 2. ftrace needs a clock that is: + same requirements as for printk() + fast access to reduce performance problems 3. user space developers need: + real time clock to match userspace logs Now, we have 4 clocks. IMHO, storing all 4 timestamps is not worth it. We need at least two: + local, mono, or boot clock as legacy kernel timestamp + real clock to match timestamps from userspace tools I am not able to say what clock is the best one for kernel timestamps. The following questions come to my mind: Is mono or boot clock fast enough for ftrace (performance wise)? Is mono or boot clock available early enough during boot? Is is boot clock acceptable as the default kernel clock? IMHO, the boot clock is interesting because it has similar semantic as the real time (running even during sleep). But I am not sure if some developers do not rely on the mono clock or even local clock. I guess that this need opinion from a wide audience or developers experienced in many areas. I do not feel quialified to make a decision. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <CAJ-C09hqwOJhSXx1h40q96xhNZFXxP6dUVfjUQZpO4ZhOMZLbQ@mail.gmail.com>]
* Re: [PATCH 1/2] printk: Store all three timestamps [not found] ` <CAJ-C09hqwOJhSXx1h40q96xhNZFXxP6dUVfjUQZpO4ZhOMZLbQ@mail.gmail.com> @ 2020-09-25 9:51 ` Petr Mladek 0 siblings, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-25 9:51 UTC (permalink / raw) To: Mark Salyzyn Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, LKML On Thu 2020-09-24 10:07:57, Mark Salyzyn wrote: > I would hope for monotonic_raw, boottime and realtime as being the most > useful for most situations. > > [TL;DR] > > Currently kernel logs actually uses monotonic_raw (no timing clock > correction), not monotonic (timing correction). > > Whereas boot (timing clock adjusted, still monotonic) and realtime (timing > clock _and_ time adjusted, non monotonic), when we try to correlate to user > space is workable, but we will have troubles correlating monotonic (w.r.t. > monotonic_raw) clocks if used in user space. > > In Android we have the option of monotonic and realtime only right now for > the user space logger (which integrates logd, klogd and auditd, the later > two come from the kernel). I have bugs open to consider boottime, but it is > blocked on boottime availability from kernel space logging (this change). I > have another bug to consider switching the logger to monotonic_raw instead > of monotonic, to make it correlate better with existing kernel logs. But > alas a lot of resistance for phones switching to monotonic(_raw), the only > devices that chose monotonic(_raw) is everything else (google glass, > watches, ...). As such, phones, and the associated developers, will > continue to want realtime correlated in the kernel logs (yes, this change > too). > > realtime sucks for the first 10 seconds on Android, since phones generally > do not get their time correction until then from network resources, and > many of their rtc clocks are not adjustable, they store a correction factor > that does not get picked up from user space until userdata is mounted > (about 20 seconds in). But only kernel developers care about this first > part of boot, everything after that (and associated correlated kernel > interactions) are for user space folks. Thanks a lot for this detailed feedback. Just to be sure that I understand it correctly. You suggest to store three timestamps: local_clock(), boot and real clock. It makes sense to me. I just wonder if there might be any use case when the adjusted mono clock is needed or preferred over local_clock(). Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] printk: Store all three timestamps 2020-09-23 13:56 ` [PATCH 1/2] printk: Store all three timestamps Petr Mladek 2020-09-23 22:12 ` John Ogness @ 2020-09-24 0:00 ` John Ogness 2020-09-24 10:37 ` Petr Mladek 1 sibling, 1 reply; 21+ messages in thread From: John Ogness @ 2020-09-24 0:00 UTC (permalink / raw) To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt Cc: Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel, Petr Mladek On 2020-09-23, Petr Mladek <pmladek@suse.com> wrote: > diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h > index 0adaa685d1ca..09082c8472d3 100644 > --- a/kernel/printk/printk_ringbuffer.h > +++ b/kernel/printk/printk_ringbuffer.h > @@ -14,7 +15,7 @@ > */ > struct printk_info { > u64 seq; /* sequence number */ > - u64 ts_nsec; /* timestamp in nanoseconds */ > + struct ktime_timestamps ts; /* timestamps */ Until now struct printk_info has contained generic types. If we add struct ktime_timestamps, we may start storing more than we need. For example, if more (possibly internal) fields are added to struct ktime_timestamps that printk doesn't care about. We may prefer to generically and explicitly store the information we care about: u64 ts_mono; u64 ts_boot; u64 ts_real; Or create our own struct printk_ts to copy the fields of interest to. John Ogness ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] printk: Store all three timestamps 2020-09-24 0:00 ` John Ogness @ 2020-09-24 10:37 ` Petr Mladek 0 siblings, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-24 10:37 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 02:06:12, John Ogness wrote: > On 2020-09-23, Petr Mladek <pmladek@suse.com> wrote: > > diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h > > index 0adaa685d1ca..09082c8472d3 100644 > > --- a/kernel/printk/printk_ringbuffer.h > > +++ b/kernel/printk/printk_ringbuffer.h > > @@ -14,7 +15,7 @@ > > */ > > struct printk_info { > > u64 seq; /* sequence number */ > > - u64 ts_nsec; /* timestamp in nanoseconds */ > > + struct ktime_timestamps ts; /* timestamps */ > > Until now struct printk_info has contained generic types. If we add > struct ktime_timestamps, we may start storing more than we need. For > example, if more (possibly internal) fields are added to struct > ktime_timestamps that printk doesn't care about. We may prefer to > generically and explicitly store the information we care about: > > u64 ts_mono; > u64 ts_boot; > u64 ts_real; > > Or create our own struct printk_ts to copy the fields of interest to. I would like to have a structure if we have more timestamps. Honestly, printk-specific structure sounds like an overhead to me. How big is the chance that struct ktime_timestamps ts would get modified? It has been created for printk after all. That said, I could live with printk-specific structure. We might even need it if we need to store also local_clock(). Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* [RFC 2/2] printk: Add more information about the printk caller 2020-09-23 13:56 [RFC 0/2] printk: Add more metadata for each record Petr Mladek 2020-09-23 13:56 ` [PATCH 1/2] printk: Store all three timestamps Petr Mladek @ 2020-09-23 13:56 ` Petr Mladek 2020-09-24 1:40 ` Sergey Senozhatsky ` (3 more replies) 1 sibling, 4 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-23 13:56 UTC (permalink / raw) To: Sergey Senozhatsky, Steven Rostedt, John Ogness Cc: Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel, Petr Mladek The information about the printk caller has been added by the commit 15ff2069cb7f967da ("printk: Add caller information to printk() output."). The main motivation was to reconstruct original messages when they longer output from different CPUs got mixed. But there are more usecases. The number of CPUs and tasks running on a single system is growing. It brings bigger challenges when debugging problems. The context in which kernel reports its state is sometimes pretty useful. In particular, people suggest to remember the task id, cpu id, more details about the interrupt context, and even the task name [0][1]. Prepare for extending the caller information by replacing caller_id with struct printk_caller. And always store task id, cpu id, and exact interrupt context. The output to console and /dev/kmsg stays the same for now. I will get improved by followup patches. [0] https://lore.kernel.org/r/CAHk-=wibPWj9m1jQWeJUq+iPMG9tHaLANY-UaruakHuezFHr4w@mail.gmail.com [1] https://lore.kernel.org/r/20200904082438.20707-1-changki.kim@samsung.com Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 107 +++++++++++++++++++----------- kernel/printk/printk_ringbuffer.c | 43 ++++++------ kernel/printk/printk_ringbuffer.h | 39 ++++++++++- 3 files changed, 131 insertions(+), 58 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 0ed8901916f4..2f904bb744a6 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -333,7 +333,8 @@ static int console_msg_format = MSG_FORMAT_DEFAULT; * record.info.facility = 0 (LOG_KERN) * record.info.flags = 0 * record.info.level = 3 (LOG_ERR) - * record.info.caller_id = 299 (task 299) + * record.info.caller.pid = 299 (task 299) + * record.info.cpu_ctx = 0x80000003 (in hardirq context on cpu 3) * record.info.dev_info.subsystem = "pci" (terminated) * record.info.dev_info.device = "+pci:0000:00:01.0" (terminated) * @@ -494,8 +495,9 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len) } /* insert record into the buffer, discard old ones, update heads */ -static int log_store(u32 caller_id, int facility, int level, - enum log_flags flags, struct ktime_timestamps *ts, +static int log_store(const struct printk_caller *caller, + int facility, int level, enum log_flags flags, + struct ktime_timestamps *ts, const struct dev_printk_info *dev_info, const char *text, u16 text_len) { @@ -526,7 +528,7 @@ static int log_store(u32 caller_id, int facility, int level, r.info->ts = *ts; else ktime_get_fast_timestamps(&r.info->ts); - r.info->caller_id = caller_id; + r.info->caller = *caller; if (dev_info) memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info)); @@ -588,26 +590,40 @@ static void append_char(char **pp, char *e, char c) *(*pp)++ = c; } +static size_t info_print_caller_id(char *buf, size_t size, + const struct printk_caller *caller) +{ + enum printk_caller_ctx ctx; + + ctx = printk_to_caller_ctx(caller->cpu_ctx); + + if (ctx == printk_ctx_task) + return snprintf(buf, size, "T%u", caller->pid); + + return snprintf(buf, size, "C%u", printk_to_caller_cpu(caller->cpu_ctx)); +} + static ssize_t info_print_ext_header(char *buf, size_t size, struct printk_info *info) { u64 ts_mono_usec = info->ts.mono; - char caller[20]; -#ifdef CONFIG_PRINTK_CALLER - u32 id = info->caller_id; - - snprintf(caller, sizeof(caller), ",caller=%c%u", - id & 0x80000000 ? 'C' : 'T', id & ~0x80000000); -#else - caller[0] = '\0'; -#endif + char caller_id[12]; + size_t len; do_div(ts_mono_usec, 1000); - return scnprintf(buf, size, "%u,%llu,%llu,%c%s;", - (info->facility << 3) | info->level, info->seq, - ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-', - caller); + len = scnprintf(buf, size, "%u,%llu,%llu,%c", + (info->facility << 3) | info->level, info->seq, + ts_mono_usec, info->flags & LOG_CONT ? 'c' : '-'); + + if (IS_ENABLED(CONFIG_PRINTK_CALLER)) { + info_print_caller_id(caller_id, sizeof(caller_id), &info->caller); + len += scnprintf(buf + len, size - len, ",caller=%s", caller_id); + } + + len += scnprintf(buf + len, size - len, ";"); + + return len; } static ssize_t msg_add_ext_text(char *buf, size_t size, @@ -987,7 +1003,7 @@ void log_buf_vmcoreinfo_setup(void) VMCOREINFO_OFFSET(printk_info, seq); VMCOREINFO_OFFSET(printk_info, ts); VMCOREINFO_OFFSET(printk_info, text_len); - VMCOREINFO_OFFSET(printk_info, caller_id); + VMCOREINFO_OFFSET(printk_info, caller); VMCOREINFO_OFFSET(printk_info, dev_info); VMCOREINFO_STRUCT_SIZE(ktime_timestamps); @@ -995,6 +1011,10 @@ void log_buf_vmcoreinfo_setup(void) VMCOREINFO_OFFSET(ktime_timestamps, boot); VMCOREINFO_OFFSET(ktime_timestamps, real); + VMCOREINFO_STRUCT_SIZE(printk_caller); + VMCOREINFO_OFFSET(printk_caller, pid); + VMCOREINFO_OFFSET(printk_caller, cpu_ctx); + VMCOREINFO_STRUCT_SIZE(dev_printk_info); VMCOREINFO_OFFSET(dev_printk_info, subsystem); VMCOREINFO_LENGTH(printk_info_subsystem, sizeof(dev_info->subsystem)); @@ -1103,7 +1123,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb, dest_r.info->level = r->info->level; dest_r.info->flags = r->info->flags; dest_r.info->ts = r->info->ts; - dest_r.info->caller_id = r->info->caller_id; + dest_r.info->caller = r->info->caller; memcpy(&dest_r.info->dev_info, &r->info->dev_info, sizeof(dest_r.info->dev_info)); prb_final_commit(&e); @@ -1306,18 +1326,13 @@ static size_t print_time(u64 ts, char *buf) (unsigned long)ts, rem_nsec / 1000); } -#ifdef CONFIG_PRINTK_CALLER -static size_t print_caller(u32 id, char *buf) +static size_t print_caller_id(const struct printk_caller *caller, char *buf) { - char caller[12]; + char caller_id[12]; - snprintf(caller, sizeof(caller), "%c%u", - id & 0x80000000 ? 'C' : 'T', id & ~0x80000000); - return sprintf(buf, "[%6s]", caller); + info_print_caller_id(caller_id, sizeof(caller_id), caller); + return sprintf(buf, "[%6s]", caller_id); } -#else -#define print_caller(id, buf) 0 -#endif static size_t info_print_prefix(const struct printk_info *info, bool syslog, bool time, char *buf) @@ -1330,7 +1345,8 @@ static size_t info_print_prefix(const struct printk_info *info, bool syslog, if (time) len += print_time(info->ts.mono, buf + len); - len += print_caller(info->caller_id, buf + len); + if (IS_ENABLED(CONFIG_PRINTK_CALLER)) + len += print_caller_id(&info->caller, buf + len); if (IS_ENABLED(CONFIG_PRINTK_CALLER) || time) { buf[len++] = ' '; @@ -1902,24 +1918,42 @@ static inline void printk_delay(void) } } -static inline u32 printk_caller_id(void) +static enum printk_caller_ctx get_printk_caller_ctx(void) +{ + if (in_nmi()) + return printk_ctx_nmi; + + if (in_irq()) + return printk_ctx_hardirq; + + if (in_softirq()) + return printk_ctx_softirq; + + return printk_ctx_task; +} + +static void set_current_printk_caller(struct printk_caller *caller) { - return in_task() ? task_pid_nr(current) : - 0x80000000 + raw_smp_processor_id(); + caller->pid = task_pid_nr(current); + + caller->cpu_ctx = printk_to_caller_cpu_ctx(raw_smp_processor_id(), + get_printk_caller_ctx()); } static size_t log_output(int facility, int level, enum log_flags lflags, const struct dev_printk_info *dev_info, char *text, size_t text_len) { - const u32 caller_id = printk_caller_id(); + struct printk_caller caller; + + set_current_printk_caller(&caller); if (lflags & LOG_CONT) { struct prb_reserved_entry e; struct printk_record r; prb_rec_init_wr(&r, text_len); - if (prb_reserve_in_last(&e, prb, &r, caller_id)) { + if (prb_reserve_in_last(&e, prb, &r, &caller)) { memcpy(&r.text_buf[r.info->text_len], text, text_len); r.info->text_len += text_len; if (lflags & LOG_NEWLINE) { @@ -1933,7 +1967,7 @@ static size_t log_output(int facility, int level, enum log_flags lflags, } /* Store it in the record log */ - return log_store(caller_id, facility, level, lflags, 0, + return log_store(&caller, facility, level, lflags, 0, dev_info, text, text_len); } @@ -3095,8 +3129,7 @@ EXPORT_SYMBOL(__printk_ratelimit); * printk_timed_ratelimit - caller-controlled printk ratelimiting * @caller_jiffies: pointer to caller's state * @interval_msecs: minimum interval between prints - * - * printk_timed_ratelimit() returns true if more than @interval_msecs + *_timed_ratelimit() returns true if more than @interval_msecs * milliseconds have elapsed since the last time printk_timed_ratelimit() * returned true. */ diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c index 13b94b92342e..86d98e692d75 100644 --- a/kernel/printk/printk_ringbuffer.c +++ b/kernel/printk/printk_ringbuffer.c @@ -175,7 +175,7 @@ * * r.info->text_len = strlen(textstr); * r.info->ts_nsec = local_clock(); - * r.info->caller_id = printk_caller_id(); + * set_current_printk_caller(&r.info->caller); * * // commit and finalize the record * prb_final_commit(&e); @@ -191,7 +191,7 @@ * * r.info->text_len = strlen(textstr); * r.info->ts_nsec = local_clock(); - * r.info->caller_id = printk_caller_id(); + * set_current_printk_caller(&r.info->caller); * * // commit the record (but do not finalize yet) * prb_commit(&e); @@ -202,7 +202,7 @@ * // specify additional 5 bytes text space to extend * prb_rec_init_wr(&r, 5); * - * if (prb_reserve_in_last(&e, &test_rb, &r, printk_caller_id())) { + * if (prb_reserve_in_last(&e, &test_rb, &r, &caller)) { * snprintf(&r.text_buf[r.info->text_len], * r.text_buf_size - r.info->text_len, "hello"); * @@ -424,13 +424,13 @@ static enum desc_state get_desc_state(unsigned long id, * descriptor is in an inconsistent state (miss or reserved), the caller can * only expect the descriptor's @state_var field to be valid. * - * The sequence number and caller_id can be optionally retrieved. Like all - * non-state_var data, they are only valid if the descriptor is in a - * consistent state. + * The sequence number and caller information can be optionally retrieved. + * Like all non-state_var data, they are only valid if the descriptor is + * in a consistent state. */ static enum desc_state desc_read(struct prb_desc_ring *desc_ring, unsigned long id, struct prb_desc *desc_out, - u64 *seq_out, u32 *caller_id_out) + u64 *seq_out, struct printk_caller *caller_out) { struct printk_info *info = to_info(desc_ring, id); struct prb_desc *desc = to_desc(desc_ring, id); @@ -477,8 +477,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring, sizeof(desc_out->text_blk_lpos)); /* LMM(desc_read:C) */ if (seq_out) *seq_out = info->seq; /* also part of desc_read:C */ - if (caller_id_out) - *caller_id_out = info->caller_id; /* also part of desc_read:C */ + if (caller_out) + *caller_out = info->caller; /* also part of desc_read:C */ /* * 1. Guarantee the descriptor content is loaded before re-checking @@ -1253,24 +1253,28 @@ static const char *get_data(struct prb_data_ring *data_ring, * if the descriptor is not yet finalized and the provided @caller_id matches. */ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring, - u32 caller_id, unsigned long *id_out) + const struct printk_caller *caller, + unsigned long *id_out) { unsigned long prev_state_val; + struct printk_caller cal; enum desc_state d_state; struct prb_desc desc; struct prb_desc *d; unsigned long id; - u32 cid; id = atomic_long_read(&desc_ring->head_id); /* * To reduce unnecessarily reopening, first check if the descriptor - * state and caller ID are correct. + * state and caller infromation are correct. */ - d_state = desc_read(desc_ring, id, &desc, NULL, &cid); - if (d_state != desc_committed || cid != caller_id) + d_state = desc_read(desc_ring, id, &desc, NULL, &cal); + if (d_state != desc_committed || + cal.pid != caller->pid || + cal.cpu_ctx != caller->cpu_ctx) { return NULL; + } d = to_desc(desc_ring, id); @@ -1308,7 +1312,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring, * @e: The entry structure to setup. * @rb: The ringbuffer to re-reserve and extend data in. * @r: The record structure to allocate buffers for. - * @caller_id: The caller ID of the caller (reserving writer). + * @caller: The information about the caller (reserving writer). * * This is the public function available to writers to re-reserve and extend * data. @@ -1317,7 +1321,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring, * setting the @text_buf_size field of @r. To ensure proper initialization * of @r, prb_rec_init_wr() should be used. * - * This function will fail if @caller_id does not match the caller ID of the + * This function will fail if @caller does not match the caller of the * newest record. In that case the caller must reserve new data using * prb_reserve(). * @@ -1343,7 +1347,7 @@ static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring, * @r->info->text_len after concatenating. */ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, - struct printk_record *r, u32 caller_id) + struct printk_record *r, const struct printk_caller *caller) { struct prb_desc_ring *desc_ring = &rb->desc_ring; struct printk_info *info; @@ -1354,7 +1358,7 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer local_irq_save(e->irqflags); /* Transition the newest descriptor back to the reserved state. */ - d = desc_reopen_last(desc_ring, caller_id, &id); + d = desc_reopen_last(desc_ring, caller, &id); if (!d) { local_irq_restore(e->irqflags); goto fail_reopen; @@ -1376,7 +1380,8 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer * exclusive access at that point. The descriptor may have * changed since then. */ - if (caller_id != info->caller_id) + if (caller->pid != info->caller.pid || + caller->cpu_ctx != info->caller.cpu_ctx) goto fail; if (BLK_DATALESS(&d->text_blk_lpos)) { diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h index 09082c8472d3..93117bd82afb 100644 --- a/kernel/printk/printk_ringbuffer.h +++ b/kernel/printk/printk_ringbuffer.h @@ -7,6 +7,12 @@ #include <linux/dev_printk.h> #include <linux/timekeeping.h> +/* Information about the process and context that adds the message */ +struct printk_caller { + pid_t pid; /* thread id */ + u32 cpu_ctx; /* processor id and interrupt context */ +}; + /* * Meta information about each stored message. * @@ -20,8 +26,8 @@ struct printk_info { u8 facility; /* syslog facility */ u8 flags:5; /* internal record flags */ u8 level:3; /* syslog level */ - u32 caller_id; /* thread id or processor id */ + struct printk_caller caller; /* information about the caller */ struct dev_printk_info dev_info; }; @@ -118,6 +124,35 @@ enum desc_state { desc_reusable = 0x3, /* free, not yet used by any writer */ }; +/* Handling the shared cpu id and interupt context caller value */ +#define PRINTK_CALLER_CTX_MASK 0xc0000000 +#define PRINTK_CALLER_CPU_MASK 0x3fffffff + +#define PRINTK_CALLER_CTX_SHIFT (32 - 2) + +enum printk_caller_ctx { + printk_ctx_task = 0x0, + printk_ctx_softirq = 0x1, + printk_ctx_hardirq = 0x2, + printk_ctx_nmi = 0x3, +}; + +/* Get context id from the common cpu and interrupt context id value. */ +#define printk_to_caller_ctx(cpu_ctx) \ + (enum printk_caller_ctx) \ + (((cpu_ctx) & PRINTK_CALLER_CTX_MASK) >> PRINTK_CALLER_CTX_SHIFT) + +/* Get cpu id from the common cpu and interrupt context id value. */ +#define printk_to_caller_cpu(cpu_ctx) \ + (u32)((cpu_ctx) & PRINTK_CALLER_CPU_MASK) + +/* Merge cpu and interrupt context id into a single value */ +#define printk_to_caller_cpu_ctx(cpu, ctx) \ + (u32) \ + (((cpu) & PRINTK_CALLER_CPU_MASK) | \ + (((ctx) << PRINTK_CALLER_CTX_SHIFT) & PRINTK_CALLER_CTX_MASK)) + +/* Ringbuffer handling */ #define _DATA_SIZE(sz_bits) (1UL << (sz_bits)) #define _DESCS_COUNT(ct_bits) (1U << (ct_bits)) #define DESC_SV_BITS (sizeof(unsigned long) * 8) @@ -304,7 +339,7 @@ static inline void prb_rec_init_wr(struct printk_record *r, bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, struct printk_record *r); bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, - struct printk_record *r, u32 caller_id); + struct printk_record *r, const struct printk_caller *caller); void prb_commit(struct prb_reserved_entry *e); void prb_final_commit(struct prb_reserved_entry *e); -- 2.26.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek @ 2020-09-24 1:40 ` Sergey Senozhatsky 2020-09-24 11:58 ` Petr Mladek 2020-09-24 2:17 ` Sergey Senozhatsky ` (2 subsequent siblings) 3 siblings, 1 reply; 21+ messages in thread From: Sergey Senozhatsky @ 2020-09-24 1:40 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On (20/09/23 15:56), Petr Mladek wrote: > The information about the printk caller has been added by the commit > 15ff2069cb7f967da ("printk: Add caller information to printk() output."). > The main motivation was to reconstruct original messages when they > longer output from different CPUs got mixed. > > But there are more usecases. The number of CPUs and tasks running on > a single system is growing. It brings bigger challenges when debugging > problems. The context in which kernel reports its state is sometimes > pretty useful. > > In particular, people suggest to remember the task id, cpu id, more details > about the interrupt context, and even the task name [0][1]. > > Prepare for extending the caller information by replacing caller_id > with struct printk_caller. And always store task id, cpu id, and > exact interrupt context. Wild idea: Currently, we have two sides to the problem, I think. There are tasks that store messages, but then there are tasks that print those messages on the consoles. And those tasks and contexts are not always the same. What I found helpful in the past was not only the caller_id, but the emitter_id (especially preemption count and irqs state of the context that prints messages on the slow consoles). -ss ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 1:40 ` Sergey Senozhatsky @ 2020-09-24 11:58 ` Petr Mladek 0 siblings, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-24 11:58 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 10:40:10, Sergey Senozhatsky wrote: > On (20/09/23 15:56), Petr Mladek wrote: > > The information about the printk caller has been added by the commit > > 15ff2069cb7f967da ("printk: Add caller information to printk() output."). > > The main motivation was to reconstruct original messages when they > > longer output from different CPUs got mixed. > > > > But there are more usecases. The number of CPUs and tasks running on > > a single system is growing. It brings bigger challenges when debugging > > problems. The context in which kernel reports its state is sometimes > > pretty useful. > > > > In particular, people suggest to remember the task id, cpu id, more details > > about the interrupt context, and even the task name [0][1]. > > > > Prepare for extending the caller information by replacing caller_id > > with struct printk_caller. And always store task id, cpu id, and > > exact interrupt context. > > Wild idea: > > Currently, we have two sides to the problem, I think. There are tasks > that store messages, but then there are tasks that print those messages > on the consoles. And those tasks and contexts are not always the same. > What I found helpful in the past was not only the caller_id, but the > emitter_id (especially preemption count and irqs state of the context > that prints messages on the slow consoles). Yeah, it might be useful. But it should get discussed in a separate thread. The information about the emitter do not need to get stored into the metadata. It is mostly need only for debugging printk-related problems. It should become less important if we succeed with introducing the printk kthreads. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek 2020-09-24 1:40 ` Sergey Senozhatsky @ 2020-09-24 2:17 ` Sergey Senozhatsky 2020-09-24 8:23 ` John Ogness 2020-09-24 13:06 ` Petr Mladek 2020-09-24 4:24 ` Ahmed S. Darwish 2020-10-21 11:48 ` 김창기 3 siblings, 2 replies; 21+ messages in thread From: Sergey Senozhatsky @ 2020-09-24 2:17 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On (20/09/23 15:56), Petr Mladek wrote: [..] > /* > * To reduce unnecessarily reopening, first check if the descriptor > - * state and caller ID are correct. > + * state and caller infromation are correct. > */ > - d_state = desc_read(desc_ring, id, &desc, NULL, &cid); > - if (d_state != desc_committed || cid != caller_id) > + d_state = desc_read(desc_ring, id, &desc, NULL, &cal); > + if (d_state != desc_committed || > + cal.pid != caller->pid || > + cal.cpu_ctx != caller->cpu_ctx) { You probably might want to factor out ctx check into a static inline helper. Since you use this check in several places, and we may check more context fields in the future. [..] > +/* Information about the process and context that adds the message */ > +struct printk_caller { > + pid_t pid; /* thread id */ > + u32 cpu_ctx; /* processor id and interrupt context */ > +}; A question. Suppose we have a task which does CPU0 pr_err(...); preempt_disable(); pr_err(...); preempt_enable(); pr_err(...); rcu_read_lock(); pr_info(...); rcu_read_unlock(); Should we distinguish those as 3 different contexts? - normal printk - printk under disabled preemption (affects scheduling) - printk under RCU read side lock (affects RCU grace periods) -ss ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 2:17 ` Sergey Senozhatsky @ 2020-09-24 8:23 ` John Ogness 2020-09-24 13:26 ` Petr Mladek 2020-09-24 13:06 ` Petr Mladek 1 sibling, 1 reply; 21+ messages in thread From: John Ogness @ 2020-09-24 8:23 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On 2020-09-24, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > A question. Suppose we have a task which does > > CPU0 > > pr_err(...); > > preempt_disable(); > pr_err(...); > preempt_enable(); > > pr_err(...); > > rcu_read_lock(); > pr_info(...); > rcu_read_unlock(); > > Should we distinguish those as 3 different contexts? > > - normal printk > - printk under disabled preemption (affects scheduling) > - printk under RCU read side lock (affects RCU grace periods) The internal purpose of the printk_info's @caller_id is to support _some_ sanity with LOG_CONT. True LOG_CONT sanity will not be available until we provide a context API like we previously talked about [0]. But for now it is good enough (even if it is not good). We can also store various flags and counters to describe the current context. But these would only exist to provide the user with information rather than because printk itself needs it. I only mention this so that we don't get things mixed up. We can add as much information as want, but LOG_CONT will only be interested in a subset of that. John Ogness ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 8:23 ` John Ogness @ 2020-09-24 13:26 ` Petr Mladek 0 siblings, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-24 13:26 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 10:29:31, John Ogness wrote: > On 2020-09-24, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > > A question. Suppose we have a task which does > > > > CPU0 > > > > pr_err(...); > > > > preempt_disable(); > > pr_err(...); > > preempt_enable(); > > > > pr_err(...); > > > > rcu_read_lock(); > > pr_info(...); > > rcu_read_unlock(); > > > > Should we distinguish those as 3 different contexts? > > > > - normal printk > > - printk under disabled preemption (affects scheduling) > > - printk under RCU read side lock (affects RCU grace periods) > > The internal purpose of the printk_info's @caller_id is to support > _some_ sanity with LOG_CONT. True LOG_CONT sanity will not be available > until we provide a context API like we previously talked about [0]. But > for now it is good enough (even if it is not good). > > We can also store various flags and counters to describe the current > context. But these would only exist to provide the user with information > rather than because printk itself needs it. I only mention this so that > we don't get things mixed up. We can add as much information as want, > but LOG_CONT will only be interested in a subset of that. Yeah, we need to keep this in mind, so that some "tiny" changes in the context would not prevent concatenating related pieces of the message. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 2:17 ` Sergey Senozhatsky 2020-09-24 8:23 ` John Ogness @ 2020-09-24 13:06 ` Petr Mladek 1 sibling, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-24 13:06 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 11:17:56, Sergey Senozhatsky wrote: > On (20/09/23 15:56), Petr Mladek wrote: > [..] > > /* > > * To reduce unnecessarily reopening, first check if the descriptor > > - * state and caller ID are correct. > > + * state and caller infromation are correct. > > */ > > - d_state = desc_read(desc_ring, id, &desc, NULL, &cid); > > - if (d_state != desc_committed || cid != caller_id) > > + d_state = desc_read(desc_ring, id, &desc, NULL, &cal); > > + if (d_state != desc_committed || > > + cal.pid != caller->pid || > > + cal.cpu_ctx != caller->cpu_ctx) { > > You probably might want to factor out ctx check into a static > inline helper. Since you use this check in several places, and > we may check more context fields in the future. Makes sense. > [..] > > +/* Information about the process and context that adds the message */ > > +struct printk_caller { > > + pid_t pid; /* thread id */ > > + u32 cpu_ctx; /* processor id and interrupt context */ > > +}; > > A question. Suppose we have a task which does > > CPU0 > > pr_err(...); > > preempt_disable(); > pr_err(...); > preempt_enable(); > > pr_err(...); > > rcu_read_lock(); > pr_info(...); > rcu_read_unlock(); > > Should we distinguish those as 3 different contexts? > > - normal printk > - printk under disabled preemption (affects scheduling) > - printk under RCU read side lock (affects RCU grace periods) Good question. Well, these contexts could not get detected in PREEMPT_NONE or PREEMPT_VOLUNTARY. Also I wonder where it would stop. There are several RCU flavors. I would not distinguish them unless there is a real demand. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek 2020-09-24 1:40 ` Sergey Senozhatsky 2020-09-24 2:17 ` Sergey Senozhatsky @ 2020-09-24 4:24 ` Ahmed S. Darwish 2020-09-24 12:53 ` Petr Mladek 2020-10-21 11:48 ` 김창기 3 siblings, 1 reply; 21+ messages in thread From: Ahmed S. Darwish @ 2020-09-24 4:24 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote: ... > > -static inline u32 printk_caller_id(void) > +static enum printk_caller_ctx get_printk_caller_ctx(void) > +{ > + if (in_nmi()) > + return printk_ctx_nmi; > + > + if (in_irq()) > + return printk_ctx_hardirq; > + > + if (in_softirq()) > + return printk_ctx_softirq; > + > + return printk_ctx_task; > +} > + in_softirq() here will be true for both softirq contexts *and* BH-disabled regions. Did you mean in_serving_softirq() instead? Thanks, -- Ahmed S. Darwish Linutronix GmbH ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 4:24 ` Ahmed S. Darwish @ 2020-09-24 12:53 ` Petr Mladek 2020-09-24 13:38 ` Petr Mladek 0 siblings, 1 reply; 21+ messages in thread From: Petr Mladek @ 2020-09-24 12:53 UTC (permalink / raw) To: Ahmed S. Darwish Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 06:24:14, Ahmed S. Darwish wrote: > On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote: > ... > > > > -static inline u32 printk_caller_id(void) > > +static enum printk_caller_ctx get_printk_caller_ctx(void) > > +{ > > + if (in_nmi()) > > + return printk_ctx_nmi; > > + > > + if (in_irq()) > > + return printk_ctx_hardirq; > > + > > + if (in_softirq()) > > + return printk_ctx_softirq; > > + > > + return printk_ctx_task; > > +} > > + > > in_softirq() here will be true for both softirq contexts *and* > BH-disabled regions. Did you mean in_serving_softirq() instead? Good question! I am not sure if people would want to distinguish these two situations. Otherwise, I think that is_softirq() more close to the meaning of in_irq(). They both describe a context where a new interrupt has to wait until the handling gets enabled again. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 12:53 ` Petr Mladek @ 2020-09-24 13:38 ` Petr Mladek 2020-09-25 0:54 ` Sergey Senozhatsky 0 siblings, 1 reply; 21+ messages in thread From: Petr Mladek @ 2020-09-24 13:38 UTC (permalink / raw) To: Ahmed S. Darwish Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Thu 2020-09-24 14:53:01, Petr Mladek wrote: > On Thu 2020-09-24 06:24:14, Ahmed S. Darwish wrote: > > On Wed, Sep 23, 2020 at 03:56:17PM +0200, Petr Mladek wrote: > > ... > > > > > > -static inline u32 printk_caller_id(void) > > > +static enum printk_caller_ctx get_printk_caller_ctx(void) > > > +{ > > > + if (in_nmi()) > > > + return printk_ctx_nmi; > > > + > > > + if (in_irq()) > > > + return printk_ctx_hardirq; > > > + > > > + if (in_softirq()) > > > + return printk_ctx_softirq; > > > + > > > + return printk_ctx_task; > > > +} > > > + > > > > in_softirq() here will be true for both softirq contexts *and* > > BH-disabled regions. Did you mean in_serving_softirq() instead? > > Good question! > > I am not sure if people would want to distinguish these two > situations. > > Otherwise, I think that is_softirq() more close to the meaning of > in_irq(). They both describe a context where a new interrupt has > to wait until the handling gets enabled again. Grrrr, I wonder why I thought that in_irq() covered also the situation when IRQ was disabled. It was likely my wish because disabled interrupts are problem for printk() because the console might cause a softlockup. in_irq() actually behaves like in_serving_softirq(). I am confused and puzzled now. I wonder what contexts are actually interesting for developers. It goes back to the ideas from Sergey about preemption disabled, ... /me feels shameful and is going to hide under a stone. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-24 13:38 ` Petr Mladek @ 2020-09-25 0:54 ` Sergey Senozhatsky 2020-09-25 10:20 ` Petr Mladek 0 siblings, 1 reply; 21+ messages in thread From: Sergey Senozhatsky @ 2020-09-25 0:54 UTC (permalink / raw) To: Petr Mladek Cc: Ahmed S. Darwish, Sergey Senozhatsky, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On (20/09/24 15:38), Petr Mladek wrote: [..] > > Grrrr, I wonder why I thought that in_irq() covered also the situation > when IRQ was disabled. It was likely my wish because disabled > interrupts are problem for printk() because the console might > cause a softlockup. preempt_disable() can also trigger softlockup. > in_irq() actually behaves like in_serving_softirq(). > > I am confused and puzzled now. I wonder what contexts are actually > interesting for developers. It goes back to the ideas from Sergey > about preemption disabled, ... Are we talking about context tracking for LOG_CONT or context on the serial console and /dev/kmsg? If the latter, then my 5 cents, is that something like preemptible(), which checks (preempt_count() == 0 && !irqs_disabled()) does not look completely unreasonable. We had a rather OK context tracking in printk() before, but for a completely different purpose: console_may_schedule = !oops_in_progress && preemptible() && !rcu_preempt_depth(); We know that printk() can cause RCU stalls [0]. Tracking this part of the context state is sort of meaningful. Let's look at this from this POV - why do we add in_irq()/etc tracking info? Perhaps because we want to connect the dots between printk() caller state and watchdog reports. Do we cover all watchdogs? No, I don't think so. RCU stalls, local_irq_disable(), preempt_disable() are not covered. Do we have any technical reasons not to add those missing bits? [0] https://lkml.org/lkml/2018/1/9/485 -ss ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC 2/2] printk: Add more information about the printk caller 2020-09-25 0:54 ` Sergey Senozhatsky @ 2020-09-25 10:20 ` Petr Mladek 0 siblings, 0 replies; 21+ messages in thread From: Petr Mladek @ 2020-09-25 10:20 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Ahmed S. Darwish, Steven Rostedt, John Ogness, Linus Torvalds, Thomas Gleixner, Prarit Bhargava, Mark Salyzyn, Chunyan Zhang, Orson Zhai, Changki Kim, Sergey Senozhatsky, linux-kernel On Fri 2020-09-25 09:54:00, Sergey Senozhatsky wrote: > On (20/09/24 15:38), Petr Mladek wrote: > [..] > > > > Grrrr, I wonder why I thought that in_irq() covered also the situation > > when IRQ was disabled. It was likely my wish because disabled > > interrupts are problem for printk() because the console might > > cause a softlockup. > > preempt_disable() can also trigger softlockup. > > > in_irq() actually behaves like in_serving_softirq(). > > > > I am confused and puzzled now. I wonder what contexts are actually > > interesting for developers. It goes back to the ideas from Sergey > > about preemption disabled, ... > > Are we talking about context tracking for LOG_CONT or context on > the serial console and /dev/kmsg? OK, it is clear that LOG_CONT need to know when a different code is called suddenly. I mean task code vs. an interrupt handler. But it was actually also the original purpose of the caller_id. AFAIK, people wanted to sort related messages when they were mixed with ones from other CPUs. > If the latter, then my 5 cents, is that something like preemptible(), > which checks > > (preempt_count() == 0 && !irqs_disabled()) > > does not look completely unreasonable. > > We had a rather OK context tracking in printk() before, but for a > completely different purpose: > > console_may_schedule = !oops_in_progress && > preemptible() && > !rcu_preempt_depth(); > > We know that printk() can cause RCU stalls [0]. Tracking this part > of the context state is sort of meaningful. > > Let's look at this from this POV - why do we add in_irq()/etc tracking > info? Perhaps because we want to connect the dots between printk() caller > state and watchdog reports. Do we cover all watchdogs? No, I don't think > so. RCU stalls, local_irq_disable(), preempt_disable() are not covered. I agree that it would be handy to see this context as well. It might make it easier when hunting down various lockups and stall. But I have some concerns. First, the information is not always reliable (PREEMPT_NONE). I wonder if it might cause more harm than good. People might get confused or they might want to fix it by some crazy printk code. Second, the information might not be detailed enough. Many lockups depends on the fact that a particular lock is held. Backtraces are likely more important. Or people would need to distinguish many contexts. It would require another complex code. I am not sure that this is woth it. After all, it might be enough to distinguish the 4 basic contexts just to allow sorting mixed messages. Best Regards, Petr ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [RFC 2/2] printk: Add more information about the printk caller 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek ` (2 preceding siblings ...) 2020-09-24 4:24 ` Ahmed S. Darwish @ 2020-10-21 11:48 ` 김창기 3 siblings, 0 replies; 21+ messages in thread From: 김창기 @ 2020-10-21 11:48 UTC (permalink / raw) To: 'Petr Mladek', 'Sergey Senozhatsky', 'Steven Rostedt', 'John Ogness' Cc: 'Linus Torvalds', 'Thomas Gleixner', 'Prarit Bhargava', 'Mark Salyzyn', 'Chunyan Zhang', 'Orson Zhai', 'Sergey Senozhatsky', linux-kernel > +static size_t info_print_caller_id(char *buf, size_t size, > + const struct printk_caller *caller) { > + enum printk_caller_ctx ctx; > + > + ctx = printk_to_caller_ctx(caller->cpu_ctx); > + > + if (ctx == printk_ctx_task) > + return snprintf(buf, size, "T%u", caller->pid); > + > + return snprintf(buf, size, "C%u", > +printk_to_caller_cpu(caller->cpu_ctx)); > +} > + When I apply and test this patch, there is no change of print format compared with previous PRINTK_CALLER. The patch that I made is always to print the CPU ID and process name together. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2020-10-21 11:48 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-23 13:56 [RFC 0/2] printk: Add more metadata for each record Petr Mladek 2020-09-23 13:56 ` [PATCH 1/2] printk: Store all three timestamps Petr Mladek 2020-09-23 22:12 ` John Ogness 2020-09-24 1:45 ` Sergey Senozhatsky 2020-09-24 11:49 ` Petr Mladek [not found] ` <CAJ-C09hqwOJhSXx1h40q96xhNZFXxP6dUVfjUQZpO4ZhOMZLbQ@mail.gmail.com> 2020-09-25 9:51 ` Petr Mladek 2020-09-24 0:00 ` John Ogness 2020-09-24 10:37 ` Petr Mladek 2020-09-23 13:56 ` [RFC 2/2] printk: Add more information about the printk caller Petr Mladek 2020-09-24 1:40 ` Sergey Senozhatsky 2020-09-24 11:58 ` Petr Mladek 2020-09-24 2:17 ` Sergey Senozhatsky 2020-09-24 8:23 ` John Ogness 2020-09-24 13:26 ` Petr Mladek 2020-09-24 13:06 ` Petr Mladek 2020-09-24 4:24 ` Ahmed S. Darwish 2020-09-24 12:53 ` Petr Mladek 2020-09-24 13:38 ` Petr Mladek 2020-09-25 0:54 ` Sergey Senozhatsky 2020-09-25 10:20 ` Petr Mladek 2020-10-21 11:48 ` 김창기
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).