* [PATCH next v2 0/3] printk: remove logbuf_lock @ 2020-12-01 20:53 John Ogness 2020-12-01 20:53 ` [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() John Ogness ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: John Ogness @ 2020-12-01 20:53 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel Hello, Here is a v2 of the next series for the printk-rework. It removes the @logbuf_lock. v1 is here [0]. Since this lock protected more than just the ringbuffer, a new finer-grained @syslog_lock is introduced. With this series, vprintk_store() becomes lockless. A consequence of this locklessness is that a buffer for temporary sprint space is no longer available. To get around this, vsnprintf() is used to determine the full length to reserve in the ringbuffer and then the message is directly sprint'ed to the reserved space. Since the sprint'ed message may include syslog prefix and newline (both of which get stripped), there will be some bytes of wasted space per message. This space could be reclaimed by introducing a "trimming function" to the ringbuffer, which could reduce the size of a reserved data block if no newer block is reserved. However, such a function is not part of this series. This series falls in line with the printk-rework plan as presented [1] at Linux Plumbers in Lisbon 2019. The series is based on next-20201201. 
John Ogness [0] https://lkml.kernel.org/r/20200922153816.5883-1-john.ogness@linutronix.de [1] https://linuxplumbersconf.org/event/4/contributions/290/attachments/276/463/lpc2019_jogness_printk.pdf (slide 23) John Ogness (3): printk: inline log_output(),log_store() in vprintk_store() printk: change @clear_seq to atomic64_t printk: remove logbuf_lock, add syslog_lock include/linux/printk.h | 1 + kernel/printk/internal.h | 4 +- kernel/printk/printk.c | 411 ++++++++++++++++++------------------ kernel/printk/printk_safe.c | 18 +- 4 files changed, 212 insertions(+), 222 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() 2020-12-01 20:53 [PATCH next v2 0/3] printk: remove logbuf_lock John Ogness @ 2020-12-01 20:53 ` John Ogness 2020-12-03 15:57 ` Petr Mladek 2020-12-01 20:53 ` [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t John Ogness 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness 2 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-01 20:53 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel In preparation for removing logbuf_lock, inline log_output() and log_store() into vprintk_store(). This will simplify dealing with the various code branches and fallbacks that are possible. Signed-off-by: John Ogness <john.ogness@linutronix.de> --- kernel/printk/printk.c | 134 +++++++++++++++++------------------------ 1 file changed, 56 insertions(+), 78 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f279d4fbd9dd..fc5e3a7d6d89 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -491,52 +491,6 @@ static void truncate_msg(u16 *text_len, u16 *trunc_msg_len) *trunc_msg_len = 0; } -/* insert record into the buffer, discard old ones, update heads */ -static int log_store(u32 caller_id, int facility, int level, - enum log_flags flags, u64 ts_nsec, - const struct dev_printk_info *dev_info, - const char *text, u16 text_len) -{ - struct prb_reserved_entry e; - struct printk_record r; - u16 trunc_msg_len = 0; - - prb_rec_init_wr(&r, text_len); - - if (!prb_reserve(&e, prb, &r)) { - /* truncate the message if it is too long for empty buffer */ - truncate_msg(&text_len, &trunc_msg_len); - prb_rec_init_wr(&r, text_len + trunc_msg_len); - /* survive when the log buffer is too small for trunc_msg */ - if (!prb_reserve(&e, prb, &r)) - return 0; - } - - /* fill message */ - 
memcpy(&r.text_buf[0], text, text_len); - if (trunc_msg_len) - memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); - r.info->text_len = text_len + trunc_msg_len; - r.info->facility = facility; - r.info->level = level & 7; - r.info->flags = flags & 0x1f; - if (ts_nsec > 0) - r.info->ts_nsec = ts_nsec; - else - r.info->ts_nsec = local_clock(); - r.info->caller_id = caller_id; - if (dev_info) - memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info)); - - /* A message without a trailing newline can be continued. */ - if (!(flags & LOG_NEWLINE)) - prb_commit(&e); - else - prb_final_commit(&e); - - return (text_len + trunc_msg_len); -} - int dmesg_restrict = IS_ENABLED(CONFIG_SECURITY_DMESG_RESTRICT); static int syslog_action_restricted(int type) @@ -1907,44 +1861,22 @@ static inline u32 printk_caller_id(void) 0x80000000 + raw_smp_processor_id(); } -static size_t log_output(int facility, int level, enum log_flags lflags, - const struct dev_printk_info *dev_info, - char *text, size_t text_len) -{ - const u32 caller_id = printk_caller_id(); - - if (lflags & LOG_CONT) { - struct prb_reserved_entry e; - struct printk_record r; - - prb_rec_init_wr(&r, text_len); - if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { - memcpy(&r.text_buf[r.info->text_len], text, text_len); - r.info->text_len += text_len; - if (lflags & LOG_NEWLINE) { - r.info->flags |= LOG_NEWLINE; - prb_final_commit(&e); - } else { - prb_commit(&e); - } - return text_len; - } - } - - /* Store it in the record log */ - return log_store(caller_id, facility, level, lflags, 0, - dev_info, text, text_len); -} - /* Must be called under logbuf_lock. 
*/ int vprintk_store(int facility, int level, const struct dev_printk_info *dev_info, const char *fmt, va_list args) { + const u32 caller_id = printk_caller_id(); static char textbuf[LOG_LINE_MAX]; - char *text = textbuf; - size_t text_len; + struct prb_reserved_entry e; enum log_flags lflags = 0; + struct printk_record r; + u16 trunc_msg_len = 0; + char *text = textbuf; + u16 text_len; + u64 ts_nsec; + + ts_nsec = local_clock(); /* * The printf needs to come first; we need the syslog @@ -1983,7 +1915,53 @@ int vprintk_store(int facility, int level, if (dev_info) lflags |= LOG_NEWLINE; - return log_output(facility, level, lflags, dev_info, text, text_len); + if (lflags & LOG_CONT) { + prb_rec_init_wr(&r, text_len); + if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { + memcpy(&r.text_buf[r.info->text_len], text, text_len); + r.info->text_len += text_len; + + if (lflags & LOG_NEWLINE) { + r.info->flags |= LOG_NEWLINE; + prb_final_commit(&e); + } else { + prb_commit(&e); + } + + return text_len; + } + } + + prb_rec_init_wr(&r, text_len); + if (!prb_reserve(&e, prb, &r)) { + /* truncate the message if it is too long for empty buffer */ + truncate_msg(&text_len, &trunc_msg_len); + + prb_rec_init_wr(&r, text_len + trunc_msg_len); + if (!prb_reserve(&e, prb, &r)) + return 0; + } + + /* fill message */ + memcpy(&r.text_buf[0], text, text_len); + if (trunc_msg_len) + memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); + r.info->text_len = text_len + trunc_msg_len; + r.info->facility = facility; + r.info->level = level & 7; + r.info->flags = lflags & 0x1f; + r.info->ts_nsec = ts_nsec; + r.info->caller_id = caller_id; + if (dev_info) + memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info)); + + /* A message without a trailing newline can be continued. 
*/ + if (!(lflags & LOG_NEWLINE)) + prb_commit(&e); + else + prb_final_commit(&e); + + return (text_len + trunc_msg_len); } asmlinkage int vprintk_emit(int facility, int level, -- 2.20.1 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() 2020-12-01 20:53 ` [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() John Ogness @ 2020-12-03 15:57 ` Petr Mladek 2020-12-03 16:25 ` John Ogness 0 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-03 15:57 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Tue 2020-12-01 21:59:39, John Ogness wrote: > In preparation for removing logbuf_lock, inline log_output() > and log_store() into vprintk_store(). This will simplify dealing > with the various code branches and fallbacks that are possible. > --- > kernel/printk/printk.c | 134 +++++++++++++++++------------------------ > 1 file changed, 56 insertions(+), 78 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index f279d4fbd9dd..fc5e3a7d6d89 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1983,7 +1915,53 @@ int vprintk_store(int facility, int level, > if (dev_info) > lflags |= LOG_NEWLINE; > > - return log_output(facility, level, lflags, dev_info, text, text_len); > + if (lflags & LOG_CONT) { > + prb_rec_init_wr(&r, text_len); > + if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { > + memcpy(&r.text_buf[r.info->text_len], text, text_len); > + r.info->text_len += text_len; > + > + if (lflags & LOG_NEWLINE) { > + r.info->flags |= LOG_NEWLINE; > + prb_final_commit(&e); > + } else { > + prb_commit(&e); > + } > + > + return text_len; > + } > + } > + > + prb_rec_init_wr(&r, text_len); This is called in both branches. I would do it just once at the beginning. 
> +	if (!prb_reserve(&e, prb, &r)) {
> +		/* truncate the message if it is too long for empty buffer */
> +		truncate_msg(&text_len, &trunc_msg_len);
> +
> +		prb_rec_init_wr(&r, text_len + trunc_msg_len);
> +		if (!prb_reserve(&e, prb, &r))
> +			return 0;
> +	}
> +
> +	/* fill message */
> +	memcpy(&r.text_buf[0], text, text_len);
> +	if (trunc_msg_len)
> +		memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
> +	r.info->text_len = text_len + trunc_msg_len;
> +	r.info->facility = facility;
> +	r.info->level = level & 7;
> +	r.info->flags = lflags & 0x1f;
> +	r.info->ts_nsec = ts_nsec;

This is the only location where ts_nsec is used. I would remove the
variable and call:

	r.info->ts_nsec = local_clock();

> +	r.info->caller_id = caller_id;
> +	if (dev_info)
> +		memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));
> +
> +	/* A message without a trailing newline can be continued. */
> +	if (!(lflags & LOG_NEWLINE))
> +		prb_commit(&e);
> +	else
> +		prb_final_commit(&e);
> +
> +	return (text_len + trunc_msg_len);
>  }

Both changes are cosmetic and I do not insist on them. Please do the
changes if a v3 is needed and you agree with them.

Anyway, feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() 2020-12-03 15:57 ` Petr Mladek @ 2020-12-03 16:25 ` John Ogness 2020-12-04 6:13 ` Sergey Senozhatsky 2020-12-04 8:26 ` Petr Mladek 0 siblings, 2 replies; 47+ messages in thread From: John Ogness @ 2020-12-03 16:25 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-03, Petr Mladek <pmladek@suse.com> wrote: >> + if (lflags & LOG_CONT) { >> + prb_rec_init_wr(&r, text_len); >> + if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { >> + memcpy(&r.text_buf[r.info->text_len], text, text_len); >> + r.info->text_len += text_len; >> + >> + if (lflags & LOG_NEWLINE) { >> + r.info->flags |= LOG_NEWLINE; >> + prb_final_commit(&e); >> + } else { >> + prb_commit(&e); >> + } >> + >> + return text_len; >> + } >> + } >> + >> + prb_rec_init_wr(&r, text_len); > > This is called in both branches. I would do it just once at the > beginning. Actually that leads to a crash when prb_reserve_in_last() fails and the fallback code uses a record that prb_reserve_in_last() has already touched. The implementation of prb_reserve_in_last() would have to be changed so that failure guarantees that @r has not been modified. Currently prb_reserve_in_last() can fail after modifying @r. 
>> +	if (!prb_reserve(&e, prb, &r)) {
>> +		/* truncate the message if it is too long for empty buffer */
>> +		truncate_msg(&text_len, &trunc_msg_len);
>> +
>> +		prb_rec_init_wr(&r, text_len + trunc_msg_len);
>> +		if (!prb_reserve(&e, prb, &r))
>> +			return 0;
>> +	}
>> +
>> +	/* fill message */
>> +	memcpy(&r.text_buf[0], text, text_len);
>> +	if (trunc_msg_len)
>> +		memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
>> +	r.info->text_len = text_len + trunc_msg_len;
>> +	r.info->facility = facility;
>> +	r.info->level = level & 7;
>> +	r.info->flags = lflags & 0x1f;
>> +	r.info->ts_nsec = ts_nsec;
>
> This is the only location where ts_nsec is used. I would remove the
> variable and call:
>
>	r.info->ts_nsec = local_clock();

My reason for grabbing the clock at the beginning is so that the
timestamp is as close to the printk() call as possible. IMHO it is a
more deterministic timestamp than if it is taken after reservation(s)
and sprint'ing. I prefer to keep it as it is, but will not object if
such a change is necessary for mainline acceptance.

John Ogness

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() 2020-12-03 16:25 ` John Ogness @ 2020-12-04 6:13 ` Sergey Senozhatsky 2020-12-04 8:26 ` Petr Mladek 1 sibling, 0 replies; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-04 6:13 UTC (permalink / raw) To: John Ogness Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/03 17:31), John Ogness wrote: [..] > >> + if (!prb_reserve(&e, prb, &r)) { > >> + /* truncate the message if it is too long for empty buffer */ > >> + truncate_msg(&text_len, &trunc_msg_len); > >> + > >> + prb_rec_init_wr(&r, text_len + trunc_msg_len); > >> + if (!prb_reserve(&e, prb, &r)) > >> + return 0; > >> + } > >> + > >> + /* fill message */ > >> + memcpy(&r.text_buf[0], text, text_len); > >> + if (trunc_msg_len) > >> + memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); > >> + r.info->text_len = text_len + trunc_msg_len; > >> + r.info->facility = facility; > >> + r.info->level = level & 7; > >> + r.info->flags = lflags & 0x1f; > >> + r.info->ts_nsec = ts_nsec; > > > > This is the only location where ts_nsec is used. I would remove the > > variable and call: > > > > r.info->ts_nsec = local_clock(); > > My reason for grabbing the clock at the beginning is so that the > timestamp is as close to the printk() call as possible. IMHO it is a > more deterministic timestamp than if it is taken after reservation(s) > and sprint'ing. I prefer to keep it as it is, but will not object if > such a change is necessary for mailine acceptance. Sounds reasonable Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -ss ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() 2020-12-03 16:25 ` John Ogness 2020-12-04 6:13 ` Sergey Senozhatsky @ 2020-12-04 8:26 ` Petr Mladek 1 sibling, 0 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-04 8:26 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Thu 2020-12-03 17:31:29, John Ogness wrote: > On 2020-12-03, Petr Mladek <pmladek@suse.com> wrote: > >> + if (lflags & LOG_CONT) { > >> + prb_rec_init_wr(&r, text_len); > >> + if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { > >> + memcpy(&r.text_buf[r.info->text_len], text, text_len); > >> + r.info->text_len += text_len; > >> + > >> + if (lflags & LOG_NEWLINE) { > >> + r.info->flags |= LOG_NEWLINE; > >> + prb_final_commit(&e); > >> + } else { > >> + prb_commit(&e); > >> + } > >> + > >> + return text_len; > >> + } > >> + } > >> + > >> + prb_rec_init_wr(&r, text_len); > > > > This is called in both branches. I would do it just once at the > > beginning. > > Actually that leads to a crash when prb_reserve_in_last() fails and the > fallback code uses a record that prb_reserve_in_last() has already > touched. The implementation of prb_reserve_in_last() would have to be > changed so that failure guarantees that @r has not been > modified. Currently prb_reserve_in_last() can fail after modifying @r. I see. It would deserve a comment so that nobody repeats my mistake again. 
I am sure that I would be able to do it once again few months from now ;-) > >> + if (!prb_reserve(&e, prb, &r)) { > >> + /* truncate the message if it is too long for empty buffer */ > >> + truncate_msg(&text_len, &trunc_msg_len); > >> + > >> + prb_rec_init_wr(&r, text_len + trunc_msg_len); > >> + if (!prb_reserve(&e, prb, &r)) > >> + return 0; > >> + } > >> + > >> + /* fill message */ > >> + memcpy(&r.text_buf[0], text, text_len); > >> + if (trunc_msg_len) > >> + memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); > >> + r.info->text_len = text_len + trunc_msg_len; > >> + r.info->facility = facility; > >> + r.info->level = level & 7; > >> + r.info->flags = lflags & 0x1f; > >> + r.info->ts_nsec = ts_nsec; > > > > This is the only location where ts_nsec is used. I would remove the > > variable and call: > > > > r.info->ts_nsec = local_clock(); > > My reason for grabbing the clock at the beginning is so that the > timestamp is as close to the printk() call as possible. IMHO it is a > more deterministic timestamp than if it is taken after reservation(s) > and sprint'ing. I prefer to keep it as it is, but will not object if > such a change is necessary for mailine acceptance. Fair enough. It would deserve a comment as well. Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-01 20:53 [PATCH next v2 0/3] printk: remove logbuf_lock John Ogness 2020-12-01 20:53 ` [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() John Ogness @ 2020-12-01 20:53 ` John Ogness 2020-12-04 9:12 ` Petr Mladek 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness 2 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-01 20:53 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel Currently @clear_seq access is protected by @logbuf_lock. Once @logbuf_lock is removed some other form of synchronization will be required. Change the type of @clear_seq to atomic64_t to provide the synchronization. Signed-off-by: John Ogness <john.ogness@linutronix.de> --- kernel/printk/printk.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index fc5e3a7d6d89..e9018c4e1b66 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -403,7 +403,7 @@ static u64 exclusive_console_stop_seq; static unsigned long console_dropped; /* the next printk record to read after the last 'clear' command */ -static u64 clear_seq; +static atomic64_t clear_seq = ATOMIC64_INIT(0); #ifdef CONFIG_PRINTK_CALLER #define PREFIX_MAX 48 @@ -797,7 +797,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) * like issued by 'dmesg -c'. Reading /dev/kmsg itself * changes no global state, and does not clear anything. */ - user->seq = clear_seq; + user->seq = atomic64_read(&clear_seq); break; case SEEK_END: /* after the last record */ @@ -914,6 +914,9 @@ void log_buf_vmcoreinfo_setup(void) * parse it and detect any changes to structure down the line. 
*/ + VMCOREINFO_SIZE(atomic64_t); + VMCOREINFO_TYPE_OFFSET(atomic64_t, counter); + VMCOREINFO_STRUCT_SIZE(printk_ringbuffer); VMCOREINFO_OFFSET(printk_ringbuffer, desc_ring); VMCOREINFO_OFFSET(printk_ringbuffer, text_data_ring); @@ -1476,6 +1479,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) struct printk_info info; unsigned int line_count; struct printk_record r; + u64 clr_seq; char *text; int len = 0; u64 seq; @@ -1487,15 +1491,17 @@ static int syslog_print_all(char __user *buf, int size, bool clear) time = printk_time; logbuf_lock_irq(); + clr_seq = atomic64_read(&clear_seq); + /* * Find first record that fits, including all following records, * into the user-provided buffer for this dump. */ - prb_for_each_info(clear_seq, prb, seq, &info, &line_count) + prb_for_each_info(clr_seq, prb, seq, &info, &line_count) len += get_record_print_text_size(&info, line_count, true, time); /* move first record forward until length fits into the buffer */ - prb_for_each_info(clear_seq, prb, seq, &info, &line_count) { + prb_for_each_info(clr_seq, prb, seq, &info, &line_count) { if (len <= size) break; len -= get_record_print_text_size(&info, line_count, true, time); @@ -1526,7 +1532,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) } if (clear) - clear_seq = seq; + atomic64_set(&clear_seq, seq); logbuf_unlock_irq(); kfree(text); @@ -1536,7 +1542,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) static void syslog_clear(void) { logbuf_lock_irq(); - clear_seq = prb_next_seq(prb); + atomic64_set(&clear_seq, prb_next_seq(prb)); logbuf_unlock_irq(); } @@ -3205,7 +3211,7 @@ void kmsg_dump(enum kmsg_dump_reason reason) dumper->active = true; logbuf_lock_irqsave(flags); - dumper->cur_seq = clear_seq; + dumper->cur_seq = atomic64_read(&clear_seq); dumper->next_seq = prb_next_seq(prb); logbuf_unlock_irqrestore(flags); @@ -3412,7 +3418,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); */ void kmsg_dump_rewind_nolock(struct 
kmsg_dumper *dumper) { - dumper->cur_seq = clear_seq; + dumper->cur_seq = atomic64_read(&clear_seq); dumper->next_seq = prb_next_seq(prb); } -- 2.20.1 ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-01 20:53 ` [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t John Ogness @ 2020-12-04 9:12 ` Petr Mladek 2020-12-06 20:23 ` John Ogness 2020-12-08 20:34 ` Sergey Senozhatsky 0 siblings, 2 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-04 9:12 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, linux-kernel On Tue 2020-12-01 21:59:40, John Ogness wrote: > Currently @clear_seq access is protected by @logbuf_lock. Once > @logbuf_lock is removed some other form of synchronization will be > required. Change the type of @clear_seq to atomic64_t to provide the > synchronization. > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index fc5e3a7d6d89..e9018c4e1b66 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -3412,7 +3418,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); > */ > void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper) > { > - dumper->cur_seq = clear_seq; > + dumper->cur_seq = atomic64_read(&clear_seq); Sigh, atomic64_read() uses a spin lock in the generic implementation that is used on some architectures. Hmm, this seems to be the only location where the lock must not be used. At the same time, a best effort might be acceptable here. I am not super-happy with the following hack but it might work: /* * Use the best effort to avoid locks. In the worst case, * the bottom and upper halves will be inconsistent. Then * the value will be far too big or far too low. Fallback * to the first available sequence number when it is * too big. 
	 */
	if (IS_ENABLED(CONFIG_GENERIC_ATOMIC64)) {
		u64 first_seq = prb_first_seq(prb);

		dumper->cur_seq = READ_ONCE(&clear_seq->counter);
		if (dumper->cur_seq > first_seq)
			dumper->cur_seq = first_seq;
	} else {
		dumper->cur_seq = atomic64_read(&clear_seq);
	}

An alternative solution would be to always fall back to first_seq on
these architectures. Few people would complain when they see more
messages. We could always improve it when it causes problems.

Adding Peter Zijlstra for his opinion [*].

> 	dumper->next_seq = prb_next_seq(prb);

[*] I am going to hide under a stone for the above hack.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-04 9:12 ` Petr Mladek @ 2020-12-06 20:23 ` John Ogness 2020-12-07 9:34 ` Peter Zijlstra 2020-12-08 20:34 ` Sergey Senozhatsky 1 sibling, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-06 20:23 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, linux-kernel On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > On Tue 2020-12-01 21:59:40, John Ogness wrote: >> Currently @clear_seq access is protected by @logbuf_lock. Once >> @logbuf_lock is removed some other form of synchronization will be >> required. Change the type of @clear_seq to atomic64_t to provide the >> synchronization. >> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index fc5e3a7d6d89..e9018c4e1b66 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -3412,7 +3418,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); >> */ >> void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper) >> { >> - dumper->cur_seq = clear_seq; >> + dumper->cur_seq = atomic64_read(&clear_seq); > > Sigh, atomic64_read() uses a spin lock in the generic implementation > that is used on some architectures. > > Hmm, this seems to be the only location where the lock must not be > used. Yes, and it is read-only access. Perhaps atomic64_t is the wrong thing to use here. We could use a seqcount_latch and a shadow variable so that if a writer has been preempted, we can use the previous value. (Only kmsg_dump would need to use the lockless variant to read the value.) 
void clear_seq_set(u64 val) { spin_lock_irq(&clear_lock); raw_write_seqcount_latch(&clear_latch); clear_seq[0] = val; raw_write_seqcount_latch(&clear_latch); clear_seq[1] = val; spin_unlock_irq(&clear_lock); } u64 clear_seq_get_nolock(void) { unsigned int seq, idx; u64 val; do { seq = raw_read_seqcount_latch(&clear_latch); idx = seq & 0x1; val = clear_seq[idx]; } while (read_seqcount_latch_retry(&clear_latch, seq)); return val; } u64 clear_seq_get(void) { u64 val; spin_lock_irq(&clear_lock); val = clear_seq[0]; spin_unlock_irq(&clear_lock); return val; } > Alternative solution would to always fallback to the first_seq on > these architectures. Few people would complain when they see more > messages. We could always improve it when it causes problems. I am also OK with this solution. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-06 20:23 ` John Ogness @ 2020-12-07 9:34 ` Peter Zijlstra 2020-12-07 10:03 ` John Ogness 0 siblings, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2020-12-07 9:34 UTC (permalink / raw) To: John Ogness Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Sun, Dec 06, 2020 at 09:29:59PM +0106, John Ogness wrote: > Yes, and it is read-only access. Perhaps atomic64_t is the wrong thing > to use here. We could use a seqcount_latch and a shadow variable so that > if a writer has been preempted, we can use the previous value. (Only > kmsg_dump would need to use the lockless variant to read the value.) > > void clear_seq_set(u64 val) > { > spin_lock_irq(&clear_lock); > raw_write_seqcount_latch(&clear_latch); > clear_seq[0] = val; > raw_write_seqcount_latch(&clear_latch); > clear_seq[1] = val; > spin_unlock_irq(&clear_lock); > } > > u64 clear_seq_get_nolock(void) > { > unsigned int seq, idx; > u64 val; > > do { > seq = raw_read_seqcount_latch(&clear_latch); > idx = seq & 0x1; > val = clear_seq[idx]; > } while (read_seqcount_latch_retry(&clear_latch, seq)); > > return val; > } That's overly complicated. If you're going to double the storage you can simply do: seq = val smp_wmb(); seq_copy = val; vs do { tmp = seq_copy; smp_rmb(); val = seq; } while (val != tmp); Also look for CONFIG_64_BIT in kernel/sched/fair.c. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-07 9:34 ` Peter Zijlstra @ 2020-12-07 10:03 ` John Ogness 2020-12-07 12:56 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: John Ogness @ 2020-12-07 10:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-07, Peter Zijlstra <peterz@infradead.org> wrote: >> Yes, and it is read-only access. Perhaps atomic64_t is the wrong thing >> to use here. We could use a seqcount_latch and a shadow variable so that >> if a writer has been preempted, we can use the previous value. (Only >> kmsg_dump would need to use the lockless variant to read the value.) >> >> void clear_seq_set(u64 val) >> { >> spin_lock_irq(&clear_lock); >> raw_write_seqcount_latch(&clear_latch); >> clear_seq[0] = val; >> raw_write_seqcount_latch(&clear_latch); >> clear_seq[1] = val; >> spin_unlock_irq(&clear_lock); >> } >> >> u64 clear_seq_get_nolock(void) >> { >> unsigned int seq, idx; >> u64 val; >> >> do { >> seq = raw_read_seqcount_latch(&clear_latch); >> idx = seq & 0x1; >> val = clear_seq[idx]; >> } while (read_seqcount_latch_retry(&clear_latch, seq)); >> >> return val; >> } > > That's overly complicated. > > If you're going to double the storage you can simply do: > > > seq = val > smp_wmb(); > seq_copy = val; > > vs > > do { > tmp = seq_copy; > smp_rmb(); > val = seq; > } while (val != tmp); That will not work. We are talking about a situation where the writer is preempted. So seq will never equal seq_copy in that situation. I expect that the seqcount_latch is necessary. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-07 10:03 ` John Ogness @ 2020-12-07 12:56 ` Peter Zijlstra 2020-12-07 12:56 ` Petr Mladek 2020-12-07 16:46 ` David Laight 2 siblings, 0 replies; 47+ messages in thread From: Peter Zijlstra @ 2020-12-07 12:56 UTC (permalink / raw) To: John Ogness Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Mon, Dec 07, 2020 at 11:09:39AM +0106, John Ogness wrote: > That will not work. We are talking about a situation where the writer is > preempted. So seq will never equal seq_copy in that situation. I expect > that the seqcount_latch is necessary. Interrupted rather, I would think, specifically NMIs? Then yes, latch should work. Gets you either the old or new, but never something in between. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-07 10:03 ` John Ogness
  2020-12-07 12:56 ` Peter Zijlstra
@ 2020-12-07 12:56 ` Petr Mladek
  2020-12-07 16:46 ` David Laight
  2 siblings, 0 replies; 47+ messages in thread
From: Petr Mladek @ 2020-12-07 12:56 UTC (permalink / raw)
  To: John Ogness
  Cc: Peter Zijlstra, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman,
	Thomas Gleixner, linux-kernel

On Mon 2020-12-07 11:09:39, John Ogness wrote:
> On 2020-12-07, Peter Zijlstra <peterz@infradead.org> wrote:
> >> Yes, and it is read-only access. Perhaps atomic64_t is the wrong thing
> >> to use here. We could use a seqcount_latch and a shadow variable so that
> >> if a writer has been preempted, we can use the previous value. (Only
> >> kmsg_dump would need to use the lockless variant to read the value.)
> >>
> >> void clear_seq_set(u64 val)
> >> {
> >> 	spin_lock_irq(&clear_lock);
> >> 	raw_write_seqcount_latch(&clear_latch);
> >> 	clear_seq[0] = val;
> >> 	raw_write_seqcount_latch(&clear_latch);
> >> 	clear_seq[1] = val;
> >> 	spin_unlock_irq(&clear_lock);
> >> }
> >>
> >> u64 clear_seq_get_nolock(void)
> >> {
> >> 	unsigned int seq, idx;
> >> 	u64 val;
> >>
> >> 	do {
> >> 		seq = raw_read_seqcount_latch(&clear_latch);
> >> 		idx = seq & 0x1;
> >> 		val = clear_seq[idx];
> >> 	} while (read_seqcount_latch_retry(&clear_latch, seq));
> >>
> >> 	return val;
> >> }
> >
> > That's overly complicated.
> >
> > If you're going to double the storage you can simply do:
> >
> > 	seq = val
> > 	smp_wmb();
> > 	seq_copy = val;
> >
> > vs
> >
> > 	do {
> > 		tmp = seq_copy;
> > 		smp_rmb();
> > 		val = seq;
> > 	} while (val != tmp);
>
> That will not work. We are talking about a situation where the writer is
> preempted. So seq will never equal seq_copy in that situation. I expect
> that the seqcount_latch is necessary.

Or we could disable interrupts around the writer. But seqcount_latch
will actually be needed so that it works in panic().

The writer might be on a CPU that has been stopped using NMI. And this
code is used by dumpers() that are called during panic().

Sigh, I have to take a coffee and try to really understand the latch
code ;-)

Best Regards,
Petr
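[Editor's note: for readers unfamiliar with the latch trick being discussed, here is a minimal user-space sketch of the idea. C11 atomics and fences stand in for the kernel's raw_write_seqcount_latch()/read_seqcount_latch_retry(); the names and the standalone structure are illustrative, not the kernel API.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two copies of the 64-bit value plus an odd/even generation counter. */
static uint64_t clear_seq_copies[2];
static atomic_uint clear_latch;

/* Writer: bump the counter before updating each copy. While copy 0 is
 * being written the counter is odd and readers use copy 1, and vice
 * versa, so a writer stalled mid-update never blocks or corrupts a
 * reader; the reader just sees the previous value. */
static void clear_seq_set(uint64_t val)
{
	atomic_fetch_add(&clear_latch, 1);		/* counter goes odd  */
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_wmb()       */
	clear_seq_copies[0] = val;
	atomic_fetch_add(&clear_latch, 1);		/* counter goes even */
	atomic_thread_fence(memory_order_seq_cst);
	clear_seq_copies[1] = val;
}

/* Lockless reader: pick the copy selected by the low bit of the
 * counter, and retry if the counter moved underneath us. */
static uint64_t clear_seq_get_nolock(void)
{
	unsigned int seq;
	uint64_t val;

	do {
		seq = atomic_load(&clear_latch);
		atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_rmb() */
		val = clear_seq_copies[seq & 0x1];
		atomic_thread_fence(memory_order_seq_cst);
	} while (atomic_load(&clear_latch) != seq);

	return val;
}
```

This is what Peter means by "either the old or new, but never something in between": the reader always lands on a copy the writer is not currently touching.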
* RE: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-07 10:03 ` John Ogness
  2020-12-07 12:56 ` Peter Zijlstra
  2020-12-07 12:56 ` Petr Mladek
@ 2020-12-07 16:46 ` David Laight
  2 siblings, 0 replies; 47+ messages in thread
From: David Laight @ 2020-12-07 16:46 UTC (permalink / raw)
  To: 'John Ogness', Peter Zijlstra
  Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman,
	Thomas Gleixner, linux-kernel

From: John Ogness
> Sent: 07 December 2020 10:04
>
> On 2020-12-07, Peter Zijlstra <peterz@infradead.org> wrote:
> >> Yes, and it is read-only access. Perhaps atomic64_t is the wrong thing
> >> to use here. We could use a seqcount_latch and a shadow variable so that
> >> if a writer has been preempted, we can use the previous value. (Only
> >> kmsg_dump would need to use the lockless variant to read the value.)
> >>
> >> void clear_seq_set(u64 val)
> >> {
> >> 	spin_lock_irq(&clear_lock);
> >> 	raw_write_seqcount_latch(&clear_latch);
> >> 	clear_seq[0] = val;
> >> 	raw_write_seqcount_latch(&clear_latch);
> >> 	clear_seq[1] = val;
> >> 	spin_unlock_irq(&clear_lock);
> >> }
> >>
> >> u64 clear_seq_get_nolock(void)
> >> {
> >> 	unsigned int seq, idx;
> >> 	u64 val;
> >>
> >> 	do {
> >> 		seq = raw_read_seqcount_latch(&clear_latch);
> >> 		idx = seq & 0x1;
> >> 		val = clear_seq[idx];
> >> 	} while (read_seqcount_latch_retry(&clear_latch, seq));
> >>
> >> 	return val;
> >> }
> >
> > That's overly complicated.
> >
> > If you're going to double the storage you can simply do:
> >
> > 	seq = val
> > 	smp_wmb();
> > 	seq_copy = val;
> >
> > vs
> >
> > 	do {
> > 		tmp = seq_copy;
> > 		smp_rmb();
> > 		val = seq;
> > 	} while (val != tmp);
>
> That will not work. We are talking about a situation where the writer is
> preempted. So seq will never equal seq_copy in that situation. I expect
> that the seqcount_latch is necessary.

Is the value just being incremented?
If so you can do:

	seq_hi_0 = val >> 32;
	smp_wmb();
	seq_lo = val;
	smp_wmb();
	seq_hi_1 = val >> 32;

Then the reader can assume that seq_lo is zero if seq_hi_0 and
seq_hi_1 differ.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
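[Editor's note: a user-space sketch of the split-word publication David suggests, for a counter that only ever increases. C11 atomics and fences stand in for smp_wmb()/smp_rmb(); the function names are illustrative. The two high-word copies can only differ while a carry into the high word is in flight, which is exactly when the low word is (close to) zero.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Writer publishes hi, then lo, then hi again. */
static atomic_uint_least32_t seq_hi_0, seq_lo, seq_hi_1;

static void seq_set(uint64_t val)
{
	atomic_store(&seq_hi_0, (uint32_t)(val >> 32));
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_wmb() */
	atomic_store(&seq_lo, (uint32_t)val);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store(&seq_hi_1, (uint32_t)(val >> 32));
}

/* Reader loads in the reverse order.  If both high-word copies agree,
 * the low word read in between belongs to that high word.  If they
 * differ, the writer is mid-carry and the low word is treated as 0. */
static uint64_t seq_get(void)
{
	uint32_t hi1 = atomic_load(&seq_hi_1);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_rmb() */
	uint32_t lo  = atomic_load(&seq_lo);
	atomic_thread_fence(memory_order_seq_cst);
	uint32_t hi0 = atomic_load(&seq_hi_0);

	if (hi0 != hi1)
		return (uint64_t)hi0 << 32;	/* carry in flight: lo ~ 0 */
	return ((uint64_t)hi0 << 32) | lo;
}
```

Unlike the latch, this needs no retry loop, but it only works because the value is monotonic; it is not a general 64-bit atomic.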
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-04  9:12 ` Petr Mladek
  2020-12-06 20:23 ` John Ogness
@ 2020-12-08 20:34 ` Sergey Senozhatsky
  2020-12-08 22:30 ` John Ogness
  2020-12-09  8:07 ` Peter Zijlstra
  1 sibling, 2 replies; 47+ messages in thread
From: Sergey Senozhatsky @ 2020-12-08 20:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky,
	Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

On (20/12/04 10:12), Petr Mladek wrote:
> On Tue 2020-12-01 21:59:40, John Ogness wrote:
> > Currently @clear_seq access is protected by @logbuf_lock. Once
> > @logbuf_lock is removed some other form of synchronization will be
> > required. Change the type of @clear_seq to atomic64_t to provide the
> > synchronization.
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index fc5e3a7d6d89..e9018c4e1b66 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3412,7 +3418,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
> >   */
> >  void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
> >  {
> > -	dumper->cur_seq = clear_seq;
> > +	dumper->cur_seq = atomic64_read(&clear_seq);
>
> Sigh, atomic64_read() uses a spin lock in the generic implementation
> that is used on some architectures.

Oh... So on those archs prb is not lockless in fact, it actually
takes the spin_lock each time we read the descriptor state?

	desc_read()
	  atomic_long_read(state_var)
	    atomic64_read()
	      raw_spin_lock_irqsave(lock, flags)
	      << NMI panic >>

Am I missing something?

	-ss
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-08 20:34 ` Sergey Senozhatsky @ 2020-12-08 22:30 ` John Ogness 2020-12-09 1:04 ` Sergey Senozhatsky 2020-12-09 8:16 ` Peter Zijlstra 2020-12-09 8:07 ` Peter Zijlstra 1 sibling, 2 replies; 47+ messages in thread From: John Ogness @ 2020-12-08 22:30 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, linux-kernel On 2020-12-09, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: >> Sigh, atomic64_read() uses a spin lock in the generic implementation >> that is used on some architectures. > > Oh... So on those archs prb is not lockless in fact, it actually > takes the spin_lock each time we read the descriptor state? > > desc_read() > atomic_long_read(state_var) > atomic64_read() > raw_spin_lock_irqsave(lock, flags) > << NMI panic >> > > Am I missing something? For the state variable we chose atomic_long_t instead of atomic64_t for this reason. atomic_long_t operations are available atomically on all architectures. However, for clear_seq we need 64-bit (even on 32-bit machines). The seqcount_latch is an excellent solution here since clear_seq does not require lockless writers. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-08 22:30 ` John Ogness @ 2020-12-09 1:04 ` Sergey Senozhatsky 2020-12-09 8:16 ` Peter Zijlstra 1 sibling, 0 replies; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-09 1:04 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Peter Zijlstra, Thomas Gleixner, linux-kernel On (20/12/08 23:36), John Ogness wrote: > On 2020-12-09, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > >> Sigh, atomic64_read() uses a spin lock in the generic implementation > >> that is used on some architectures. > > > > Oh... So on those archs prb is not lockless in fact, it actually > > takes the spin_lock each time we read the descriptor state? > > > > desc_read() > > atomic_long_read(state_var) > > atomic64_read() > > raw_spin_lock_irqsave(lock, flags) > > << NMI panic >> > > > > Am I missing something? > > For the state variable we chose atomic_long_t instead of atomic64_t for > this reason. atomic_long_t operations are available atomically on all > architectures. Right. Looking more at Kconfigs, it seems that when atomic_long_t is atomic64 (64BIT) then GENERIC_ATOMIC64 is not selected. Those archs that select GENERIC_ATOMIC64 unconditionally all seem to be 32-bit. Thanks. -ss ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-08 22:30 ` John Ogness 2020-12-09 1:04 ` Sergey Senozhatsky @ 2020-12-09 8:16 ` Peter Zijlstra 2020-12-09 9:22 ` Sergey Senozhatsky 1 sibling, 1 reply; 47+ messages in thread From: Peter Zijlstra @ 2020-12-09 8:16 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Tue, Dec 08, 2020 at 11:36:44PM +0106, John Ogness wrote: > For the state variable we chose atomic_long_t instead of atomic64_t for > this reason. atomic_long_t operations are available atomically on all > architectures. Please put on your eye cancer gear and inspect the atomic implementation of PA-RISC, Sparc32, feh, I forgot who else. Those SMP capable architectures are gifted with just one XCHG like atomic instruction :/ Anyway, as said in the other email, they also don't have NMIs so it mostly works. ^ permalink raw reply [flat|nested] 47+ messages in thread
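[Editor's note: a user-space sketch of what "just one XCHG-like atomic instruction" implies. On such architectures the only way to build atomics is to swap a token into a lock word and do the read-modify-write under that lock. C11 atomic_exchange() stands in for the hardware instruction; the names are illustrative, not any architecture's real implementation.]

```c
#include <stdatomic.h>

/* A spinlock built from nothing but an exchange operation. */
typedef struct { atomic_int locked; } xchg_spinlock_t;

static void xchg_spin_lock(xchg_spinlock_t *l)
{
	/* Swap in 1; we own the lock once we get 0 back. */
	while (atomic_exchange(&l->locked, 1))
		;
}

static void xchg_spin_unlock(xchg_spinlock_t *l)
{
	atomic_store(&l->locked, 0);
}

/* Every "atomic" RMW is then a plain operation under such a lock,
 * which is why these archs take a real spinlock inside atomic_*(). */
static int emulated_atomic_add_return(int i, int *v, xchg_spinlock_t *l)
{
	int ret;

	xchg_spin_lock(l);
	ret = (*v += i);
	xchg_spin_unlock(l);
	return ret;
}

static xchg_spinlock_t demo_lock;
static int demo_counter;
```

Because the lock word is invisible to callers, any code that re-enters an "atomic" operation while the lock is held (for example via printk) can self-deadlock, which is the concern raised in the follow-ups below.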
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-09 8:16 ` Peter Zijlstra @ 2020-12-09 9:22 ` Sergey Senozhatsky 2020-12-09 10:46 ` Sergey Senozhatsky 0 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-09 9:22 UTC (permalink / raw) To: Peter Zijlstra Cc: John Ogness, Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/09 09:16), Peter Zijlstra wrote: > On Tue, Dec 08, 2020 at 11:36:44PM +0106, John Ogness wrote: > > For the state variable we chose atomic_long_t instead of atomic64_t for > > this reason. atomic_long_t operations are available atomically on all > > architectures. > > Please put on your eye cancer gear and inspect the atomic implementation > of PA-RISC, Sparc32, feh, I forgot who else. > > Those SMP capable architectures are gifted with just one XCHG like > atomic instruction :/ Anyway, as said in the other email, they also > don't have NMIs so it mostly works. Hmm, wow. OK, I definitely want to look further. When some CONFIG_DEBUG_FOO_BAR code wants to pr_err from prb->atomic_op on those archs then we deadlock in printk once again? -ss ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-09  9:22 ` Sergey Senozhatsky
@ 2020-12-09 10:46 ` Sergey Senozhatsky
  2020-12-09 11:00 ` Peter Zijlstra
  0 siblings, 1 reply; 47+ messages in thread
From: Sergey Senozhatsky @ 2020-12-09 10:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, John Ogness, Sergey Senozhatsky, Petr Mladek,
	Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman,
	Thomas Gleixner, linux-kernel

On (20/12/09 18:22), Sergey Senozhatsky wrote:
> >
> > Please put on your eye cancer gear and inspect the atomic implementation
> > of PA-RISC, Sparc32, feh, I forgot who else.
> >
> > Those SMP capable architectures are gifted with just one XCHG like
> > atomic instruction :/ Anyway, as said in the other email, they also
> > don't have NMIs so it mostly works.

PeterZ, thanks for the pointers!

> Hmm, wow. OK, I definitely want to look further.
>
> When some CONFIG_DEBUG_FOO_BAR code wants to pr_err from prb->atomic_op
> on those archs then we deadlock in printk once again?

E.g. arch/sparc/lib/atomic32.c

	spinlock_t __atomic_hash[ATOMIC_HASH_SIZE];

	atomic_foo()
	{
		spin_lock_irqsave(ATOMIC_HASH(v), flags)
		...
		spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
	}

So another potential re-entry path is

	atomic_foo()
	  spin_lock_irqsave(ATOMIC_HASH(v), flags)
	  printk()
	    prb()
	      atomic_foo()
	        spin_lock_irqsave(ATOMIC_HASH(v), flags)

which can deadlock, in theory, if both atomics HASH to the same
key (same spin_lock).

I wonder what else I am missing.

	-ss
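[Editor's note: the "in theory" above hinges on two unrelated atomic variables hashing to the same lock. A small sketch of the lock-selection scheme shows how easily that happens; the hash function here is illustrative, not the kernel's actual ATOMIC_HASH() macro.]

```c
#include <stdint.h>
#include <stddef.h>

#define ATOMIC_HASH_SIZE 4	/* a handful of shared spinlocks */

/* sparc32-style lock selection: hash the variable's address down to
 * one of a few spinlocks (one lock per 16-byte address bucket here). */
static size_t atomic_hash(const void *v)
{
	return ((uintptr_t)v >> 4) % ATOMIC_HASH_SIZE;
}

/* int32_t so the address arithmetic below is exact on any platform. */
static int32_t vars[64];
```

With only ATOMIC_HASH_SIZE locks, any two variables whose addresses differ by a multiple of ATOMIC_HASH_SIZE * 16 bytes share a lock, so the re-entry path sketched above needs no bad luck at all, just the wrong pair of variables.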
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-09 10:46 ` Sergey Senozhatsky
@ 2020-12-09 11:00 ` Peter Zijlstra
  2020-12-09 11:28 ` Sergey Senozhatsky
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2020-12-09 11:00 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On Wed, Dec 09, 2020 at 07:46:13PM +0900, Sergey Senozhatsky wrote:
> On (20/12/09 18:22), Sergey Senozhatsky wrote:
> > >
> > > Please put on your eye cancer gear and inspect the atomic implementation
> > > of PA-RISC, Sparc32, feh, I forgot who else.
> > >
> > > Those SMP capable architectures are gifted with just one XCHG like
> > > atomic instruction :/ Anyway, as said in the other email, they also
> > > don't have NMIs so it mostly works.
>
> PeterZ, thanks for the pointers!
>
> > Hmm, wow. OK, I definitely want to look further.
> >
> > When some CONFIG_DEBUG_FOO_BAR code wants to pr_err from prb->atomic_op
> > on those archs then we deadlock in printk once again?
>
> E.g. arch/sparc/lib/atomic32.c
>
> 	spinlock_t __atomic_hash[ATOMIC_HASH_SIZE];
>
> 	atomic_foo()
> 	{
> 		spin_lock_irqsave(ATOMIC_HASH(v), flags)
> 		...
> 		spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
> 	}
>
> So another potential re-entry path is
>
> 	atomic_foo()
> 	  spin_lock_irqsave(ATOMIC_HASH(v), flags)
> 	  printk()
> 	    prb()
> 	      atomic_foo()
> 	        spin_lock_irqsave(ATOMIC_HASH(v), flags)
>
> which can deadlock, in theory, if both atomics HASH to the same
> key (same spin_lock).

Yep, but see the 'mostly' in the 'they mostly work'. Given the
limitations of these architectures there's really only so much you can
do.
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-09 11:00 ` Peter Zijlstra @ 2020-12-09 11:28 ` Sergey Senozhatsky 2020-12-09 12:29 ` Peter Zijlstra 0 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-09 11:28 UTC (permalink / raw) To: Peter Zijlstra Cc: Sergey Senozhatsky, Sergey Senozhatsky, John Ogness, Petr Mladek, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/09 12:00), Peter Zijlstra wrote: > > So another potential re-entry path is > > > > atomic_foo() > > spin_lock_irqsave(ATOMIC_HASH(v), flags) > > printk() > > prb() > > atomic_foo() > > spin_lock_irqsave(ATOMIC_HASH(v), flags) > > > > which can deadlock, in theory, if both atomics HASH to the same > > key (same spin_lock). > > Yep, but see the 'mostly' in the 'they mostly work'. Given the > limitiations of these architectures there's really only so much you can > do. Right, agreed. Nevertheless TIL that lockless printk buffer is not always lockless. Perhaps, people that work with those archs need to also know this. I haven't checked all the archs, but if, somehow, (IF) some of them can panic the system with the atomic hash entries locked, then on those archs new printk may not be able to flush-on-panic. Because while printk iterates logbuf it may HASH to the atomic hash table entry, that will never be unlocked. So there are some changes in atomic/printk department on those archs. -ss ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t 2020-12-09 11:28 ` Sergey Senozhatsky @ 2020-12-09 12:29 ` Peter Zijlstra 0 siblings, 0 replies; 47+ messages in thread From: Peter Zijlstra @ 2020-12-09 12:29 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, John Ogness, Petr Mladek, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Wed, Dec 09, 2020 at 08:28:52PM +0900, Sergey Senozhatsky wrote: > On (20/12/09 12:00), Peter Zijlstra wrote: > > > So another potential re-entry path is > > > > > > atomic_foo() > > > spin_lock_irqsave(ATOMIC_HASH(v), flags) > > > printk() > > > prb() > > > atomic_foo() > > > spin_lock_irqsave(ATOMIC_HASH(v), flags) > > > > > > which can deadlock, in theory, if both atomics HASH to the same > > > key (same spin_lock). > > > > Yep, but see the 'mostly' in the 'they mostly work'. Given the > > limitiations of these architectures there's really only so much you can > > do. > > Right, agreed. > > Nevertheless TIL that lockless printk buffer is not always lockless. > Perhaps, people that work with those archs need to also know this. Last time I broke them, they were aware they're 'special' and IIRC they're mostly just limping along on prayers. > I haven't checked all the archs, but if, somehow, (IF) some of them > can panic the system with the atomic hash entries locked, then on > those archs new printk may not be able to flush-on-panic. Because > while printk iterates logbuf it may HASH to the atomic hash table > entry, that will never be unlocked. So there are some changes in > atomic/printk department on those archs. Yeah, so I wouldn't put too much effort into thinking about it. Hopefully we can eventually delete these architectures and really forget they exist. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t
  2020-12-08 20:34 ` Sergey Senozhatsky
  2020-12-08 22:30 ` John Ogness
@ 2020-12-09  8:07 ` Peter Zijlstra
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2020-12-09 8:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On Wed, Dec 09, 2020 at 05:34:19AM +0900, Sergey Senozhatsky wrote:
> On (20/12/04 10:12), Petr Mladek wrote:
> > On Tue 2020-12-01 21:59:40, John Ogness wrote:
> > > Currently @clear_seq access is protected by @logbuf_lock. Once
> > > @logbuf_lock is removed some other form of synchronization will be
> > > required. Change the type of @clear_seq to atomic64_t to provide the
> > > synchronization.
> > >
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index fc5e3a7d6d89..e9018c4e1b66 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -3412,7 +3418,7 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
> > >   */
> > >  void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
> > >  {
> > > -	dumper->cur_seq = clear_seq;
> > > +	dumper->cur_seq = atomic64_read(&clear_seq);
> >
> > Sigh, atomic64_read() uses a spin lock in the generic implementation
> > that is used on some architectures.
>
> Oh... So on those archs prb is not lockless in fact, it actually
> takes the spin_lock each time we read the descriptor state?

Yeah, many 32bit archs cannot natively do 64bit atomics and get to use
the horrible hashed spinlock crap.

But it gets even worse, we have a few architectures that cannot do
atomics _at_all_ and _always_ use the horrible hashed spinlock crap for
all atomics, even native word length ones.

I consider these architectures broken crap, and they work mostly by
accident than anything else, but we have them :/

The good news is that they don't have NMIs either, so that helps.
* [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock
  2020-12-01 20:53 [PATCH next v2 0/3] printk: remove logbuf_lock John Ogness
  2020-12-01 20:53 ` [PATCH next v2 1/3] printk: inline log_output(),log_store() in vprintk_store() John Ogness
  2020-12-01 20:53 ` [PATCH next v2 2/3] printk: change @clear_seq to atomic64_t John Ogness
@ 2020-12-01 20:53 ` John Ogness
  2020-12-04  6:41   ` Sergey Senozhatsky
  ` (5 more replies)
  2 siblings, 6 replies; 47+ messages in thread
From: John Ogness @ 2020-12-01 20:53 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
	Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

Since the ringbuffer is lockless, there is no need for it to be
protected by @logbuf_lock. Remove @logbuf_lock. This means that
printk_nmi_direct and printk_safe_flush_on_panic() no longer need to
acquire any lock to run.

The global variables @syslog_seq, @syslog_partial, @syslog_time were
also protected by @logbuf_lock. Introduce @syslog_lock to protect
these.

@console_seq, @exclusive_console_stop_seq, @console_dropped are
protected by @console_lock.

Without @logbuf_lock it is no longer possible to use the single static
buffer for temporarily sprint'ing the message. Instead, use
vsnprintf() to determine the length and perform the real vscnprintf()
using the area reserved from the ringbuffer. This leads to suboptimal
packing of the message data, but will result in less wasted storage
than multiple per-cpu buffers to support lockless temporary
sprint'ing.
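[Editor's note: the two-pass scheme described in the commit message can be sketched in user space. A flat buffer and a bump allocator stand in for the printk ringbuffer's data-block reservation, and plain vsnprintf() (with its C99 NULL/0 sizing mode) stands in for the kernel's vsnprintf()/vscnprintf() pair; all names here are illustrative, not the patch's code.]

```c
#include <stdarg.h>
#include <stdio.h>
#include <stddef.h>

/* Stand-in for a ringbuffer data-block reservation. */
static char ringbuf[256];
static size_t ringbuf_used;

static char *rb_reserve(size_t len)
{
	if (ringbuf_used + len > sizeof(ringbuf))
		return NULL;
	char *p = &ringbuf[ringbuf_used];
	ringbuf_used += len;
	return p;
}

/* Pass 1: size the message with no temporary buffer at all.
 * Pass 2: sprint directly into the reserved space.
 * Stripping the trailing newline afterwards is why a record can carry
 * a few bytes of reserved-but-unused space. */
static int store_message(const char *fmt, ...)
{
	va_list args, args2;

	va_start(args, fmt);
	va_copy(args2, args);

	int len = vsnprintf(NULL, 0, fmt, args);	/* pass 1: length only */
	va_end(args);

	char *text = rb_reserve(len + 1);		/* reserve before printing */
	if (!text) {
		va_end(args2);
		return 0;
	}

	int text_len = vsnprintf(text, len + 1, fmt, args2);	/* pass 2 */
	va_end(args2);

	/* Strip a trailing newline; that byte stays reserved but unused. */
	if (text_len && text[text_len - 1] == '\n')
		text[--text_len] = '\0';

	return text_len;
}
```

The key property is that no shared static scratch buffer is needed between the two passes, which is what makes the store path lockless.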
Signed-off-by: John Ogness <john.ogness@linutronix.de> --- include/linux/printk.h | 1 + kernel/printk/internal.h | 4 +- kernel/printk/printk.c | 285 +++++++++++++++++++----------------- kernel/printk/printk_safe.c | 18 +-- 4 files changed, 157 insertions(+), 151 deletions(-) diff --git a/include/linux/printk.h b/include/linux/printk.h index fe7eb2351610..6d8f844bfdff 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -45,6 +45,7 @@ static inline const char *printk_skip_headers(const char *buffer) } #define CONSOLE_EXT_LOG_MAX 8192 +#define CONSOLE_LOG_MAX 1024 /* printk's without a loglevel use this.. */ #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 3a8fd491758c..e7acc2888c8e 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -12,8 +12,6 @@ #define PRINTK_NMI_CONTEXT_OFFSET 0x010000000 -extern raw_spinlock_t logbuf_lock; - __printf(4, 0) int vprintk_store(int facility, int level, const struct dev_printk_info *dev_info, @@ -59,7 +57,7 @@ void defer_console_output(void); __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; } /* - * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem + * In !PRINTK builds we still export console_sem * semaphore and some of console functions (console_unlock()/etc.), so * printk-safe must preserve the existing local IRQ guarantees. */ diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index e9018c4e1b66..7385101210be 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -355,48 +355,18 @@ enum log_flags { LOG_CONT = 8, /* text is a fragment of a continuation line */ }; -/* - * The logbuf_lock protects kmsg buffer, indices, counters. This can be taken - * within the scheduler's rq lock. It must be released before calling - * console_unlock() or anything else that might wake up a process. 
- */ -DEFINE_RAW_SPINLOCK(logbuf_lock); - -/* - * Helper macros to lock/unlock logbuf_lock and switch between - * printk-safe/unsafe modes. - */ -#define logbuf_lock_irq() \ - do { \ - printk_safe_enter_irq(); \ - raw_spin_lock(&logbuf_lock); \ - } while (0) - -#define logbuf_unlock_irq() \ - do { \ - raw_spin_unlock(&logbuf_lock); \ - printk_safe_exit_irq(); \ - } while (0) - -#define logbuf_lock_irqsave(flags) \ - do { \ - printk_safe_enter_irqsave(flags); \ - raw_spin_lock(&logbuf_lock); \ - } while (0) - -#define logbuf_unlock_irqrestore(flags) \ - do { \ - raw_spin_unlock(&logbuf_lock); \ - printk_safe_exit_irqrestore(flags); \ - } while (0) +/* The syslog_lock protects syslog_* variables. */ +static DEFINE_SPINLOCK(syslog_lock); #ifdef CONFIG_PRINTK DECLARE_WAIT_QUEUE_HEAD(log_wait); +/* All 3 protected by @syslog_lock. */ /* the next printk record to read by syslog(READ) or /proc/kmsg */ static u64 syslog_seq; static size_t syslog_partial; static bool syslog_time; +/* All 3 protected by @console_sem. 
*/ /* the next printk record to write to the console */ static u64 console_seq; static u64 exclusive_console_stop_seq; @@ -410,7 +380,7 @@ static atomic64_t clear_seq = ATOMIC64_INIT(0); #else #define PREFIX_MAX 32 #endif -#define LOG_LINE_MAX (1024 - PREFIX_MAX) +#define LOG_LINE_MAX (CONSOLE_LOG_MAX - PREFIX_MAX) #define LOG_LEVEL(v) ((v) & 0x07) #define LOG_FACILITY(v) ((v) >> 3 & 0xff) @@ -720,27 +690,22 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf, if (ret) return ret; - logbuf_lock_irq(); if (!prb_read_valid(prb, user->seq, r)) { if (file->f_flags & O_NONBLOCK) { ret = -EAGAIN; - logbuf_unlock_irq(); goto out; } - logbuf_unlock_irq(); ret = wait_event_interruptible(log_wait, prb_read_valid(prb, user->seq, r)); if (ret) goto out; - logbuf_lock_irq(); } if (user->seq < prb_first_valid_seq(prb)) { /* our last seen message is gone, return error and reset */ user->seq = prb_first_valid_seq(prb); ret = -EPIPE; - logbuf_unlock_irq(); goto out; } @@ -750,7 +715,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf, &r->info->dev_info); user->seq = r->info->seq + 1; - logbuf_unlock_irq(); if (len > count) { ret = -EINVAL; @@ -785,7 +749,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) if (offset) return -ESPIPE; - logbuf_lock_irq(); switch (whence) { case SEEK_SET: /* the first record */ @@ -806,7 +769,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) default: ret = -EINVAL; } - logbuf_unlock_irq(); return ret; } @@ -820,7 +782,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait) poll_wait(file, &log_wait, wait); - logbuf_lock_irq(); if (prb_read_valid(prb, user->seq, NULL)) { /* return error when data has vanished underneath us */ if (user->seq < prb_first_valid_seq(prb)) @@ -828,7 +789,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait) else ret = EPOLLIN|EPOLLRDNORM; } - logbuf_unlock_irq(); return ret; } @@ -861,9 +821,7 @@ static 
int devkmsg_open(struct inode *inode, struct file *file) prb_rec_init_rd(&user->record, &user->info, &user->text_buf[0], sizeof(user->text_buf)); - logbuf_lock_irq(); user->seq = prb_first_valid_seq(prb); - logbuf_unlock_irq(); file->private_data = user; return 0; @@ -1071,7 +1029,6 @@ void __init setup_log_buf(int early) struct printk_record r; size_t new_descs_size; size_t new_infos_size; - unsigned long flags; char *new_log_buf; unsigned int free; u64 seq; @@ -1129,8 +1086,6 @@ void __init setup_log_buf(int early) new_descs, ilog2(new_descs_count), new_infos); - logbuf_lock_irqsave(flags); - log_buf_len = new_log_buf_len; log_buf = new_log_buf; new_log_buf_len = 0; @@ -1146,8 +1101,6 @@ void __init setup_log_buf(int early) */ prb = &printk_rb_dynamic; - logbuf_unlock_irqrestore(flags); - if (seq != prb_next_seq(&printk_rb_static)) { pr_err("dropped %llu messages\n", prb_next_seq(&printk_rb_static) - seq); @@ -1423,9 +1376,9 @@ static int syslog_print(char __user *buf, int size) size_t n; size_t skip; - logbuf_lock_irq(); + spin_lock_irq(&syslog_lock); if (!prb_read_valid(prb, syslog_seq, &r)) { - logbuf_unlock_irq(); + spin_unlock_irq(&syslog_lock); break; } if (r.info->seq != syslog_seq) { @@ -1454,7 +1407,7 @@ static int syslog_print(char __user *buf, int size) syslog_partial += n; } else n = 0; - logbuf_unlock_irq(); + spin_unlock_irq(&syslog_lock); if (!n) break; @@ -1479,6 +1432,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) struct printk_info info; unsigned int line_count; struct printk_record r; + u64 newest_seq; u64 clr_seq; char *text; int len = 0; @@ -1490,19 +1444,30 @@ static int syslog_print_all(char __user *buf, int size, bool clear) return -ENOMEM; time = printk_time; - logbuf_lock_irq(); clr_seq = atomic64_read(&clear_seq); /* * Find first record that fits, including all following records, * into the user-provided buffer for this dump. 
*/ + prb_for_each_info(clr_seq, prb, seq, &info, &line_count) len += get_record_print_text_size(&info, line_count, true, time); - /* move first record forward until length fits into the buffer */ + /* + * Keep track of the latest in case new records are coming in fast + * and overwriting the older records. + */ + newest_seq = seq; + + /* + * Move first record forward until length fits into the buffer. This + * is a best effort attempt. If @newest_seq is reached because the + * ringbuffer is wrapping too fast, just start filling the buffer + * from there. + */ prb_for_each_info(clr_seq, prb, seq, &info, &line_count) { - if (len <= size) + if (len <= size || info.seq > newest_seq) break; len -= get_record_print_text_size(&info, line_count, true, time); } @@ -1520,12 +1485,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear) break; } - logbuf_unlock_irq(); if (copy_to_user(buf + len, text, textlen)) len = -EFAULT; else len += textlen; - logbuf_lock_irq(); if (len < 0) break; @@ -1533,7 +1496,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear) if (clear) atomic64_set(&clear_seq, seq); - logbuf_unlock_irq(); kfree(text); return len; @@ -1541,9 +1503,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear) static void syslog_clear(void) { - logbuf_lock_irq(); atomic64_set(&clear_seq, prb_next_seq(prb)); - logbuf_unlock_irq(); } int do_syslog(int type, char __user *buf, int len, int source) @@ -1551,6 +1511,7 @@ int do_syslog(int type, char __user *buf, int len, int source) bool clear = false; static int saved_console_loglevel = LOGLEVEL_DEFAULT; int error; + u64 seq; error = check_syslog_permissions(type, source); if (error) @@ -1568,8 +1529,11 @@ int do_syslog(int type, char __user *buf, int len, int source) return 0; if (!access_ok(buf, len)) return -EFAULT; + spin_lock_irq(&syslog_lock); + seq = syslog_seq; + spin_unlock_irq(&syslog_lock); error = wait_event_interruptible(log_wait, - prb_read_valid(prb, 
syslog_seq, NULL)); + prb_read_valid(prb, seq, NULL)); if (error) return error; error = syslog_print(buf, len); @@ -1617,7 +1581,7 @@ int do_syslog(int type, char __user *buf, int len, int source) break; /* Number of chars in the log buffer */ case SYSLOG_ACTION_SIZE_UNREAD: - logbuf_lock_irq(); + spin_lock_irq(&syslog_lock); if (syslog_seq < prb_first_valid_seq(prb)) { /* messages are gone, move to first one */ syslog_seq = prb_first_valid_seq(prb); @@ -1644,7 +1608,7 @@ int do_syslog(int type, char __user *buf, int len, int source) } error -= syslog_partial; } - logbuf_unlock_irq(); + spin_unlock_irq(&syslog_lock); break; /* Size of the log buffer */ case SYSLOG_ACTION_SIZE_BUFFER: @@ -1847,6 +1811,65 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, } } +#ifdef CONFIG_PRINTK_NMI +#define NUM_RECURSION_CTX 2 +#else +#define NUM_RECURSION_CTX 1 +#endif + +struct printk_recursion { + char count[NUM_RECURSION_CTX]; +}; + +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion); +static char printk_recursion_count[NUM_RECURSION_CTX]; + +static char *get_printk_count(void) +{ + struct printk_recursion *rec; + char *count; + + if (!printk_percpu_data_ready()) { + count = &printk_recursion_count[0]; + } else { + rec = this_cpu_ptr(&percpu_printk_recursion); + + count = &rec->count[0]; + } + +#ifdef CONFIG_PRINTK_NMI + if (in_nmi()) + count++; +#endif + + return count; +} + +static bool printk_enter(unsigned long *flags) +{ + char *count; + + local_irq_save(*flags); + count = get_printk_count(); + /* Only 1 level of recursion allowed. 
*/ + if (*count > 1) { + local_irq_restore(*flags); + return false; + } + (*count)++; + + return true; +} + +static void printk_exit(unsigned long flags) +{ + char *count; + + count = get_printk_count(); + (*count)--; + local_irq_restore(flags); +} + int printk_delay_msec __read_mostly; static inline void printk_delay(void) @@ -1867,40 +1890,75 @@ static inline u32 printk_caller_id(void) 0x80000000 + raw_smp_processor_id(); } -/* Must be called under logbuf_lock. */ +static u16 printk_sprint(char *text, u16 size, int facility, enum log_flags *lflags, + const char *fmt, va_list args) +{ + char *orig_text = text; + u16 text_len; + + text_len = vscnprintf(text, size, fmt, args); + + /* Mark and strip a trailing newline. */ + if (text_len && text[text_len - 1] == '\n') { + text_len--; + *lflags |= LOG_NEWLINE; + } + + /* Strip kernel syslog prefix. */ + if (facility == 0) { + while (text_len >= 2 && printk_get_level(text)) { + text_len -= 2; + text += 2; + } + + if (text != orig_text) + memmove(orig_text, text, text_len); + } + + return text_len; +} + +__printf(4, 0) int vprintk_store(int facility, int level, const struct dev_printk_info *dev_info, const char *fmt, va_list args) { const u32 caller_id = printk_caller_id(); - static char textbuf[LOG_LINE_MAX]; struct prb_reserved_entry e; enum log_flags lflags = 0; struct printk_record r; + unsigned long irqflags; u16 trunc_msg_len = 0; - char *text = textbuf; + char lvlbuf[8]; + va_list args2; u16 text_len; + int ret = 0; u64 ts_nsec; ts_nsec = local_clock(); + if (!printk_enter(&irqflags)) + return 0; + + va_copy(args2, args); + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. 
*/ - text_len = vscnprintf(text, sizeof(textbuf), fmt, args); + text_len = vsnprintf(&lvlbuf[0], sizeof(lvlbuf), fmt, args) + 1; + if (text_len > CONSOLE_LOG_MAX) + text_len = CONSOLE_LOG_MAX; - /* mark and strip a trailing newline */ - if (text_len && text[text_len-1] == '\n') { - text_len--; - lflags |= LOG_NEWLINE; - } - - /* strip kernel syslog prefix and extract log level or control flags */ + /* Extract log level or control flags. */ if (facility == 0) { int kern_level; + int i; - while ((kern_level = printk_get_level(text)) != 0) { + for (i = 0; i < sizeof(lvlbuf); i += 2) { + kern_level = printk_get_level(&lvlbuf[i]); + if (!kern_level) + break; switch (kern_level) { case '0' ... '7': if (level == LOGLEVEL_DEFAULT) @@ -1909,9 +1967,6 @@ int vprintk_store(int facility, int level, case 'c': /* KERN_CONT */ lflags |= LOG_CONT; } - - text_len -= 2; - text += 2; } } @@ -1924,7 +1979,8 @@ int vprintk_store(int facility, int level, if (lflags & LOG_CONT) { prb_rec_init_wr(&r, text_len); if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) { - memcpy(&r.text_buf[r.info->text_len], text, text_len); + text_len = printk_sprint(&r.text_buf[r.info->text_len], text_len, + facility, &lflags, fmt, args2); r.info->text_len += text_len; if (lflags & LOG_NEWLINE) { @@ -1934,7 +1990,8 @@ int vprintk_store(int facility, int level, prb_commit(&e); } - return text_len; + ret = text_len; + goto out; } } @@ -1945,11 +2002,11 @@ int vprintk_store(int facility, int level, prb_rec_init_wr(&r, text_len + trunc_msg_len); if (!prb_reserve(&e, prb, &r)) - return 0; + goto out; } /* fill message */ - memcpy(&r.text_buf[0], text, text_len); + text_len = printk_sprint(&r.text_buf[0], text_len, facility, &lflags, fmt, args2); if (trunc_msg_len) memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len); r.info->text_len = text_len + trunc_msg_len; @@ -1967,7 +2024,11 @@ int vprintk_store(int facility, int level, else prb_final_commit(&e); - return (text_len + trunc_msg_len); + ret = 
text_len + trunc_msg_len; +out: + va_end(args2); + printk_exit(irqflags); + return ret; } asmlinkage int vprintk_emit(int facility, int level, @@ -1976,7 +2037,6 @@ asmlinkage int vprintk_emit(int facility, int level, { int printed_len; bool in_sched = false; - unsigned long flags; /* Suppress unimportant messages after panic happens */ if (unlikely(suppress_printk)) @@ -1991,9 +2051,7 @@ asmlinkage int vprintk_emit(int facility, int level, printk_delay(); /* This stops the holder of console_sem just where we want him */ - logbuf_lock_irqsave(flags); printed_len = vprintk_store(facility, level, dev_info, fmt, args); - logbuf_unlock_irqrestore(flags); /* If called from the scheduler, we can not call up(). */ if (!in_sched) { @@ -2432,7 +2490,6 @@ void console_unlock(void) size_t len; printk_safe_enter_irqsave(flags); - raw_spin_lock(&logbuf_lock); skip: if (!prb_read_valid(prb, console_seq, &r)) break; @@ -2476,7 +2533,6 @@ void console_unlock(void) console_msg_format & MSG_FORMAT_SYSLOG, printk_time); console_seq++; - raw_spin_unlock(&logbuf_lock); /* * While actively printing out messages, if another printk() @@ -2503,8 +2559,6 @@ void console_unlock(void) console_locked = 0; - raw_spin_unlock(&logbuf_lock); - up_console_sem(); /* @@ -2513,9 +2567,7 @@ void console_unlock(void) * there's a new owner and the console_unlock() from them will do the * flush, no worries. 
*/ - raw_spin_lock(&logbuf_lock); retry = prb_read_valid(prb, console_seq, NULL); - raw_spin_unlock(&logbuf_lock); printk_safe_exit_irqrestore(flags); if (retry && console_trylock()) @@ -2579,13 +2631,8 @@ void console_flush_on_panic(enum con_flush_mode mode) console_trylock(); console_may_schedule = 0; - if (mode == CONSOLE_REPLAY_ALL) { - unsigned long flags; - - logbuf_lock_irqsave(flags); + if (mode == CONSOLE_REPLAY_ALL) console_seq = prb_first_valid_seq(prb); - logbuf_unlock_irqrestore(flags); - } console_unlock(); } @@ -2809,11 +2856,7 @@ void register_console(struct console *newcon) nr_ext_console_drivers++; if (newcon->flags & CON_PRINTBUFFER) { - /* - * console_unlock(); will print out the buffered messages - * for us. - */ - logbuf_lock_irqsave(flags); + spin_lock_irqsave(&syslog_lock, flags); /* * We're about to replay the log buffer. Only do this to the * just-registered console to avoid excessive message spam to @@ -2826,7 +2869,7 @@ void register_console(struct console *newcon) exclusive_console = newcon; exclusive_console_stop_seq = console_seq; console_seq = syslog_seq; - logbuf_unlock_irqrestore(flags); + spin_unlock_irqrestore(&syslog_lock, flags); } console_unlock(); console_sysfs_notify(); @@ -3190,7 +3233,6 @@ EXPORT_SYMBOL_GPL(kmsg_dump_reason_str); void kmsg_dump(enum kmsg_dump_reason reason) { struct kmsg_dumper *dumper; - unsigned long flags; rcu_read_lock(); list_for_each_entry_rcu(dumper, &dump_list, list) { @@ -3210,10 +3252,7 @@ void kmsg_dump(enum kmsg_dump_reason reason) /* initialize iterator with data about the stored records */ dumper->active = true; - logbuf_lock_irqsave(flags); - dumper->cur_seq = atomic64_read(&clear_seq); - dumper->next_seq = prb_next_seq(prb); - logbuf_unlock_irqrestore(flags); + kmsg_dump_rewind_nolock(dumper); /* invoke dumper which will iterate over records */ dumper->dump(dumper, reason); @@ -3300,14 +3339,7 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog, bool 
kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog, char *line, size_t size, size_t *len) { - unsigned long flags; - bool ret; - - logbuf_lock_irqsave(flags); - ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len); - logbuf_unlock_irqrestore(flags); - - return ret; + return kmsg_dump_get_line_nolock(dumper, syslog, line, size, len); } EXPORT_SYMBOL_GPL(kmsg_dump_get_line); @@ -3336,7 +3368,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog, struct printk_info info; unsigned int line_count; struct printk_record r; - unsigned long flags; u64 seq; u64 next_seq; size_t l = 0; @@ -3348,17 +3379,14 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog, if (!dumper->active || !buf || !size) goto out; - logbuf_lock_irqsave(flags); if (dumper->cur_seq < prb_first_valid_seq(prb)) { /* messages are gone, move to first available one */ dumper->cur_seq = prb_first_valid_seq(prb); } /* last entry */ - if (dumper->cur_seq >= dumper->next_seq) { - logbuf_unlock_irqrestore(flags); + if (dumper->cur_seq >= dumper->next_seq) goto out; - } /* calculate length of entire buffer */ seq = dumper->cur_seq; @@ -3398,7 +3426,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog, dumper->next_seq = next_seq; ret = true; - logbuf_unlock_irqrestore(flags); out: if (len) *len = l; @@ -3413,8 +3440,6 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); * Reset the dumper's iterator so that kmsg_dump_get_line() and * kmsg_dump_get_buffer() can be called again and used multiple * times within the same dumper.dump() callback. - * - * The function is similar to kmsg_dump_rewind(), but grabs no locks. 
*/ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper) { @@ -3432,11 +3457,7 @@ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper) */ void kmsg_dump_rewind(struct kmsg_dumper *dumper) { - unsigned long flags; - - logbuf_lock_irqsave(flags); kmsg_dump_rewind_nolock(dumper); - logbuf_unlock_irqrestore(flags); } EXPORT_SYMBOL_GPL(kmsg_dump_rewind); diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index a0e6f746de6c..7248b6e3df6c 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -266,18 +266,6 @@ void printk_safe_flush(void) */ void printk_safe_flush_on_panic(void) { - /* - * Make sure that we could access the main ring buffer. - * Do not risk a double release when more CPUs are up. - */ - if (raw_spin_is_locked(&logbuf_lock)) { - if (num_online_cpus() > 1) - return; - - debug_locks_off(); - raw_spin_lock_init(&logbuf_lock); - } - printk_safe_flush(); } @@ -371,17 +359,15 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args) * Try to use the main logbuf even in NMI. But avoid calling console * drivers that might have their own locks. */ - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK) && - raw_spin_trylock(&logbuf_lock)) { + if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) { int len; len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args); - raw_spin_unlock(&logbuf_lock); defer_console_output(); return len; } - /* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */ + /* Use extra buffer in NMI or in safe mode. */ if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK) return vprintk_nmi(fmt, args); -- 2.20.1 ^ permalink raw reply related [flat|nested] 47+ messages in thread
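The prefix handling that this patch moves into printk_sprint() can be exercised in plain userspace C. The sketch below is a simplified model, not the kernel code: the `sketch_` names are invented here, KERN_SOH is assumed to be the '\001' byte that printk_get_level() checks for, and the per-record flags are reduced to a single LOG_NEWLINE bit.

```c
#include <assert.h>
#include <string.h>

#define KERN_SOH_CHAR '\001'  /* start-of-header byte, as in the kernel */
#define LOG_NEWLINE 0x1       /* simplified flag value for this sketch */

/* Model of printk_get_level(): return the level char of a "\001X" prefix, or 0. */
static char sketch_get_level(const char *buf)
{
	if (buf[0] == KERN_SOH_CHAR && buf[1])
		return buf[1];
	return 0;
}

/*
 * Model of printk_sprint()'s post-processing: strip one trailing newline
 * (recording it in *flags) and, for the kernel facility (0), any leading
 * two-byte syslog prefixes.  Returns the new text length.
 */
static unsigned short sketch_strip(char *text, unsigned short text_len,
				   int facility, int *flags)
{
	char *p = text;

	if (text_len && p[text_len - 1] == '\n') {
		text_len--;
		*flags |= LOG_NEWLINE;
	}

	if (facility == 0) {
		while (text_len >= 2 && sketch_get_level(p)) {
			text_len -= 2;
			p += 2;
		}
		if (p != text)
			memmove(text, p, text_len);
	}
	text[text_len] = '\0';  /* keep the result printable for the demo */
	return text_len;
}
```

A message such as "\0014Hello\n" loses both its two-byte level prefix and its trailing newline, which is exactly the per-message wasted reserved space the cover letter mentions.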
* Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness @ 2020-12-04 6:41 ` Sergey Senozhatsky 2020-12-06 20:44 ` John Ogness 2020-12-04 15:52 ` devkmsg: was " Petr Mladek ` (4 subsequent siblings) 5 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-04 6:41 UTC (permalink / raw) To: John Ogness Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/01 21:59), John Ogness wrote: > > +#ifdef CONFIG_PRINTK_NMI > +#define NUM_RECURSION_CTX 2 > +#else > +#define NUM_RECURSION_CTX 1 > +#endif > + > +struct printk_recursion { > + char count[NUM_RECURSION_CTX]; > +}; > + > +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion); > +static char printk_recursion_count[NUM_RECURSION_CTX]; > + > +static char *get_printk_count(void) A nit: I think get_foo() has some sort of special meaning and one would expect that there should be put_foo() as well, and that those have something to do with the ref counting. > +{ > + struct printk_recursion *rec; > + char *count; > + > + if (!printk_percpu_data_ready()) { > + count = &printk_recursion_count[0]; > + } else { > + rec = this_cpu_ptr(&percpu_printk_recursion); > + > + count = &rec->count[0]; > + } > + > +#ifdef CONFIG_PRINTK_NMI > + if (in_nmi()) > + count++; > +#endif > + > + return count; > +} > + > +static bool printk_enter(unsigned long *flags) This better explicitly state that it disables local IRQs printk_enter_irqsave() I'm not very certain that printk_enter/printk_exit are best names: if (!printk_enter()) return; Maybe it can spell out why we couldn't enter printk so the function name can contain recursion_limit or something. > +{ > + char *count; > + > + local_irq_save(*flags); > + count = get_printk_count(); > + /* Only 1 level of recursion allowed. 
*/ > + if (*count > 1) { > + local_irq_restore(*flags); > + return false; > + } > + (*count)++; > + > + return true; > +} > + > +static void printk_exit(unsigned long flags) This enables local IRQs, so printk_exit_irqrestore() > +{ > + char *count; > + > + count = get_printk_count(); > + (*count)--; > + local_irq_restore(flags); > +} -ss ^ permalink raw reply [flat|nested] 47+ messages in thread
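The control flow of the recursion counter Sergey is quoting can be modeled in userspace with the early-boot array alone. This is a sketch under stated assumptions: per-CPU storage and the local_irq_save()/restore() pair are omitted, `sketch_in_nmi` stands in for in_nmi(), and the pointer bump selecting the NMI slot is kept as posted.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RECURSION_CTX 2		/* as with CONFIG_PRINTK_NMI=y */

/* Userspace stand-ins: one counter array and a settable NMI flag. */
static char sketch_recursion_count[NUM_RECURSION_CTX];
static bool sketch_in_nmi;

static char *sketch_get_count(void)
{
	char *count = &sketch_recursion_count[0];

	if (sketch_in_nmi)
		count++;		/* the pointer bump being reviewed */
	return count;
}

/* Model of printk_enter(): claim the context unless nested too deep. */
static bool sketch_enter(void)
{
	char *count = sketch_get_count();

	if (*count > 1)			/* only 1 level of recursion allowed */
		return false;
	(*count)++;
	return true;
}

static void sketch_exit(void)
{
	(*sketch_get_count())--;
}
```

Each context nests at most once: a first call and one recursive call succeed, a second level of recursion is rejected, and NMI context keeps its own independent counter.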
* Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 6:41 ` Sergey Senozhatsky @ 2020-12-06 20:44 ` John Ogness 0 siblings, 0 replies; 47+ messages in thread From: John Ogness @ 2020-12-06 20:44 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-04, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: >> +static char *get_printk_count(void) > > A nit: I think get_foo() has some sort of special meaning and one > would expect that there should be put_foo() as well, and that those > have something to do with the ref counting. OK. How about: static char *printk_recursion_counter(void) ? >> +static bool printk_enter(unsigned long *flags) > > This better explicitly state that it disables local IRQs > > printk_enter_irqsave() > > I'm not very certain that printk_enter/printk_exit are best names: > > if (!printk_enter()) > return; > > Maybe it can spell out why we couldn't enter printk so the function > name can contain recursion_limit or something. printk_recurse_irqsave() ? I would prefer printk_enter_irqsave() and just add comments that say it is checking the recursion level. It is only used in one place. >> +static void printk_exit(unsigned long flags) > > This enables local IRQs, so > > printk_exit_irqrestore() Yes, I like that. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* devkmsg: was [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness 2020-12-04 6:41 ` Sergey Senozhatsky @ 2020-12-04 15:52 ` Petr Mladek 2020-12-06 20:51 ` John Ogness 2020-12-04 15:57 ` syslog: was: " Petr Mladek ` (3 subsequent siblings) 5 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-04 15:52 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Tue 2020-12-01 21:59:41, John Ogness wrote: > Since the ringbuffer is lockless, there is no need for it to be > protected by @logbuf_lock. Remove @logbuf_lock. I am going to split the feedback into few mails. It might make sense to split also this patch into few more pieces that would remove the lock from a particular interface. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index e9018c4e1b66..7385101210be 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -785,7 +749,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) > if (offset) > return -ESPIPE; > > - logbuf_lock_irq(); user->seq manipulation is not longer safe from the atomicity point of view. One solution would be to use atomic variable in struct devkmsg_user(). Another solution would be to synchronize it with user->lock like we do in devkmsg_read(). user->lock looks like an overhead. But it actually would make sense to prevent seek in the middle of a read. > switch (whence) { > case SEEK_SET: > /* the first record */ > @@ -820,7 +782,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait) > > poll_wait(file, &log_wait, wait); > > - logbuf_lock_irq(); > if (prb_read_valid(prb, user->seq, NULL)) { Same here. The atomicity of user->seq read/write is not guaranteed. 
> /* return error when data has vanished underneath us */ > if (user->seq < prb_first_valid_seq(prb)) Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
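The atomicity concern here is that @user->seq is a u64, so once @logbuf_lock no longer serializes access, a plain load could tear on 32-bit. A userspace sketch of the atomic64_t-style fix the thread converges on, using C11 atomics as a stand-in for the kernel's atomic64_read()/atomic64_set() (the struct and names below are illustrative, not from the patch):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of struct devkmsg_user with an atomic64-style sequence field. */
struct sketch_user {
	_Atomic uint64_t seq;
};

/* Whole-word atomic read: cannot observe half of a concurrent update. */
static uint64_t sketch_read_seq(struct sketch_user *u)
{
	return atomic_load(&u->seq);
}

static void sketch_write_seq(struct sketch_user *u, uint64_t v)
{
	atomic_store(&u->seq, v);
}
```

A value straddling both 32-bit halves (such as 0x1_00000002) is read back whole, which is the property the removed lock used to provide for llseek/poll against a concurrent read.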
* Re: devkmsg: was [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 15:52 ` devkmsg: was " Petr Mladek @ 2020-12-06 20:51 ` John Ogness 2020-12-07 9:56 ` Petr Mladek 0 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-06 20:51 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: >> Since the ringbuffer is lockless, there is no need for it to be >> protected by @logbuf_lock. Remove @logbuf_lock. > > It might make sense to split also this patch into few more pieces that > would remove the lock from a particular interface. OK. >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index e9018c4e1b66..7385101210be 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -785,7 +749,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) >> if (offset) >> return -ESPIPE; >> >> - logbuf_lock_irq(); > > user->seq manipulation is not longer safe from the atomicity point of > view. > > One solution would be to use atomic variable in struct devkmsg_user(). > Another solution would be to synchronize it with user->lock like we do > in devkmsg_read(). > > user->lock looks like an overhead. But it actually would make sense to > prevent seek in the middle of a read. I would prefer using atomic64_t. Using user->lock could introduce some wacky regression. >> switch (whence) { >> case SEEK_SET: >> /* the first record */ >> @@ -820,7 +782,6 @@ static __poll_t devkmsg_poll(struct file *file, poll_table *wait) >> >> poll_wait(file, &log_wait, wait); >> >> - logbuf_lock_irq(); >> if (prb_read_valid(prb, user->seq, NULL)) { > > Same here. The atomicity of user->seq read/write is not guaranteed. Right. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: devkmsg: was [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-06 20:51 ` John Ogness @ 2020-12-07 9:56 ` Petr Mladek 0 siblings, 0 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-07 9:56 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Sun 2020-12-06 21:57:46, John Ogness wrote: > On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > >> index e9018c4e1b66..7385101210be 100644 > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > >> @@ -785,7 +749,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) > >> if (offset) > >> return -ESPIPE; > >> > >> - logbuf_lock_irq(); > > > > user->seq manipulation is not longer safe from the atomicity point of > > view. > > > > One solution would be to use atomic variable in struct devkmsg_user(). > > Another solution would be to synchronize it with user->lock like we do > > in devkmsg_read(). > > > > user->lock looks like an overhead. But it actually would make sense to > > prevent seek in the middle of a read. > > I would prefer using atomic64_t. Using user->lock could introduce some > wacky regression. OK, fair enough. User space might do crazy stuff. Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
* syslog: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness 2020-12-04 6:41 ` Sergey Senozhatsky 2020-12-04 15:52 ` devkmsg: was " Petr Mladek @ 2020-12-04 15:57 ` Petr Mladek 2020-12-06 21:06 ` John Ogness 2020-12-04 16:10 ` recursion handling: " Petr Mladek ` (2 subsequent siblings) 5 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-04 15:57 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Tue 2020-12-01 21:59:41, John Ogness wrote: > Since the ringbuffer is lockless, there is no need for it to be > protected by @logbuf_lock. Remove @logbuf_lock. > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1490,19 +1444,30 @@ static int syslog_print_all(char __user *buf, int size, bool clear) > return -ENOMEM; > > time = printk_time; > - logbuf_lock_irq(); > clr_seq = atomic64_read(&clear_seq); > > /* > * Find first record that fits, including all following records, > * into the user-provided buffer for this dump. > */ > + > prb_for_each_info(clr_seq, prb, seq, &info, &line_count) > len += get_record_print_text_size(&info, line_count, true, time); > > - /* move first record forward until length fits into the buffer */ > + /* > + * Keep track of the latest in case new records are coming in fast > + * and overwriting the older records. > + */ "overwriting the older records" sounds like the code is somehow able to remove the overwritten records from "len". But it is not true. > + newest_seq = seq; > + > + /* > + * Move first record forward until length fits into the buffer. This > + * is a best effort attempt. If @newest_seq is reached because the > + * ringbuffer is wrapping too fast, just start filling the buffer > + * from there. > + */ It might be that I do not understand English well. 
But "start filling the buffer from there" sounds like we start filling the buffer from "newest_seq". What about the following? /* * Move first record forward until length fits into the buffer. * Ignore newest messages that were not counted in the above * cycle. Messages might appear and get lost in the meantime. * This is the best effort that prevents an infinite loop. */ newest_seq = seq; > prb_for_each_info(clr_seq, prb, seq, &info, &line_count) { > - if (len <= size) > + if (len <= size || info.seq > newest_seq) > break; > len -= get_record_print_text_size(&info, line_count, true, time); > } > @@ -1568,8 +1529,11 @@ int do_syslog(int type, char __user *buf, int len, int source) > return 0; > if (!access_ok(buf, len)) > return -EFAULT; > + spin_lock_irq(&syslog_lock); > + seq = syslog_seq; > + spin_unlock_irq(&syslog_lock); It would deserve a comment that the locking is needed to guarantee atomicity of the operation. > error = wait_event_interruptible(log_wait, > - prb_read_valid(prb, syslog_seq, NULL)); > + prb_read_valid(prb, seq, NULL)); > if (error) > return error; > error = syslog_print(buf, len); > @@ -2809,11 +2856,7 @@ void register_console(struct console *newcon) > nr_ext_console_drivers++; > > if (newcon->flags & CON_PRINTBUFFER) { > - /* > - * console_unlock(); will print out the buffered messages > - * for us. > - */ > - logbuf_lock_irqsave(flags); > + spin_lock_irqsave(&syslog_lock, flags); We should take the lock only around assigning syslog_seq. And add a comment that it guarantees atomic update. > /* > * We're about to replay the log buffer. 
Only do this to the > * just-registered console to avoid excessive message spam to > @@ -2826,7 +2869,7 @@ void register_console(struct console *newcon) > exclusive_console = newcon; > exclusive_console_stop_seq = console_seq; > console_seq = syslog_seq; > - logbuf_unlock_irqrestore(flags); > + spin_unlock_irqrestore(&syslog_lock, flags); > } > console_unlock(); > console_sysfs_notify(); Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
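The sizing loops under discussion can be modeled outside the kernel. The sketch below replaces the ringbuffer walk with a plain array of record lengths indexed by sequence number (an assumption for the demo); it shows the two passes and how the @newest_seq snapshot bounds the second one:

```c
#include <stddef.h>

/*
 * Model of syslog_print_all()'s sizing: first sum the lengths of all records
 * from @clr_seq up to the snapshot @newest_seq, then advance the start until
 * the remainder fits in @size.  Records appearing after the snapshot are
 * ignored, which is the best-effort bound discussed above.
 */
static size_t sketch_first_fitting_seq(const size_t *len_of, size_t clr_seq,
				       size_t newest_seq, size_t size)
{
	size_t len = 0, seq;

	for (seq = clr_seq; seq <= newest_seq; seq++)	/* pass 1: total length */
		len += len_of[seq];

	for (seq = clr_seq; seq <= newest_seq; seq++) {	/* pass 2: drop oldest */
		if (len <= size)
			break;
		len -= len_of[seq];
	}
	return seq;	/* first record whose tail fits (best effort) */
}
```

With record lengths {40, 30, 20, 10} and a 35-byte buffer, the two oldest records are dropped and dumping starts at sequence 2; with a large enough buffer nothing is dropped.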
* Re: syslog: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 15:57 ` syslog: was: " Petr Mladek @ 2020-12-06 21:06 ` John Ogness 2020-12-07 10:01 ` Petr Mladek 0 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-06 21:06 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > On Tue 2020-12-01 21:59:41, John Ogness wrote: >> Since the ringbuffer is lockless, there is no need for it to be >> protected by @logbuf_lock. Remove @logbuf_lock. >> >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -1490,19 +1444,30 @@ static int syslog_print_all(char __user *buf, int size, bool clear) >> return -ENOMEM; >> >> time = printk_time; >> - logbuf_lock_irq(); >> clr_seq = atomic64_read(&clear_seq); >> >> /* >> * Find first record that fits, including all following records, >> * into the user-provided buffer for this dump. >> */ >> + >> prb_for_each_info(clr_seq, prb, seq, &info, &line_count) >> len += get_record_print_text_size(&info, line_count, true, time); >> >> - /* move first record forward until length fits into the buffer */ >> + /* >> + * Keep track of the latest in case new records are coming in fast >> + * and overwriting the older records. >> + */ Your suggestion to merge this and the next comment block is fine. >> + newest_seq = seq; >> + >> + /* >> + * Move first record forward until length fits into the buffer. This >> + * is a best effort attempt. If @newest_seq is reached because the >> + * ringbuffer is wrapping too fast, just start filling the buffer >> + * from there. >> + */ > > It might be that I do not understand English well. But "start filling > the buffer from there" sounds like we start filling the buffer from > "newest_seq". > > What about the following? 
> > /* > * Move first record forward until length fits into the buffer. > * Ignore newest messages that were not counted in the above > * cycle. Messages might appear and get lost in the meantime. > * This is the best effort that prevents an infinite loop. > */ > newest_seq = seq; OK. >> prb_for_each_info(clr_seq, prb, seq, &info, &line_count) { >> - if (len <= size) >> + if (len <= size || info.seq > newest_seq) >> break; >> len -= get_record_print_text_size(&info, line_count, true, time); >> } >> @@ -1568,8 +1529,11 @@ int do_syslog(int type, char __user *buf, int len, int source) >> return 0; >> if (!access_ok(buf, len)) >> return -EFAULT; >> + spin_lock_irq(&syslog_lock); >> + seq = syslog_seq; >> + spin_unlock_irq(&syslog_lock); > > It would deserve a comment that the locking is needed to guarantee > atomicity of the operation. OK. >> error = wait_event_interruptible(log_wait, >> - prb_read_valid(prb, syslog_seq, NULL)); >> + prb_read_valid(prb, seq, NULL)); >> if (error) >> return error; >> error = syslog_print(buf, len); >> @@ -2809,11 +2856,7 @@ void register_console(struct console *newcon) >> nr_ext_console_drivers++; >> >> if (newcon->flags & CON_PRINTBUFFER) { >> - /* >> - * console_unlock(); will print out the buffered messages >> - * for us. >> - */ >> - logbuf_lock_irqsave(flags); >> + spin_lock_irqsave(&syslog_lock, flags); > > We should take the lock only around assigning syslog_seq. And add a > comment that it guarantees atomic update. OK. So you just want "exclusive_console = newcon;" moved outside the critical section. >> /* >> * We're about to replay the log buffer. 
Only do this to the >> * just-registered console to avoid excessive message spam to >> @@ -2826,7 +2869,7 @@ void register_console(struct console *newcon) >> exclusive_console = newcon; >> exclusive_console_stop_seq = console_seq; >> console_seq = syslog_seq; >> - logbuf_unlock_irqrestore(flags); >> + spin_unlock_irqrestore(&syslog_lock, flags); >> } >> console_unlock(); >> console_sysfs_notify(); John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: syslog: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-06 21:06 ` John Ogness @ 2020-12-07 10:01 ` Petr Mladek 0 siblings, 0 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-07 10:01 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Sun 2020-12-06 22:12:21, John Ogness wrote: > On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > > On Tue 2020-12-01 21:59:41, John Ogness wrote: > >> Since the ringbuffer is lockless, there is no need for it to be > >> protected by @logbuf_lock. Remove @logbuf_lock. > >> > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > >> @@ -2809,11 +2856,7 @@ void register_console(struct console *newcon) > >> nr_ext_console_drivers++; > >> > >> if (newcon->flags & CON_PRINTBUFFER) { > >> - /* > >> - * console_unlock(); will print out the buffered messages > >> - * for us. > >> - */ > >> - logbuf_lock_irqsave(flags); > >> + spin_lock_irqsave(&syslog_lock, flags); > > > > We should take the lock only around assigning syslog_seq. And add a > > comment that it guarantees atomic update. > > OK. So you just want "exclusive_console = newcon;" moved outside the > critical section. Exactly, I would like to make it clear that it synchronizes only the single assignment. Otherwise, people might get wrong assumption. I know that there is a comment describing the scope of each lock. But people might miss it or do not search for it at all. Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
* recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness ` (2 preceding siblings ...) 2020-12-04 15:57 ` syslog: was: " Petr Mladek @ 2020-12-04 16:10 ` Petr Mladek 2020-12-05 4:25 ` Sergey Senozhatsky ` (2 more replies) 2020-12-04 16:15 ` vprintk_store: was: " Petr Mladek 2020-12-04 16:19 ` consoles: " Petr Mladek 5 siblings, 3 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-04 16:10 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Tue 2020-12-01 21:59:41, John Ogness wrote: > Since the ringbuffer is lockless, there is no need for it to be > protected by @logbuf_lock. Remove @logbuf_lock. > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1847,6 +1811,65 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, > } > } The recursion-related code needs some explanation or we should do it another way. I spent quite some time on it and I am still not sure that I understand it. Let me describe how I understand it. > +#ifdef CONFIG_PRINTK_NMI > +#define NUM_RECURSION_CTX 2 > +#else > +#define NUM_RECURSION_CTX 1 > +#endif OK, the number of contexts is limited because interrupts are disabled inside printk_enter()/printk_exit(). It is basically the same reason why we have only two printk_safe buffers (NMI + other contexts). What is the exact reason to disable interrupts around the entire vprintk_store(), please? It should get documented. One reason is the use of per-cpu variables. Alternative solution would be to store printk_context into task_struct. Well, I am not sure if "current" task is available during early boot. But it might solve problems with per-cpu variables that are not working during early boot. That said, I am not sure if it is worth it. 
> + > +struct printk_recursion { > + char count[NUM_RECURSION_CTX]; > +}; > > +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion); > +static char printk_recursion_count[NUM_RECURSION_CTX]; This is pretty confusing. The array is hidden in a struct when per-cpu variables are used. And a naked array is used for early boot. Is the structure really needed? What about? static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count); static char printk_count_early[NUM_RECURSION_CTX]; > + > +static char *get_printk_count(void) > +{ > + struct printk_recursion *rec; > + char *count; > + > + if (!printk_percpu_data_ready()) { > + count = &printk_recursion_count[0]; I see why you avoided per-cpu variables for early boot. I am just curious how printk_context variable works these days. It is used by any printk(), including early code, see vprintk_func(). > + } else { > + rec = this_cpu_ptr(&percpu_printk_recursion); > + > + count = &rec->count[0]; > + } > + > +#ifdef CONFIG_PRINTK_NMI > + if (in_nmi()) > + count++; > +#endif This is extremely confusing. It is far from obvious that the pointer and not the value is incremented. If we really need this to avoid per-cpu variables during early boot then a more clear implementation would be: char *get_printk_counter_by_ctx() { int ctx = 0; if (in_nmi) ctx = 1; if (!printk_percpu_data_ready()) return &printk_count_early[ctx]; return this_cpu_ptr(printk_count[ctx]); } > + > + return count; > +} > + > +static bool printk_enter(unsigned long *flags) > +{ > + char *count; > + > + local_irq_save(*flags); > + count = get_printk_count(); > + /* Only 1 level of recursion allowed. */ We should allow at least some level of recursion. Otherwise, we would not see warnings printed from vsprintf code. > + if (*count > 1) { > + local_irq_restore(*flags); > + return false; > + } > + (*count)++; > + > + return true; > +} This should be unified with printk_context, printk_nmi_enter(), printk_nmi_exit(). 
It does not make sense to have two separate printk context counters. Or is there any plan to remove printk_safe and printk_context? BTW: I prefer to use the bitmask approach. It allows to check the normal context by a single operation (no bit is set). There is no need to go through all counters. Note that we need at least one more context for gdb. Best Regards, Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
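The shape of the helper Petr proposes can be checked in userspace. The sketch models per-CPU storage as a two-dimensional array indexed by a fake CPU id, with flags standing in for in_nmi() and printk_percpu_data_ready(); all `sketch_` names are invented for the demo.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 2
#define SKETCH_CTX_NUM 2	/* 0 = normal, 1 = NMI */

/* Stand-ins for kernel state (per-CPU data, in_nmi(), data-ready flag). */
static char sketch_count_early[SKETCH_CTX_NUM];
static char sketch_count_percpu[SKETCH_NR_CPUS][SKETCH_CTX_NUM];
static bool sketch_percpu_ready;
static bool sketch_in_nmi;
static int  sketch_this_cpu;

/*
 * Model of the proposed get_printk_counter_by_ctx(): select the slot via an
 * explicit context index rather than incrementing the returned pointer.
 */
static char *sketch_counter_by_ctx(void)
{
	int ctx = sketch_in_nmi ? 1 : 0;

	if (!sketch_percpu_ready)
		return &sketch_count_early[ctx];
	return &sketch_count_percpu[sketch_this_cpu][ctx];
}
```

The explicit index makes the NMI/normal split visible at the call site, which is the readability point being argued, while keeping the early-boot fallback path.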
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 16:10 ` recursion handling: " Petr Mladek @ 2020-12-05 4:25 ` Sergey Senozhatsky 2020-12-06 22:08 ` John Ogness 2020-12-05 9:41 ` Sergey Senozhatsky 2020-12-06 21:44 ` John Ogness 2 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-05 4:25 UTC (permalink / raw) To: Petr Mladek Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/04 17:10), Petr Mladek wrote: [..] > char *get_printk_counter_by_ctx() > { > int ctx = 0; > > if (in_nmi) > ctx = 1; > > if (!printk_percpu_data_ready()) > return &printk_count_early[ctx]; > > return this_cpu_ptr(printk_count[ctx]); > } > > > + > > + return count; > > +} > > + > > +static bool printk_enter(unsigned long *flags) > > +{ > > + char *count; > > + > > + local_irq_save(*flags); > > + count = get_printk_count(); > > + /* Only 1 level of recursion allowed. */ > > We should allow at least some level of recursion. Otherwise, we would > not see warnings printed from vsprintf code. One level of recursion seems reasonable on one hand, but on the other hand I also wonder if we can get useful info from recursion levels 2 and higher. Would be great to maybe list possible scenarios. vsprintf() still call re-enter printk() -- WARN_ONCE()-s in the code -- external format specifiers handlers also can. Would we need to let two levels of recursion printk->vsprintf->printk->???->printk or one is just enough? It also would make sense to add the lost messages counter to per-CPU recursion counter struct, to count the number of times we bailout of printk due to recursion limit. So that we'll at least have "%d recursive printk messages lost" hints. Overall... I wonder where does the "limit printk recursion" come from? 
printk_safe doesn't impose any strict limits (printk_context is limited, but still) and we've been running it for years now; have we ever seen any reports of printk recursion overflows?

> > +	if (*count > 1) {
> > +		local_irq_restore(*flags);
> > +		return false;
> > +	}
> > +	(*count)++;
> > +
> > +	return true;
> > +}
>
> This should be unified with printk_context, printk_nmi_enter(),
> printk_nmi_exit(). It does not make sense to have two separate
> printk context counters.

Agreed.

> Or is there any plan to remove printk_safe and printk_context?

That's a good point. This patch set and printk_safe answer the same question in different ways, as far as I understand it. The question is: why do we want to track printk recursion?

This patch set merely wants to, correct me if I'm wrong, avoid very deep vprintk_store() recursion stacks (which is a subset of the printk() recursion superset):

	vprintk_store()
	{
		if (!printk_enter())
			return;

		vsprintf/prb

		printk_exit();
	}

And that's pretty much it, at least for the time being.

printk_safe's answer is: we don't want to re-enter parts of the kernel that sit in the core, behind the scenes, and that are not ready to be re-entered. Things like

	printk()
	 down_console_sem()
	  down()
	   raw_spin_lock_irqsave(&sem->lock)
	    printk()
	     down_console_sem()
	      down()
	       raw_spin_lock_irqsave(&sem->lock)

-ss
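[Editor's note] The recursion guard being debated above can be sketched in isolation. The following is a hypothetical userspace model of the printk_enter()/printk_exit() pair from the patch: a single plain counter stands in for the per-CPU, per-context counters, and the interrupt disabling is omitted.

```c
#include <stdbool.h>

/* Single counter standing in for the patch's per-CPU, per-context
 * counters; local_irq_save()/restore() is omitted in this model. */
static char printk_count;

static bool printk_enter(void)
{
	/* Mirrors the patch: counter values 0 and 1 pass, so exactly
	 * one level of recursion is allowed. */
	if (printk_count > 1)
		return false;	/* too deep: caller drops the message */
	printk_count++;
	return true;
}

static void printk_exit(void)
{
	printk_count--;
}
```

With this scheme the outermost call and one nested call succeed, and a second nesting level is rejected -- exactly the "is one just enough?" question raised above.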
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-05 4:25 ` Sergey Senozhatsky @ 2020-12-06 22:08 ` John Ogness 0 siblings, 0 replies; 47+ messages in thread From: John Ogness @ 2020-12-06 22:08 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-05, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: >> We should allow at least some level of recursion. Otherwise, we would >> not see warnings printed from vsprintf code. > > One level of recursion seems reasonable on one hand, but on the other > hand I also wonder if we can get useful info from recursion levels 2 > and higher. Would be great to maybe list possible scenarios. vsprintf() > still call re-enter printk() -- WARN_ONCE()-s in the code -- external > format specifiers handlers also can. Would we need to let two levels of > recursion printk->vsprintf->printk->???->printk or one is just enough? > > It also would make sense to add the lost messages counter to per-CPU > recursion counter struct, to count the number of times we bailout > of printk due to recursion limit. So that we'll at least have > "%d recursive printk messages lost" hints. We do not need such a counter to be per-cpu. A global atomic_long_t would suffice. Although I am not sure if that should be separate from the @fail member of the ringbuffer. > Overall... > I wonder where does the "limit printk recursion" come from? printk_safe > doesn't impose any strict limits (print_context is limited, but still) > and we've been running it for years now; have we ever seen any reports > of printk recursion overflows? The printk code is undergoing some major changes and we have already had bugs slip into the merge window. IMHO the additional code to track the recursion does not add significant complexity or runtime overhead. 
I would prefer to keep it until we are finished with this major rework. John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
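[Editor's note] John's suggestion of a single global lost-messages counter could look roughly like the following sketch, using C11 atomics in place of the kernel's atomic_long_t. Names are illustrative, not taken from any actual patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Global count of messages dropped by the recursion limit
 * (the atomic_long_t John suggests above). */
static atomic_long recursive_msgs_lost;
static char recursion_count;	/* per-CPU/per-context in the real patch */

static bool printk_enter(void)
{
	if (recursion_count > 1) {
		/* Remember the drop so a later hint such as
		 * "%ld recursive printk messages lost" is possible. */
		atomic_fetch_add(&recursive_msgs_lost, 1);
		return false;
	}
	recursion_count++;
	return true;
}

static void printk_exit(void)
{
	recursion_count--;
}
```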
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 16:10 ` recursion handling: " Petr Mladek 2020-12-05 4:25 ` Sergey Senozhatsky @ 2020-12-05 9:41 ` Sergey Senozhatsky 2020-12-06 22:17 ` John Ogness 2020-12-06 21:44 ` John Ogness 2 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-05 9:41 UTC (permalink / raw) To: Petr Mladek Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On (20/12/04 17:10), Petr Mladek wrote:
> One reason is the use of per-cpu variables. Alternative solution would
> be to store printk_context into task_struct.

We can keep it per-CPU, disable preemption, and have counters for every context (task, soft/hard irq, NMI). Shouldn't be a problem:

	vprintk_emit()
	{
		preempt_disable();
		vprintk_store();
		preempt_enable();

		preempt_disable();
		console_unlock();
		preempt_enable();
	}

vprintk_store() is a small fraction of console_unlock() time-wise.

-ss
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-05 9:41 ` Sergey Senozhatsky @ 2020-12-06 22:17 ` John Ogness 0 siblings, 0 replies; 47+ messages in thread From: John Ogness @ 2020-12-06 22:17 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-05, Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: >> One reason is the use of per-cpu variables. Alternative solution would >> be to store printk_context into task_struct. > > We can keep per-CPU, disable preemption and have counters for > every context (task, soft/hard irq, NMI). Shouldn't be a problem These contexts are difficult to track correctly for all the preemption models (especially when PREEMPT_RT is included). But I will look into this idea. It would be nice to keep interrupts enabled for that first vsnprintf() in vprintk_store(). John Ogness ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 16:10 ` recursion handling: " Petr Mladek 2020-12-05 4:25 ` Sergey Senozhatsky 2020-12-05 9:41 ` Sergey Senozhatsky @ 2020-12-06 21:44 ` John Ogness 2020-12-07 11:17 ` Petr Mladek 2 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-06 21:44 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote:
> On Tue 2020-12-01 21:59:41, John Ogness wrote:
>> Since the ringbuffer is lockless, there is no need for it to be
>> protected by @logbuf_lock. Remove @logbuf_lock.
>>
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -1847,6 +1811,65 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
>>  	}
>>  }
>
> The recursion-related code needs some explanation or we should do it
> another way. I spent quite some time on it and I am still not sure
> that I understand it. Sorry. :-/
>
> Let me describe how I understand it.
>
>> +#ifdef CONFIG_PRINTK_NMI
>> +#define NUM_RECURSION_CTX 2
>> +#else
>> +#define NUM_RECURSION_CTX 1
>> +#endif
>
> OK, the number of contexts is limited because interrupts are disabled
> inside printk_enter()/printk_exit(). It is basically the same reason
> why we have only two printk_safe buffers (NMI + other contexts).

Correct.

> What is the exact reason to disable interrupts around the entire
> vprintk_store(), please? It should get documented.

It simplifies the context tracking. Also, in mainline, interrupts are already disabled for all of vprintk_store(). AFAIK latencies due to logbuf_lock contention were not an issue.

> One reason is the use of per-cpu variables. Alternative solution would
> be to store printk_context into task_struct. Well, I am not sure if
> "current" task is available during early boot.
> But it might solve problems with per-cpu variables that are not
> working during early boot.
>
> That said, I am not sure if it is worth it.

I really don't want to touch task_struct. IMHO the usefulness of that struct is limited, considering that printk can be called from scheduling and interrupt contexts.

>> +
>> +struct printk_recursion {
>> +	char count[NUM_RECURSION_CTX];
>> +};
>>
>> +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion);
>> +static char printk_recursion_count[NUM_RECURSION_CTX];
>
> This is pretty confusing. The array is hidden in a struct when per-cpu
> variables are used. And a naked array is used for early boot.
>
> Is the structure really needed? What about?
>
> static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count);
> static char printk_count_early[NUM_RECURSION_CTX];

OK.

>> +
>> +static char *get_printk_count(void)
>> +{
>> +	struct printk_recursion *rec;
>> +	char *count;
>> +
>> +	if (!printk_percpu_data_ready()) {
>> +		count = &printk_recursion_count[0];
>
> I see why you avoided per-cpu variables for early boot. I am just
> curious how the printk_context variable works these days. It is used by
> any printk(), including early code, see vprintk_func().

IMO printk_context is serving a different purpose. With the existence of logbuf_lock, printk_context exists for the sole purpose of making sure that logbuf_lock is not taken recursively and that the CPU does not spin on it in NMI context. printk_context is simply gating calls to the safe buffers.

For the lockless ringbuffer, there is no issue of taking a lock recursively and no danger from NMI. There is no need for the printk_context "gate". However, IMHO there is a real danger if a bug in printk (or its ringbuffer) leads to infinite recursion. This new recursion counter offers safety against that scenario. Until now this scenario has been ignored.
So I suppose if we are comfortable with continuing to ignore the scenario, then we don't need to track the recursion level.

To test, I triggered artificial WARNs in the vsnprintf() of the printk code. I found it nice to be able to see the stack trace going into printk, and at the same time I was relieved that such a nested warning was not blasting the system into infinite recursion.

>> +	} else {
>> +		rec = this_cpu_ptr(&percpu_printk_recursion);
>> +
>> +		count = &rec->count[0];
>> +	}
>> +
>> +#ifdef CONFIG_PRINTK_NMI
>> +	if (in_nmi())
>> +		count++;
>> +#endif
>
> This is extremely confusing. It is far from obvious that
> the pointer, and not the value, is incremented.
>
> If we really need this to avoid per-cpu variables during early boot,
> then a clearer implementation would be:
>
> char *get_printk_counter_by_ctx()
> {
> 	int ctx = 0;
>
> 	if (in_nmi)
> 		ctx = 1;
>
> 	if (!printk_percpu_data_ready())
> 		return &printk_count_early[ctx];
>
> 	return this_cpu_ptr(printk_count[ctx]);
> }

Yes, much cleaner. Thank you.

>> +
>> +	return count;
>> +}
>> +
>> +static bool printk_enter(unsigned long *flags)
>> +{
>> +	char *count;
>> +
>> +	local_irq_save(*flags);
>> +	count = get_printk_count();
>> +	/* Only 1 level of recursion allowed. */
>
> We should allow at least some level of recursion. Otherwise, we would
> not see warnings printed from vsprintf code.

With 1 level, you will see warnings from vsprintf code. I'm not sure it makes sense to allow more than 1 level. It causes exponential logging.

>> +	if (*count > 1) {
>> +		local_irq_restore(*flags);
>> +		return false;
>> +	}
>> +	(*count)++;
>> +
>> +	return true;
>> +}
>
> This should be unified with printk_context, printk_nmi_enter(),
> printk_nmi_exit(). It does not make sense to have two separate
> printk context counters.
>
> Or is there any plan to remove printk_safe and printk_context?

Yes, I plan to remove the safe buffers, which also removes printk_safe.c and the printk_context "gate".
> BTW: I prefer to use the bitmask approach. It allows checking for
> the normal context with a single operation (no bit is set). There is
> no need to go through all the counters.

OK.

> Note that we need at least one more context for gdb.

Ah yes, thank you.

John Ogness
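[Editor's note] The bitmask approach agreed on here can be illustrated with a hypothetical sketch. Bit names are invented for illustration, the word would be per-CPU in a real implementation, and the counting that the real printk_context packs into its low bits is omitted: each special context sets one bit, so "normal context" is a single zero test rather than a walk over separate counters.

```c
/* Illustrative context bits; one per special context, including the
 * extra debugger context mentioned above. */
enum {
	PRINTK_CTX_RECURSION	= 1 << 0,
	PRINTK_CTX_NMI		= 1 << 1,
	PRINTK_CTX_DEBUGGER	= 1 << 2,
};

static unsigned int printk_context;	/* per-CPU in a real implementation */

static void printk_context_enter(unsigned int bit)
{
	printk_context |= bit;
}

static void printk_context_exit(unsigned int bit)
{
	printk_context &= ~bit;
}

/* Normal context: no bit set -- one comparison, no loop over counters. */
static int in_normal_printk_context(void)
{
	return printk_context == 0;
}
```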
* Re: recursion handling: Re: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-06 21:44 ` John Ogness @ 2020-12-07 11:17 ` Petr Mladek 0 siblings, 0 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-07 11:17 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Sun 2020-12-06 22:50:54, John Ogness wrote: > On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > > On Tue 2020-12-01 21:59:41, John Ogness wrote: > >> Since the ringbuffer is lockless, there is no need for it to be > >> protected by @logbuf_lock. Remove @logbuf_lock. > >> > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > > What is the exact reason to disable interrupts around the entire > > vprintk_store(), please? It should get documented. > > It simplifies the context tracking. Also, in mainline interrupts are > already disabled for all of vprintk_store(). AFAIK latencies due to > logbuf_lock contention were not an issue. > > I really don't want to touch task_struct. IMHO the usefulness of that > struct is limited, considering that printk can be called from scheduling > and interrupting contexts. Fair enough. I am fine with the per-CPU variables and the disabled interrupts around vprintk_store(). Note: We should also prevent calling console_trylock() for recursive messages to avoid infinite loop or even deadlock in this part of the code. > >> + > >> +struct printk_recursion { > >> + char count[NUM_RECURSION_CTX]; > >> +}; > >> > >> +static DEFINE_PER_CPU(struct printk_recursion, percpu_printk_recursion); > >> +static char printk_recursion_count[NUM_RECURSION_CTX]; > > > > This is pretty confusing. The array is hidden in a struct when per-cpu > > variables are used. And a naked array is used for early boot. > > > > Is the structure really needed? What about? 
> >
> > static DEFINE_PER_CPU(char [PRINTK_CTX_NUM], printk_count);
> > static char printk_count_early[NUM_RECURSION_CTX];
>
> OK.
>
> >> +
> >> +static char *get_printk_count(void)
> >> +{
> >> +	struct printk_recursion *rec;
> >> +	char *count;
> >> +
> >> +	if (!printk_percpu_data_ready()) {
> >> +		count = &printk_recursion_count[0];
> >
> > I see why you avoided per-cpu variables for early boot. I am just
> > curious how the printk_context variable works these days. It is used by
> > any printk(), including early code, see vprintk_func().
>
> IMO printk_context is serving a different purpose. With the existence of
> logbuf_lock, printk_context exists for the sole purpose of making sure
> that logbuf_lock is not taken recursively and that the CPU does not spin
> on it in NMI context. printk_context is simply gating calls to the safe
> buffers.

Well, both printk_count and printk_context are able to count recursion in different contexts. They are both used to decide how printk() will behave...

Anyway, it is not necessary to fight over words. You write below that the plan is to remove printk_safe, including printk_context. That will solve my problem.

I am fine with having both of them for some transition period. I guess that it will make our life easier, from the coding and review point of view.

> >> +static bool printk_enter(unsigned long *flags)
> >> +{
> >> +	char *count;
> >> +
> >> +	local_irq_save(*flags);
> >> +	count = get_printk_count();
> >> +	/* Only 1 level of recursion allowed. */
> >
> > We should allow at least some level of recursion. Otherwise, we would
> > not see warnings printed from vsprintf code.
>
> With 1 level, you will see warnings from vsprintf code. I'm not sure it
> makes sense to allow more than 1 level. It causes exponential logging.

Shame on me. I missed that 1 level was actually enabled.

Anyway, I would like to allow at least 2 levels of recursion. A deeper limit, for example 5 levels, would be even better.
We need to know when there is a problem storing the recursive/nested message. And I would rather see the same message repeated 3 times than not see it at all.

Note that the risk of infinite recursion is pretty low. We have most of the code called from vprintk_emit() under control. There are many pr_once() or WARN_ONCE(). The error messages have rather simple and commonly used formatting, so the risk of recursive errors in the vsprintf() code is low.

> > Or is there any plan to remove printk_safe and printk_context?

I am relieved to read this. Do not take me wrong: printk_safe() was a really great idea and served its purpose well. But it is yet more tricky lockless code. There is further complexity with flushing the temporary buffers and handling panic(). It is nice that we could remove some of this complexity.

Best Regards,
Petr
* vprintk_store: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness ` (3 preceding siblings ...) 2020-12-04 16:10 ` recursion handling: " Petr Mladek @ 2020-12-04 16:15 ` Petr Mladek 2020-12-06 22:30 ` John Ogness 2020-12-04 16:19 ` consoles: " Petr Mladek 5 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-04 16:15 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On Tue 2020-12-01 21:59:41, John Ogness wrote:
> Since the ringbuffer is lockless, there is no need for it to be
> protected by @logbuf_lock. Remove @logbuf_lock.
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1867,40 +1890,75 @@ static inline u32 printk_caller_id(void)
>  		0x80000000 + raw_smp_processor_id();
>  }
>
> -/* Must be called under logbuf_lock. */
> +static u16 printk_sprint(char *text, u16 size, int facility, enum log_flags *lflags,
> +			 const char *fmt, va_list args)
> +{
> +	char *orig_text = text;
> +	u16 text_len;
> +
> +	text_len = vscnprintf(text, size, fmt, args);
> +
> +	/* Mark and strip a trailing newline. */
> +	if (text_len && text[text_len - 1] == '\n') {
> +		text_len--;
> +		*lflags |= LOG_NEWLINE;
> +	}

We reserve the space for '\n' anyway. It would make sense to just store it and remove all these LOG_NEWLINE-specific hacks. Well, let's leave it as is for now. It is still possible that people will not love this approach and we will need to format the message into some temporary buffer first.

> +	/* Strip kernel syslog prefix. */

Syslog actually uses the "<level>" format. We are skipping the log level and control flags here.

> +	if (facility == 0) {
> +		while (text_len >= 2 && printk_get_level(text)) {
> +			text_len -= 2;
> +			text += 2;
> +		}

We should avoid two completely different approaches that handle the printk_level prefix.
One solution is to implement something like:

	static char *parse_prefix(text, &level, &flags)

That would return a pointer to the text after the prefix, and fill in level and flags only when non-NULL pointers are passed.

Another solution would be to pass this information from vprintk_store(). The prefix has already been parsed, after all.

> +
> +		if (text != orig_text)
> +			memmove(orig_text, text, text_len);
> +	}

We should clear the freed space to make the ring buffer as human-readable as possible when someone just dumps the memory.

Sigh, I have to admit that I missed the problem with the prefix and the trailing '\n' when I suggested avoiding the temporary buffers. This memmove() and the wasted space are a pity.

Well, it is typically 3 bytes per message, and the copying would be necessary even with the temporary buffer. So I am less convinced, but I would still try to avoid the temporary buffers for now.

> +
> +	return text_len;
> +}
> +
> +__printf(4, 0)

Best Regards,
Petr
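[Editor's note] The parse_prefix() helper proposed above might look roughly like this userspace sketch. It assumes the kernel's prefix encoding -- an SOH byte ('\001') followed by a level or flag character -- and leaves out the &flags output for brevity; all names are modeled on the discussion, not taken from a real patch.

```c
#include <stddef.h>

#define KERN_SOH_ASCII	'\001'	/* kernel log-level prefixes are "\001<c>" */

/* Simplified stand-in for the kernel's printk_get_level(): return the
 * prefix character, or 0 if the text does not start with a prefix. */
static char printk_get_level(const char *text)
{
	if (text[0] == KERN_SOH_ASCII && text[1])
		return text[1];
	return 0;
}

/* Hypothetical parse_prefix(): skip all "\001<c>" pairs, fill @level
 * (when non-NULL) from the last level digit seen, and return a pointer
 * to the text after the prefix. */
static const char *parse_prefix(const char *text, int *level)
{
	char kern_level;

	while ((kern_level = printk_get_level(text)) != 0) {
		if (level && kern_level >= '0' && kern_level <= '7')
			*level = kern_level - '0';
		text += 2;
	}
	return text;
}
```

Because the caller gets back a pointer rather than mutated input, both the prefix-stripping loop in printk_sprint() and the level parsing in vprintk_store() could share this one helper, which is the unification the review asks for.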
* Re: vprintk_store: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 16:15 ` vprintk_store: was: " Petr Mladek @ 2020-12-06 22:30 ` John Ogness 2020-12-07 12:46 ` Petr Mladek 0 siblings, 1 reply; 47+ messages in thread From: John Ogness @ 2020-12-06 22:30 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: >> + /* Strip kernel syslog prefix. */ > > Syslog actually uses: <level> format. We are skipping log level > and control flags here. OK. >> + if (facility == 0) { >> + while (text_len >= 2 && printk_get_level(text)) { >> + text_len -= 2; >> + text += 2; >> + } > > We should avoid two completely different approaches > that handle printk_level prefix. > > One solution is to implement something like: > > static char *parse_prefix(text, &level, &flags) > > That would return pointer to the text after the prefix. > And fill level and flags only when non-NULL pointers are passed. OK. > Another solution would be to pass this information from > vprintk_store(). The prefix has already been parsed > after all. Well, there is a vscnprintf() that happens in between and I don't think we should trust the parsed offset from the first vsnprintf(). >> + >> + if (text != orig_text) >> + memmove(orig_text, text, text_len); >> + } > > We should clear the freed space to make the ring buffer as > human readable as possible when someone just dumps the memory. Data blocks are currently padded and that padding is not cleared. So it is already not perfectly human readable on a raw dump. > Sigh, I have to admit that I missed the problem with prefix and > trailing '\n' when I suggested to avoid the temporary buffers. > This memmove() and the space wasting is pity. > > Well, it is typically 3 bytes per message. And the copying would > be necessary even with the temporary buffer. 
> So, I am less convinced but I would still try to avoid the temporary
> buffers for now.

Agreed. I think this approach is better than the temporary buffers I previously used. Also, if we add a trimming feature to the ringbuffer, it will keep the ringbuffer mostly clean anyway. Something like this:

	prb_rec_init_wr(&r, text_len);
	prb_reserve(&e, prb, &r);
	text_len = printk_sprint(&r.text_buf[0], text_len, ...);
	r.info->text_len = text_len;
	prb_trim_rec(&e, &r); <--- try to reduce datablock size to @text_len
	prb_commit(&e);

I see no urgency to add such a feature. But I think we should keep it on our radar.

John Ogness
* Re: vprintk_store: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-06 22:30 ` John Ogness @ 2020-12-07 12:46 ` Petr Mladek 0 siblings, 0 replies; 47+ messages in thread From: Petr Mladek @ 2020-12-07 12:46 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On Sun 2020-12-06 23:36:53, John Ogness wrote: > On 2020-12-04, Petr Mladek <pmladek@suse.com> wrote: > >> + if (facility == 0) { > >> + while (text_len >= 2 && printk_get_level(text)) { > >> + text_len -= 2; > >> + text += 2; > >> + } > > > > We should avoid two completely different approaches > > that handle printk_level prefix. > > > > One solution is to implement something like: > > > > static char *parse_prefix(text, &level, &flags) > > > > That would return pointer to the text after the prefix. > > And fill level and flags only when non-NULL pointers are passed. > > OK. > > > Another solution would be to pass this information from > > vprintk_store(). The prefix has already been parsed > > after all. > > Well, there is a vscnprintf() that happens in between and I don't think > we should trust the parsed offset from the first vsnprintf(). Good point! > >> + > >> + if (text != orig_text) > >> + memmove(orig_text, text, text_len); > >> + } > > > > We should clear the freed space to make the ring buffer as > > human readable as possible when someone just dumps the memory. > > Data blocks are currently padded and that padding is not cleared. So it > is already not perfectly human readable on a raw dump. It would be nice to clean up the padding as well. But it is a cosmetic improvement that might be done anytime later. > > Sigh, I have to admit that I missed the problem with prefix and > > trailing '\n' when I suggested to avoid the temporary buffers. > > This memmove() and the space wasting is pity. > > > > Well, it is typically 3 bytes per message. 
> > And the copying would be necessary even with the temporary buffer.
> > So, I am less convinced but I would still try to avoid the temporary
> > buffers for now.
>
> Agreed. I think this approach is better than the temporary buffers I
> previously used.

Another motivation is that it makes it simple to handle recursion/nesting. Otherwise, we would need temporary buffers for each allowed recursion level, or some tricky code.

> Also, if we add a trimming feature to the ringbuffer,
> it will keep the ringbuffer mostly clean anyway. Something like this:
>
> 	prb_rec_init_wr(&r, text_len);
> 	prb_reserve(&e, prb, &r);
> 	text_len = printk_sprint(&r.text_buf[0], text_len, ...);
> 	r.info->text_len = text_len;
> 	prb_trim_rec(&e, &r); <--- try to reduce datablock size to @text_len
> 	prb_commit(&e);
>
> I see no urgency to add such a feature. But I think we should keep it on
> our radar.

Yup. I thought about it as well. I agree that it is not a priority.

Best Regards,
Petr
* consoles: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-01 20:53 ` [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock John Ogness ` (4 preceding siblings ...) 2020-12-04 16:15 ` vprintk_store: was: " Petr Mladek @ 2020-12-04 16:19 ` Petr Mladek 2020-12-05 4:39 ` Sergey Senozhatsky 5 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-04 16:19 UTC (permalink / raw) To: John Ogness Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On Tue 2020-12-01 21:59:41, John Ogness wrote:
> Since the ringbuffer is lockless, there is no need for it to be
> protected by @logbuf_lock. Remove @logbuf_lock.
>
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -59,7 +57,7 @@ void defer_console_output(void);
>  __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
>
>  /*
> - * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
> + * In !PRINTK builds we still export console_sem
>   * semaphore and some of console functions (console_unlock()/etc.), so
>   * printk-safe must preserve the existing local IRQ guarantees.

We should revisit whether it is still needed just for console_sem. Well, I wonder why we need printk_safe at all.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2432,7 +2490,6 @@ void console_unlock(void)
>  		size_t len;
>
>  		printk_safe_enter_irqsave(flags);

Why do we actually need to use the printk_safe context here? There is no longer a risk of deadlock caused by logbuf_lock. All other recursion should be prevented by console_trylock() in printk(). Am I missing anything?

Note that we still need to disable interrupts around

	console_lock_spinning_enable();
	console_lock_spinning_disable_and_check();

to make sure that printk() can busy-wait for the console lock to be passed.
> -	raw_spin_lock(&logbuf_lock);
>  skip:
>  	if (!prb_read_valid(prb, console_seq, &r))
>  		break;

Best Regards,
Petr
* Re: consoles: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-04 16:19 ` consoles: " Petr Mladek @ 2020-12-05 4:39 ` Sergey Senozhatsky 2020-12-07 9:50 ` Petr Mladek 0 siblings, 1 reply; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-05 4:39 UTC (permalink / raw) To: Petr Mladek Cc: John Ogness, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On (20/12/04 17:19), Petr Mladek wrote:
[..]
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2432,7 +2490,6 @@ void console_unlock(void)
> >  		size_t len;
> >
> >  		printk_safe_enter_irqsave(flags);
>
> Why do we actually need to use the printk_safe context here?
> There is no longer a risk of deadlock caused by logbuf_lock.
> All other recursion should be prevented by console_trylock()
> in printk().

All semaphore functions, including down_trylock(), acquire the semaphore spin_lock; and some then call into the scheduler (or other core kernel functions) under the semaphore's spin_lock. For instance:

	up()
	 raw_spin_lock_irqsave(&sem->lock)
	  __up()
	   wake_up_process()
	    try_to_wake_up()

-ss
* Re: consoles: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-05 4:39 ` Sergey Senozhatsky @ 2020-12-07 9:50 ` Petr Mladek 2020-12-08 20:51 ` Sergey Senozhatsky 0 siblings, 1 reply; 47+ messages in thread From: Petr Mladek @ 2020-12-07 9:50 UTC (permalink / raw) To: Sergey Senozhatsky Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel

On Sat 2020-12-05 13:39:53, Sergey Senozhatsky wrote:
> On (20/12/04 17:19), Petr Mladek wrote:
> [..]
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -2432,7 +2490,6 @@ void console_unlock(void)
> > >  		size_t len;
> > >
> > >  		printk_safe_enter_irqsave(flags);
> >
> > Why do we actually need to use the printk_safe context here?
> > There is no longer a risk of deadlock caused by logbuf_lock.
> > All other recursion should be prevented by console_trylock()
> > in printk().
>
> All semaphore functions, including down_trylock(), acquire
> the semaphore spin_lock;

This has a very easy solution. The patch adds code that counts the recursion level. We just must not call console_trylock() when inside a recursive printk.

printk_safe() has two functions:

1. It allows storing messages in a lockless way. This is obsoleted by the lockless ringbuffer.

2. It prevents calling the consoles. We could move this check into vprintk_store(). We already have a similar check there for printk_deferred().

> and some then call into the scheduler
> (or other core kernel functions) under the semaphore's spin_lock.
> For instance:
>
> 	up()
> 	 raw_spin_lock_irqsave(&sem->lock)
> 	  __up()
> 	   wake_up_process()
> 	    try_to_wake_up()

This problem is partly solved by printk_deferred(). In each case, printk_safe() does not help here.

I still do _not_ see a reason to keep printk_safe()!

Best Regards,
Petr
* Re: consoles: was: [PATCH next v2 3/3] printk: remove logbuf_lock, add syslog_lock 2020-12-07 9:50 ` Petr Mladek @ 2020-12-08 20:51 ` Sergey Senozhatsky 0 siblings, 0 replies; 47+ messages in thread From: Sergey Senozhatsky @ 2020-12-08 20:51 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, John Ogness, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Thomas Gleixner, linux-kernel On (20/12/07 10:50), Petr Mladek wrote: [..] > > and then some call into the scheduler > > (or other kernel core functions) under semaphore's spin_lock. > > For instance > > > > up() > > raw_spin_lock_irqsave(&sem->lock) > > __up() > > wake_up_process() > > try_to_wake_up() > > This problem is partly solved by printk_deferred(). In each > case, printk_safe() does not help here. printk_deferred() has never been used in all the critical code paths. So I think printk_safe does help here; it takes care of all the remaining cases, that are not "partly solved by printk_deferred()". > I still do _not_ see a reason to keep printk_safe()! Not sure I'm following, sorry. To put it simply - we better keep printk_safe until "new recursion prevention" >= printk_safe(). In this patch set "new recursion prevention" < printk_safe(). -ss ^ permalink raw reply [flat|nested] 47+ messages in thread