* Re: kernel CI: printk loglevels in kernel boot logs? [not found] ` <20171122015610.x3kgzqgtwywlurmz@wfg-t540p.sh.intel.com> @ 2017-11-22 3:27 ` Fengguang Wu 2017-11-22 5:26 ` Sergey Senozhatsky 2017-11-22 11:34 ` Petr Mladek 0 siblings, 2 replies; 20+ messages in thread From: Fengguang Wu @ 2017-11-22 3:27 UTC (permalink / raw) To: Kevin Hilman Cc: Mark Brown, Greg Kroah-Hartman, LKML, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds [CC LKML for possible printk improvements] On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote: >Hi Kevin, > >On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote: >>Hi Fenguang, >> >>In automated testing, for ease of parsing kernel boot logs (especially >>separating warnings and errors from debug, info etc.) >> >>Right now we can get this info from a "dmesg --raw" after bootup, but >>it would be really nice in certain automation frameworks to have a >>kernel command-line option to enable printing of loglevels in default >>boot log. > >Agreed. > >>This is espeically useful when ingesting kernel logs into advanced >>search/analytics frameworks (I'm playing with and ELK stack: Elastic >>Search, Logstash, Kibana). >> >>Does that sound like a feature you'd be interested in? > >Yes, sure. > >>In kernelCI, we're considering submitting a patch to add a >>"show_loglevel" command-line argument to enable that option on kernel >>boot. > >Thanks for doing that patch! It'll obviously make it easier to catch >various warnings, which will be useful when used with caution, >especially when false warnings (wrt. real problems that should be >fixed) can be effectively filtered out. [...] >As you may know I'm currently reporting kernel oops in mainline >kernel, hoping to clear noisy oops there -- they obviously hurt >catching and bisecting new oops. > >I have to say the warnings are much more noisy than kernel oops >in 2 ways: > >1) It's common for a *normal* kernel boot to have a dozen of old >warnings. > >2) Many warnings do not necessarily mean something should or could be >fixed -- they may well be mentioning some HW problem, or an alert >message to the user. > >So there is a much bigger and messy problem than catching the warnings: >ways to effectively mark or filter real warnings that automated testing >should catch and report. > >For filtering, we currently accumulated the below blacklist: > >https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist > [...] For the marking part, I wonder if there can be a clear rule that allows developers to distinguish 2 kind of information for users and testers: - "bug" message: indicating a regression that should be reported and fixed - "fact" message: well there's a problem, but we kernel developers probably can do nothing about it. It's just to let the user know about the fact. The fix might be something like replacing a broken disk drive. Those message types are orthogonal to severity of the problem (log levels), so the current log levels are actually not sufficient for distinguishing these kind of situations. Here are the list of warning ids waiting for bisect in 0day test farm. As you may see, only relatively few ones may worth bisecting and reporting. We've already built a pretty large blacklist, however it's proved to be awkward and ineffective -- it obviously suffers from both coverage and expertise problems. 777 hub#-#:#:config_failed,can't_get_hub_status(err#) 328 Firmware_Bug]:TSC_DEADLINE_disabled_due_to_Errata;please_update_microcode_to_version:#(or_later) 31 ACPI_Error:Field[CPB3]at_bit_offset/length#exceeds_size_of_target_Buffer(#bits)(#/dsopcode-#) 30 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._PMC,AE_NOT_EXIST(#/psparse-#) 29 ACPI_Error:Method_parse/execution_failed~_SB.PMI0._GHL,AE_NOT_EXIST(#/psparse-#) 28 ACPI_Error:Region_IPMI(ID=#)has_no_handler(#/exfldio-#) 28 ACPI_Error:No_handler_for_Region[SYSI](#)[IPMI](#/evregion-#) 27 ACPI_Error:Method_parse/execution_failed~_SB._OSC,AE_AML_BUFFER_LIMIT(#/psparse-#) 25 tpm_tpm#:A_TPM_error(#)occurred_attempting_get_random 24 IP-Config:Reopening_network_devices 19 DHCP/BOOTP:Ignoring_fragmented_reply 8 xhci_hcd#:#:#:init#:#:#fail 8 xhci_hcd#:#:#:can't_setup 6 in_atomic():#,irqs_disabled():#,pid:#,name:swapper 4 megaraid_sas#:#:#:Init_cmd_return_status_SUCCESS_for_SCSI_host 4 in_atomic():#,irqs_disabled():#,pid:#,name:kworker 4 drm:drm_atomic_helper_commit_cleanup_done[drm_kms_helper]]*ERROR*[CRTC:#:pipe_A]flip_done_timed_out 2 print_req_error:I/O_error,dev_loop#,sector 2 IP-Config:Failed_to_open_erspan0 2 in_atomic():#,irqs_disabled():#,pid:#,name:systemd-udevd 2 in_atomic():#,irqs_disabled():#,pid:#,name:perf 2 acerhdf:unknown(unsupported)BIOS_version_QEMU/Standard_PC(i440FX+PIIX,#).#-#,please_report,aborting 1 XFS(sda8):metadata_I/O_error:block#(~xfs_readlink_bmap_ilocked~)error#numblks 1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_symlink_read_verify[xfs],xfs_symlink_block 1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_dir3_data_read_verify[xfs],xfs_dir3_data_block 1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_da3_node_read_verify[xfs],xfs_da3_node_block 1 XFS(sda8):Metadata_CRC_error_detected_at_xfs_attr3_leaf_read_verify[xfs],xfs_attr3_leaf_block 1 Memory_failure:#:recovery_action_for_huge_page:Recovered 1 Memory_failure:#:recovery_action_for_clean_LRU_page:Recovered 1 Memory_failure:#:Killing_tinjpage:#due_to_hardware_memory_corruption 1 mce:[Hardware_Error]:TSC#MISC#df87b000d9eff 1 in_atomic():#,irqs_disabled():#,pid:#,name:xargs 1 in_atomic():#,irqs_disabled():#,pid:#,name:triad_loop 1 in_atomic():#,irqs_disabled():#,pid:#,name:tchain_edit 1 in_atomic():#,irqs_disabled():#,pid:#,name:sh 1 in_atomic():#,irqs_disabled():#,pid:#,name:sed 1 in_atomic():#,irqs_disabled():#,pid:#,name:run.sh 1 in_atomic():#,irqs_disabled():#,pid:#,name:perf_test 1 in_atomic():#,irqs_disabled():#,pid:#,name:noploop.sh 1 in_atomic():#,irqs_disabled():#,pid:#,name:jbd2/sda1 1 in_atomic():#,irqs_disabled():#,pid:#,name:grep 1 in_atomic():#,irqs_disabled():#,pid:#,name:fallocate 1 in_atomic():#,irqs_disabled():#,pid:#,name:dmesg 1 in_atomic():#,irqs_disabled():#,pid:#,name:cc1 1 do_IRQ:#No_irq_handler_for_vector 1 Buffer_I/O_error_on_dev_dm-#,logical_block#,async_page_read We already have a well known answer to "marking bugs", ie. to dump a pile of call trace that no one can tolerate. The down side is, for some kind of bugs the developer may not need the call trace at all. And the call trace might make users unnecessarily nervous. That's why we have this email thread -- to catch regressions indicated by some 1-liner kernel warnings. For that kind of situation, it may be enough to add a common prefix for such messages. For example, [kernel bug] your warning/error printk message ~~~~~~~~~~~~ Alternatively, some printk messages already clearly stated "please report" (they seem more targeted for users than testers): wfg /c/linux% git grep '".*please report'|head block/elevator.c:645: "(nr_sorted=%u), please report this\n", drivers/ata/pata_hpt37x.c:867: pr_err("Unknown HPT366 subtype, please report (%d)\n", drivers/ata/pata_hpt37x.c:908: pr_err("PCI table is bogus, please report (%d)\n", dev->device); drivers/ata/pata_hpt3x2n.c:532: pr_err("PCI table is bogus, please report (%d)\n", dev->device); drivers/edac/mce_amd.c:725: " please report on LKML.\n"); drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:395: DRM_INFO(" DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n"); drivers/gpu/drm/nouveau/nouveau_bios.c:1458: "please report\n"); drivers/gpu/drm/radeon/radeon_display.c:803: DRM_INFO(" DDC: no ddc bus - possible BIOS bug - please report to xorg-driver-ati@lists.x.org\n"); drivers/hwmon/ibmaem.c:773: "Unknown AEM v%d; please report this to the maintainer.\n", drivers/hwmon/w83781d.c:1367: "If reset=1 solved a problem you were having, please report!\n"); Thanks, Fengguang ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 3:27 ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu @ 2017-11-22 5:26 ` Sergey Senozhatsky 2017-11-22 10:42 ` Mark Brown 2017-11-22 11:34 ` Petr Mladek 1 sibling, 1 reply; 20+ messages in thread From: Sergey Senozhatsky @ 2017-11-22 5:26 UTC (permalink / raw) To: Fengguang Wu Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds Hello, On (11/22/17 11:27), Fengguang Wu wrote: > [CC LKML for possible printk improvements] [..] > > > In kernelCI, we're considering submitting a patch to add a > > > "show_loglevel" command-line argument to enable that option on kernel > > > boot. > > > > Thanks for doing that patch! It'll obviously make it easier to catch > > various warnings, which will be useful when used with caution, > > especially when false warnings (wrt. real problems that should be > > fixed) can be effectively filtered out. both /dev/kmsg and /proc/kmsg show log levels. user space then can hide those levels or show them via "--raw" mode/etc. we don't (*sort of don't*) have a "--raw" mode for serial consoles. we do have extended consoles, which print metadata on consoles: msg_print_ext_header() and msg_print_ext_body() sprintf(... "%u,%llu,%llu,%c;", (msg->facility << 3) | msg->level, seq, ts_usec, ....) basically the same format as for /dev/kmsg. but extended console drivers come at a price - broken pr_cont(). "sad!". and, besides, may be ext consoles can show more data than needed. /proc/kmsg has different format. it does not do the ext_header/ext_body thing, but instead calls msg_print_text()->print_prefix(), which simply does prefix = (msg->facility << 3) | msg->level; sprintf(buf, "<%u>", prefix); console_unlock(), the function that prints messages on the consoles, calls msg_print_text() function, but does not do print_prefix() ("bool syslog" is false). so may be we can add "syslog" mode for console drivers and do print_prefix() in console_unlock() [unless have nr_ext_console_drivers]. if this is what you guys are talking about. (frankly, I have no idea if there are tools that parse output from serial consoles/net consoles and thus might get confused by extra <%u> in every message). -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 5:26 ` Sergey Senozhatsky @ 2017-11-22 10:42 ` Mark Brown 0 siblings, 0 replies; 20+ messages in thread From: Mark Brown @ 2017-11-22 10:42 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Fengguang Wu, Kevin Hilman, Greg Kroah-Hartman, LKML, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 708 bytes --] On Wed, Nov 22, 2017 at 02:26:30PM +0900, Sergey Senozhatsky wrote: > so may be we can add "syslog" mode for console drivers and do print_prefix() > in console_unlock() [unless have nr_ext_console_drivers]. if this is what > you guys are talking about. (frankly, I have no idea if there are tools > that parse output from serial consoles/net consoles and thus might get > confused by extra <%u> in every message). In this case the goal is actually to use this with tools parsing the serial console output - it's in the context of kernelci where the boot logs are most interesting when we fail to get to the point of being able to look in /proc. Kevin's initial proposal was to have a command line option. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 3:27 ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu 2017-11-22 5:26 ` Sergey Senozhatsky @ 2017-11-22 11:34 ` Petr Mladek 2017-11-22 12:38 ` Sergey Senozhatsky 2017-11-22 14:10 ` Fengguang Wu 1 sibling, 2 replies; 20+ messages in thread From: Petr Mladek @ 2017-11-22 11:34 UTC (permalink / raw) To: Fengguang Wu Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed 2017-11-22 11:27:02, Fengguang Wu wrote: > [CC LKML for possible printk improvements] > > On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote: > > Hi Kevin, > > > > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote: > > > Hi Fenguang, > > > > > > In automated testing, for ease of parsing kernel boot logs (especially > > > separating warnings and errors from debug, info etc.) Would be enough to use console_loglevel to filter out the non-interesting messages? > > > Right now we can get this info from a "dmesg --raw" after bootup, but > > > it would be really nice in certain automation frameworks to have a > > > kernel command-line option to enable printing of loglevels in default > > > boot log. > > > > Agreed. > > > > > This is espeically useful when ingesting kernel logs into advanced > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic > > > Search, Logstash, Kibana). This suggests that people are already filtering the kernel log using external tools. Therefore there is a danger that the format might have already became the kernel ABI and we could not change it easily. A solution would be to printk the loglevel after the timestamp. But this would be ugly because dmesg --raw prints it at the beginning. To make it clear. I understand that "show_loglevel" command line argument would be useful for you. But I am afraid that it is not worth changing the format. There would need to be wide interest into the change. Also there would need to be evidence that the existing solutions (dmesg --raw, console_loglevel) are not enough in many real life scenarios. > [...] > > > As you may know I'm currently reporting kernel oops in mainline > > kernel, hoping to clear noisy oops there -- they obviously hurt > > catching and bisecting new oops. > > > > I have to say the warnings are much more noisy than kernel oops > > in 2 ways: > > > > 1) It's common for a *normal* kernel boot to have a dozen of old > > warnings. I guess that you solve this by a "clever" diff between old and new log. The extra info suggested below might help to filter out some messages. But the problems will still be there. I am not sure that the categories are worth it, see below. > > 2) Many warnings do not necessarily mean something should or could be > > fixed -- they may well be mentioning some HW problem, or an alert > > message to the user. > > > > So there is a much bigger and messy problem than catching the warnings: > > ways to effectively mark or filter real warnings that automated testing > > should catch and report. > > > > For filtering, we currently accumulated the below blacklist: > > > > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist > > > [...] > > For the marking part, I wonder if there can be a clear rule that > allows developers to distinguish 2 kind of information for users and > testers: > > - "bug" message: indicating a regression that should be reported and fixed > > - "fact" message: well there's a problem, but we kernel developers > probably can do nothing about it. It's just to let the user know > about the fact. The fix might be something like replacing a broken > disk drive. > > Those message types are orthogonal to severity of the problem (log > levels), so the current log levels are actually not sufficient for > distinguishing these kind of situations. It is interesting idea but I am afraid that it would not work in reality from several reasons: + It is yet another complexity. I am not sure that developers would want to think about it when creating new messages. + Who would set this for the existing ~100k messages? + If I get it correctly, the motivation is to help normal users to decide when a kernel bug should get reported. This might heavily increase the number of poor bug reports. The recent discussion about kernel regression suggests that there we are not ready for this, see https://lwn.net/Articles/738216/ + The proposal it rather strigthforward. But does all messages clearly fit into one of the two categories? What about messages that are not about problems? For example, most of pr_info(), pr_notice() messages? + Are the two categories worth the effort? Especially when the second one might still eventually be a kernel bug? If we add more categories, it might get too complicated. For example, like setting severities and priorities in the classic bugzilla. Best Regards, Petr ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 11:34 ` Petr Mladek @ 2017-11-22 12:38 ` Sergey Senozhatsky 2017-11-22 12:52 ` Fengguang Wu 2017-11-22 20:22 ` Kevin Hilman 2017-11-22 14:10 ` Fengguang Wu 1 sibling, 2 replies; 20+ messages in thread From: Sergey Senozhatsky @ 2017-11-22 12:38 UTC (permalink / raw) To: Petr Mladek Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On (11/22/17 12:34), Petr Mladek wrote: [..] > > > > This is espeically useful when ingesting kernel logs into advanced > > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic > > > > Search, Logstash, Kibana). [..] > To make it clear. I understand that "show_loglevel" command line argument > would be useful for you. But I am afraid that it is not worth changing > the format. There would need to be wide interest into the change. > Also there would need to be evidence that the existing solutions > (dmesg --raw, console_loglevel) are not enough in many real life > scenarios. well, I think that that "consoles_format=syslog" command line parameter will be enabled only by those who actually want to have it - Fengguang's build robot and kernelCI (+ may be more setups). so I'd probably assume there are low risks here. may be I'm wrong. I think it makes sense to have syslog's format "<%u>[timestamp] text\n" on serial consoles (time stamp when PRINTK_TIME set; <%u> when consoles_format=syslog set). -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 12:38 ` Sergey Senozhatsky @ 2017-11-22 12:52 ` Fengguang Wu 2017-11-23 2:59 ` Sergey Senozhatsky 2017-11-23 10:04 ` Petr Mladek 2017-11-22 20:22 ` Kevin Hilman 1 sibling, 2 replies; 20+ messages in thread From: Fengguang Wu @ 2017-11-22 12:52 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed, Nov 22, 2017 at 09:38:21PM +0900, Sergey Senozhatsky wrote: >On (11/22/17 12:34), Petr Mladek wrote: >[..] >> > > > This is espeically useful when ingesting kernel logs into advanced >> > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic >> > > > Search, Logstash, Kibana). >[..] >> To make it clear. I understand that "show_loglevel" command line argument >> would be useful for you. But I am afraid that it is not worth changing >> the format. There would need to be wide interest into the change. >> Also there would need to be evidence that the existing solutions >> (dmesg --raw, console_loglevel) are not enough in many real life >> scenarios. > >well, I think that that "consoles_format=syslog" command line parameter >will be enabled only by those who actually want to have it - Fengguang's >build robot and kernelCI (+ may be more setups). so I'd probably assume >there are low risks here. may be I'm wrong. Yes, since it needs explicit enabling, local parsers should stay working. Unless we send dmesg to other developers, but then they might also receive $(dmesg --raw) outputs and need to handle <%u> prefixes, too. >I think it makes sense to have syslog's format "<%u>[timestamp] text\n" >on serial consoles (time stamp when PRINTK_TIME set; <%u> when >consoles_format=syslog set). Thanks. Since the 0day scripts can already parse that format (code is listed below), we should quickly benefit from that feature when it's mainlined. https://github.com/fengguang/lkp-tests/blob/master/lib/dmesg.rb oops = `#{grep} -a -E -e '^<[0123]>' -e '^kern :(err |crit |alert |emerg ): ' #{kmsg_file} | sed -r 's/\\x1b\\[([0-9;]+m|[mK])//g' | grep -a -v -E -f #{LKP_SRC}/etc/oops-pattern | grep -a -v -F -f #{LKP_SRC}/etc/kmsg-blacklist` Thanks, Fengguang ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 12:52 ` Fengguang Wu @ 2017-11-23 2:59 ` Sergey Senozhatsky 2017-11-23 3:14 ` Fengguang Wu 2017-11-29 0:13 ` Kevin Hilman 2017-11-23 10:04 ` Petr Mladek 1 sibling, 2 replies; 20+ messages in thread From: Sergey Senozhatsky @ 2017-11-23 2:59 UTC (permalink / raw) To: Fengguang Wu Cc: Sergey Senozhatsky, Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On (11/22/17 20:52), Fengguang Wu wrote: [..] > > well, I think that that "consoles_format=syslog" command line parameter > > will be enabled only by those who actually want to have it - Fengguang's > > build robot and kernelCI (+ may be more setups). so I'd probably assume > > there are low risks here. may be I'm wrong. > > Yes, since it needs explicit enabling, local parsers should stay > working. Unless we send dmesg to other developers, but then they > might also receive $(dmesg --raw) outputs and need to handle <%u> > prefixes, too. > > > I think it makes sense to have syslog's format "<%u>[timestamp] text\n" > > on serial consoles (time stamp when PRINTK_TIME set; <%u> when > > consoles_format=syslog set). > > Thanks. Since the 0day scripts can already parse that format (code > is listed below), we should quickly benefit from that feature when > it's mainlined. okay... who's going to send the patch? kernelCI folks? I have some sort of a patch. added console_msg_format= with the only available option so far - "syslog". may be people would want to have boot time, etc. on their consoles. who knows. may be msg_format will be extended someday in the future or may be not, and we can do something like "console_format_syslog" (or any better name). Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++ kernel/printk/printk.c | 21 ++++++++++++++++++++- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 62436bd5f34a..42113e5181b2 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -640,6 +640,17 @@ console=brl,ttyS0 For now, only VisioBraille is supported. + console_msg_format= + [KNL] Control message format + By default we print messages in "[time stamp] text\n" + format (time stamp may not be printed, depending on + CONFIG_PRINTK_TIME or `printk_time' param). + syslog + Switch to syslog format (similar to "dmesg --raw" or + reading from /proc/kmsg): "<%u>[time stamp] text\n" + IOW, each message is getting prepended with the + facility and loglevel prefix. + consoleblank= [KNL] The console blank (screen saver) timeout in seconds. A value of 0 disables the blank timer. Defaults to 0. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 5d81206a572d..034f3ad85f5b 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline); /* Flag: console code may call schedule() */ static int console_may_schedule; +enum con_msg_format { + MSG_FORMAT_DEFAULT = 1, + MSG_FORMAT_SYSLOG = 2, +}; + +static int console_msg_format = MSG_FORMAT_DEFAULT; + /* * The printk log buffer consists of a chain of concatenated variable * length records. Every record starts with a record header, containing @@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options, c->index = idx; return 0; } + +static int __init console_msg_format_setup(char *str) +{ + if (!strncmp(str, "syslog", 6)) + console_msg_format = MSG_FORMAT_SYSLOG; + return 1; +} +__setup("console_msg_format=", console_msg_format_setup); + /* * Set up a console. Called via do_early_param() in init/main.c * for each "console=" parameter in the boot command line. @@ -2215,7 +2231,10 @@ void console_unlock(void) goto skip; } - len += msg_print_text(msg, false, text + len, sizeof(text) - len); + len += msg_print_text(msg, + console_msg_format & MSG_FORMAT_SYSLOG, + text + len, + sizeof(text) - len); if (nr_ext_console_drivers) { ext_len = msg_print_ext_header(ext_text, sizeof(ext_text), -- 2.15.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-23 2:59 ` Sergey Senozhatsky @ 2017-11-23 3:14 ` Fengguang Wu 2017-11-23 4:31 ` Sergey Senozhatsky 2017-11-29 0:13 ` Kevin Hilman 1 sibling, 1 reply; 20+ messages in thread From: Fengguang Wu @ 2017-11-23 3:14 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Thu, Nov 23, 2017 at 11:59:57AM +0900, Sergey Senozhatsky wrote: >On (11/22/17 20:52), Fengguang Wu wrote: >[..] >> > well, I think that that "consoles_format=syslog" command line parameter >> > will be enabled only by those who actually want to have it - Fengguang's >> > build robot and kernelCI (+ may be more setups). so I'd probably assume >> > there are low risks here. may be I'm wrong. >> >> Yes, since it needs explicit enabling, local parsers should stay >> working. Unless we send dmesg to other developers, but then they >> might also receive $(dmesg --raw) outputs and need to handle <%u> >> prefixes, too. >> >> > I think it makes sense to have syslog's format "<%u>[timestamp] text\n" >> > on serial consoles (time stamp when PRINTK_TIME set; <%u> when >> > consoles_format=syslog set). >> >> Thanks. Since the 0day scripts can already parse that format (code >> is listed below), we should quickly benefit from that feature when >> it's mainlined. > >okay... who's going to send the patch? kernelCI folks? > >I have some sort of a patch. added console_msg_format= with the only The patch looks good to me. Perhaps you can send it out after getting review comments from kernelci folks? Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Thanks, Fengguang >available option so far - "syslog". may be people would want to have >boot time, etc. on their consoles. who knows. may be msg_format will >be extended someday in the future or may be not, and we can do >something like "console_format_syslog" (or any better name). > >Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> >--- > Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++ > kernel/printk/printk.c | 21 ++++++++++++++++++++- > 2 files changed, 31 insertions(+), 1 deletion(-) > >diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >index 62436bd5f34a..42113e5181b2 100644 >--- a/Documentation/admin-guide/kernel-parameters.txt >+++ b/Documentation/admin-guide/kernel-parameters.txt >@@ -640,6 +640,17 @@ > console=brl,ttyS0 > For now, only VisioBraille is supported. > >+ console_msg_format= >+ [KNL] Control message format >+ By default we print messages in "[time stamp] text\n" >+ format (time stamp may not be printed, depending on >+ CONFIG_PRINTK_TIME or `printk_time' param). >+ syslog >+ Switch to syslog format (similar to "dmesg --raw" or >+ reading from /proc/kmsg): "<%u>[time stamp] text\n" >+ IOW, each message is getting prepended with the >+ facility and loglevel prefix. >+ > consoleblank= [KNL] The console blank (screen saver) timeout in > seconds. A value of 0 disables the blank timer. > Defaults to 0. >diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >index 5d81206a572d..034f3ad85f5b 100644 >--- a/kernel/printk/printk.c >+++ b/kernel/printk/printk.c >@@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline); > /* Flag: console code may call schedule() */ > static int console_may_schedule; > >+enum con_msg_format { >+ MSG_FORMAT_DEFAULT = 1, >+ MSG_FORMAT_SYSLOG = 2, >+}; >+ >+static int console_msg_format = MSG_FORMAT_DEFAULT; >+ > /* > * The printk log buffer consists of a chain of concatenated variable > * length records. Every record starts with a record header, containing >@@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options, > c->index = idx; > return 0; > } >+ >+static int __init console_msg_format_setup(char *str) >+{ >+ if (!strncmp(str, "syslog", 6)) >+ console_msg_format = MSG_FORMAT_SYSLOG; >+ return 1; >+} >+__setup("console_msg_format=", console_msg_format_setup); >+ > /* > * Set up a console. Called via do_early_param() in init/main.c > * for each "console=" parameter in the boot command line. >@@ -2215,7 +2231,10 @@ void console_unlock(void) > goto skip; > } > >- len += msg_print_text(msg, false, text + len, sizeof(text) - len); >+ len += msg_print_text(msg, >+ console_msg_format & MSG_FORMAT_SYSLOG, >+ text + len, >+ sizeof(text) - len); > if (nr_ext_console_drivers) { > ext_len = msg_print_ext_header(ext_text, > sizeof(ext_text), >-- >2.15.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-23 3:14 ` Fengguang Wu @ 2017-11-23 4:31 ` Sergey Senozhatsky 0 siblings, 0 replies; 20+ messages in thread From: Sergey Senozhatsky @ 2017-11-23 4:31 UTC (permalink / raw) To: Fengguang Wu Cc: Sergey Senozhatsky, Petr Mladek, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds Hello, On (11/23/17 11:14), Fengguang Wu wrote: > > > Thanks. Since the 0day scripts can already parse that format (code > > > is listed below), we should quickly benefit from that feature when > > > it's mainlined. > > > > okay... who's going to send the patch? kernelCI folks? > > > > I have some sort of a patch. added console_msg_format= with the only > > The patch looks good to me. Perhaps you can send it out after getting > review comments from kernelci folks? agree, would be lovely to hear more opinions/etc. and I need some help with the credits (Suggested-by: ? and so on). > Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> thanks. -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-23 2:59 ` Sergey Senozhatsky 2017-11-23 3:14 ` Fengguang Wu @ 2017-11-29 0:13 ` Kevin Hilman 2017-11-29 7:25 ` Sergey Senozhatsky 1 sibling, 1 reply; 20+ messages in thread From: Kevin Hilman @ 2017-11-29 0:13 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> writes: > On (11/22/17 20:52), Fengguang Wu wrote: > [..] >> > well, I think that that "consoles_format=syslog" command line parameter >> > will be enabled only by those who actually want to have it - Fengguang's >> > build robot and kernelCI (+ may be more setups). so I'd probably assume >> > there are low risks here. may be I'm wrong. >> >> Yes, since it needs explicit enabling, local parsers should stay >> working. Unless we send dmesg to other developers, but then they >> might also receive $(dmesg --raw) outputs and need to handle <%u> >> prefixes, too. >> >> > I think it makes sense to have syslog's format "<%u>[timestamp] text\n" >> > on serial consoles (time stamp when PRINTK_TIME set; <%u> when >> > consoles_format=syslog set). >> >> Thanks. Since the 0day scripts can already parse that format (code >> is listed below), we should quickly benefit from that feature when >> it's mainlined. > > okay... who's going to send the patch? kernelCI folks? > > I have some sort of a patch. added console_msg_format= with the only > available option so far - "syslog". may be people would want to have > boot time, etc. on their consoles. who knows. may be msg_format will > be extended someday in the future or may be not, and we can do > something like "console_format_syslog" (or any better name). > > Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> For kernelCI (part of how this thread got started): Acked-by: Kevin Hilman <khilman@baylibre.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> I also tested it on an arm64 platform with the new command-line arg: Tested-by: Kevin Hilman <khilman@baylibre.com> Thanks, Kevin > --- > Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++++ > kernel/printk/printk.c | 21 ++++++++++++++++++++- > 2 files changed, 31 insertions(+), 1 deletion(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 62436bd5f34a..42113e5181b2 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -640,6 +640,17 @@ > console=brl,ttyS0 > For now, only VisioBraille is supported. > > + console_msg_format= > + [KNL] Control message format > + By default we print messages in "[time stamp] text\n" > + format (time stamp may not be printed, depending on > + CONFIG_PRINTK_TIME or `printk_time' param). > + syslog > + Switch to syslog format (similar to "dmesg --raw" or > + reading from /proc/kmsg): "<%u>[time stamp] text\n" > + IOW, each message is getting prepended with the > + facility and loglevel prefix. > + > consoleblank= [KNL] The console blank (screen saver) timeout in > seconds. A value of 0 disables the blank timer. > Defaults to 0. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 5d81206a572d..034f3ad85f5b 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -277,6 +277,13 @@ EXPORT_SYMBOL(console_set_on_cmdline); > /* Flag: console code may call schedule() */ > static int console_may_schedule; > > +enum con_msg_format { > + MSG_FORMAT_DEFAULT = 1, > + MSG_FORMAT_SYSLOG = 2, > +}; > + > +static int console_msg_format = MSG_FORMAT_DEFAULT; > + > /* > * The printk log buffer consists of a chain of concatenated variable > * length records. Every record starts with a record header, containing > @@ -1913,6 +1920,15 @@ static int __add_preferred_console(char *name, int idx, char *options, > c->index = idx; > return 0; > } > + > +static int __init console_msg_format_setup(char *str) > +{ > + if (!strncmp(str, "syslog", 6)) > + console_msg_format = MSG_FORMAT_SYSLOG; > + return 1; > +} > +__setup("console_msg_format=", console_msg_format_setup); > + > /* > * Set up a console. Called via do_early_param() in init/main.c > * for each "console=" parameter in the boot command line. > @@ -2215,7 +2231,10 @@ void console_unlock(void) > goto skip; > } > > - len += msg_print_text(msg, false, text + len, sizeof(text) - len); > + len += msg_print_text(msg, > + console_msg_format & MSG_FORMAT_SYSLOG, > + text + len, > + sizeof(text) - len); > if (nr_ext_console_drivers) { > ext_len = msg_print_ext_header(ext_text, > sizeof(ext_text), ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-29 0:13 ` Kevin Hilman @ 2017-11-29 7:25 ` Sergey Senozhatsky 2017-11-30 17:45 ` Kevin Hilman 0 siblings, 1 reply; 20+ messages in thread From: Sergey Senozhatsky @ 2017-11-29 7:25 UTC (permalink / raw) To: Kevin Hilman Cc: Sergey Senozhatsky, Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On (11/28/17 16:13), Kevin Hilman wrote: [..] > > okay... who's going to send the patch? kernelCI folks? > > > > I have some sort of a patch. added console_msg_format= with the only > > available option so far - "syslog". may be people would want to have > > boot time, etc. on their consoles. who knows. may be msg_format will > > be extended someday in the future or may be not, and we can do > > something like "console_format_syslog" (or any better name). > > > > Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > For kernelCI (part of how this thread got started): > > Acked-by: Kevin Hilman <khilman@baylibre.com> > Reviewed-by: Kevin Hilman <khilman@baylibre.com> > > I also tested it on an arm64 platform with the new command-line arg: > > Tested-by: Kevin Hilman <khilman@baylibre.com> thanks. -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-29 7:25 ` Sergey Senozhatsky @ 2017-11-30 17:45 ` Kevin Hilman 2017-12-01 1:25 ` Sergey Senozhatsky 0 siblings, 1 reply; 20+ messages in thread From: Kevin Hilman @ 2017-11-30 17:45 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds Hi Sergey, On Tue, Nov 28, 2017 at 11:25 PM, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > On (11/28/17 16:13), Kevin Hilman wrote: > [..] >> > okay... who's going to send the patch? kernelCI folks? >> > >> > I have some sort of a patch. added console_msg_format= with the only >> > available option so far - "syslog". may be people would want to have >> > boot time, etc. on their consoles. who knows. may be msg_format will >> > be extended someday in the future or may be not, and we can do >> > something like "console_format_syslog" (or any better name). >> > >> > Not-yet-signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> >> >> For kernelCI (part of how this thread got started): >> >> Acked-by: Kevin Hilman <khilman@baylibre.com> >> Reviewed-by: Kevin Hilman <khilman@baylibre.com> >> >> I also tested it on an arm64 platform with the new command-line arg: >> >> Tested-by: Kevin Hilman <khilman@baylibre.com> > > thanks. So do you plan to submit an official signed-off version of this? We're ready to use it ASAP in kernelCI. Kevin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-30 17:45 ` Kevin Hilman @ 2017-12-01 1:25 ` Sergey Senozhatsky 0 siblings, 0 replies; 20+ messages in thread From: Sergey Senozhatsky @ 2017-12-01 1:25 UTC (permalink / raw) To: Kevin Hilman Cc: Sergey Senozhatsky, Fengguang Wu, Petr Mladek, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds Hello, On (11/30/17 09:45), Kevin Hilman wrote: [..] > >> For kernelCI (part of how this thread got started): > >> > >> Acked-by: Kevin Hilman <khilman@baylibre.com> > >> Reviewed-by: Kevin Hilman <khilman@baylibre.com> > >> > >> I also tested it on an arm64 platform with the new command-line arg: > >> > >> Tested-by: Kevin Hilman <khilman@baylibre.com> > > > > thanks. > > So do you plan to submit an official signed-off version of this? > We're ready to use it ASAP in kernelCI. yes, will do. -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 12:52 ` Fengguang Wu 2017-11-23 2:59 ` Sergey Senozhatsky @ 2017-11-23 10:04 ` Petr Mladek 1 sibling, 0 replies; 20+ messages in thread From: Petr Mladek @ 2017-11-23 10:04 UTC (permalink / raw) To: Fengguang Wu Cc: Sergey Senozhatsky, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed 2017-11-22 20:52:45, Fengguang Wu wrote: > On Wed, Nov 22, 2017 at 09:38:21PM +0900, Sergey Senozhatsky wrote: > > On (11/22/17 12:34), Petr Mladek wrote: > > [..] > > > > > > This is espeically useful when ingesting kernel logs into advanced > > > > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic > > > > > > Search, Logstash, Kibana). > > [..] > > > To make it clear. I understand that "show_loglevel" command line argument > > > would be useful for you. But I am afraid that it is not worth changing > > > the format. There would need to be wide interest into the change. > > > Also there would need to be evidence that the existing solutions > > > (dmesg --raw, console_loglevel) are not enough in many real life > > > scenarios. > > > > well, I think that that "consoles_format=syslog" command line parameter > > will be enabled only by those who actually want to have it - Fengguang's > > build robot and kernelCI (+ may be more setups). so I'd probably assume > > there are low risks here. may be I'm wrong. > > Yes, since it needs explicit enabling, local parsers should stay > working. Unless we send dmesg to other developers, but then they > might also receive $(dmesg --raw) outputs and need to handle <%u> > prefixes, too. I would have agreed with this few days ago. But I am not sure now. We tried to upstream a feature that allowed to use different clocks for the printk timestamps. The current one is local clock. Some people were interested into the real time clock that would allow to match messages from userspace. Some other users were interested into boot time clock that would count also the time when the system is suspended. We made this configurable (config option + command line option). I thought that it was safe because local clock stayed default and the other possibilities would be used just for debugging. Linus rejected this, see https://lkml.kernel.org/r/CA+55aFwUfA__6MgMgjENnx+_RYY2ZOOLiSx2ea1AvYhSZN+78A@mail.gmail.com https://lkml.kernel.org/r/CA+55aFzLH9crdMtUFkD-PtNGuxu_fsG5GH2ACni69ug9iM=09g@mail.gmail.com https://lkml.kernel.org/r/CA+55aFzzvBD4_WYm-5Bm7TeRGR8nNu1oz0oWNcRNmTNJm=M4Lw@mail.gmail.com The feature with timestamps was more complicated. The different time sources had problems on its own. We had troubles to suggest a good one. Therefore it was wrong to just move the complex decision to a poor user. But one thing is similar. Both features change to format/semantic of the printed lines (extra field for message level vs. semantic of the timestamp). In case of the timestamps, Linus was aware o f a clear userspace dependency (systemd expected time since boot). The question if there is a user space application that depends on the message format on the consoles. I have no idea. I would imagine that there are some system monitors that expect the strings starting with the timestamp. You might argue that this will get disabled by default. But the local clock stayed default in the above feature as well. One thing is that there is CONFIG_PRINTK_TIME and printk.time command line option. Therefore the format already is kind of configurable. But I doubt that this is disabled on any sane system. Anyway, I would like to wait a bit for the discussion about the timestamps feature before we add this console_format stuff. We still need to discuss how to pass the timestamps to the user and I would like to hear Linus' opinion there. You might want to follow/join the discussion around alternative approach by Thomas Gleixner, see https://lkml.kernel.org/r/20171115181531.322572387@linutronix.de It does not solve the userspace side at the moment. Best Regards, Petr ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 12:38 ` Sergey Senozhatsky 2017-11-22 12:52 ` Fengguang Wu @ 2017-11-22 20:22 ` Kevin Hilman 1 sibling, 0 replies; 20+ messages in thread From: Kevin Hilman @ 2017-11-22 20:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Fengguang Wu, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed, Nov 22, 2017 at 4:38 AM, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > On (11/22/17 12:34), Petr Mladek wrote: > [..] >> > > > This is espeically useful when ingesting kernel logs into advanced >> > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic >> > > > Search, Logstash, Kibana). > [..] >> To make it clear. I understand that "show_loglevel" command line argument >> would be useful for you. But I am afraid that it is not worth changing >> the format. There would need to be wide interest into the change. >> Also there would need to be evidence that the existing solutions >> (dmesg --raw, console_loglevel) are not enough in many real life >> scenarios. > > well, I think that that "consoles_format=syslog" command line parameter > will be enabled only by those who actually want to have it - Fengguang's > build robot and kernelCI (+ may be more setups). so I'd probably assume > there are low risks here. may be I'm wrong. Agreed. The goal of the command-line parameter is so that only automation frameworks that would use it. The other important reason for having this on the command line is that for testing linux-next (and other bleeding edge developer branches), it's common that we never make it to userspace, so can't even run "dmesg --raw" (or equivalent.) So we really want this on the primary boot (serial) console. > I think it makes sense to have syslog's format "<%u>[timestamp] text\n" > on serial consoles (time stamp when PRINTK_TIME set; <%u> when > consoles_format=syslog set). Yes, I think it should match "dmesg --raw". Kevin ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 11:34 ` Petr Mladek 2017-11-22 12:38 ` Sergey Senozhatsky @ 2017-11-22 14:10 ` Fengguang Wu 2017-12-05 15:55 ` Petr Mladek 1 sibling, 1 reply; 20+ messages in thread From: Fengguang Wu @ 2017-11-22 14:10 UTC (permalink / raw) To: Petr Mladek Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed, Nov 22, 2017 at 12:34:48PM +0100, Petr Mladek wrote: >On Wed 2017-11-22 11:27:02, Fengguang Wu wrote: >> [CC LKML for possible printk improvements] >> >> On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote: >> > Hi Kevin, >> > >> > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote: >> > > Hi Fenguang, >> > > >> > > In automated testing, for ease of parsing kernel boot logs (especially >> > > separating warnings and errors from debug, info etc.) > >Would be enough to use console_loglevel to filter out the non-interesting >messages? Nah we do prefer and actually require keeping the full dmesg. For one thing, it's good to have the context information. The 0day test farm will dump logs and debug info to the serial console. And we'll parse the normal dmesg lines to detect when the kernel boots up and shutdown (to split the continuous serial console log into dmesg files for each kernel boot), when the initramfs is loading (to judge whether the kernel oops is related to the initrd), when the testing begins, etc. For embedded systems people may even control machine boot and collect artifacts via the serial console. >> > > Right now we can get this info from a "dmesg --raw" after bootup, but >> > > it would be really nice in certain automation frameworks to have a >> > > kernel command-line option to enable printing of loglevels in default >> > > boot log. >> > >> > Agreed. >> > >> > > This is espeically useful when ingesting kernel logs into advanced >> > > search/analytics frameworks (I'm playing with and ELK stack: Elastic >> > > Search, Logstash, Kibana). > >This suggests that people are already filtering the kernel log using >external tools. Therefore there is a danger that the format might >have already became the kernel ABI and we could not change it easily. > >A solution would be to printk the loglevel after the timestamp. >But this would be ugly because dmesg --raw prints it at the beginning. > >To make it clear. I understand that "show_loglevel" command line argument >would be useful for you. But I am afraid that it is not worth changing >the format. There would need to be wide interest into the change. >Also there would need to be evidence that the existing solutions >(dmesg --raw, console_loglevel) are not enough in many real life >scenarios. As Sergey said in the other thread, it should be safe because the syslog format will be explicitly enabled by those who want and ready to parse it. >> [...] >> >> > As you may know I'm currently reporting kernel oops in mainline >> > kernel, hoping to clear noisy oops there -- they obviously hurt >> > catching and bisecting new oops. >> > >> > I have to say the warnings are much more noisy than kernel oops >> > in 2 ways: >> > >> > 1) It's common for a *normal* kernel boot to have a dozen of old >> > warnings. > >I guess that you solve this by a "clever" diff between old and new >log. That does help filtering out the majority old warnings. For example, if it's an inherent HW issue, it may remain there forever and never show up after diff. However some noises will still go through and the number is large enough to make "tester side blacklist" a big maintenance headache. As you may see from the listing of warning messages waiting for bisect in 0day now. >The extra info suggested below might help to filter out >some messages. But the problems will still be there. >I am not sure that the categories are worth it, see below. > > >> > 2) Many warnings do not necessarily mean something should or could be >> > fixed -- they may well be mentioning some HW problem, or an alert >> > message to the user. >> > >> > So there is a much bigger and messy problem than catching the warnings: >> > ways to effectively mark or filter real warnings that automated testing >> > should catch and report. >> > >> > For filtering, we currently accumulated the below blacklist: >> > >> > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist >> > >> [...] >> >> For the marking part, I wonder if there can be a clear rule that >> allows developers to distinguish 2 kind of information for users and >> testers: >> >> - "bug" message: indicating a regression that should be reported and fixed >> >> - "fact" message: well there's a problem, but we kernel developers >> probably can do nothing about it. It's just to let the user know >> about the fact. The fix might be something like replacing a broken >> disk drive. >> >> Those message types are orthogonal to severity of the problem (log >> levels), so the current log levels are actually not sufficient for >> distinguishing these kind of situations. > >It is interesting idea but I am afraid that it would not work >in reality from several reasons: > > + It is yet another complexity. I am not sure that developers would > want to think about it when creating new messages. [moved ahead the below 2 points] > + The proposal it rather strigthforward. But does all messages > clearly fit into one of the two categories? What about > messages that are not about problems? For example, most > of pr_info(), pr_notice() messages? > > > + Are the two categories worth the effort? Especially when the > second one might still eventually be a kernel bug? > > If we add more categories, it might get too complicated. > For example, like setting severities and priorities > in the classic bugzilla. > Let me answer the 3 points together. Sorry if I was not clear -- I merely give the 2 categories as example. However the key point is not about creating categories. My proposal would be something like this: #define KERNEL_BUG "[kernel bug]" Then use that tag wherever maintainers want to track regressions: pr_err(KERNEL_BUG "alg: Unexpected test result for %s: %d\n", name, err); ~~~~~~~~~~ If such tag is widely used, it effectively enables whitelist based kernel warning regression tracking, comparing to the currently messy and ineffective blacklist: https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist > + Who would set this for the existing ~100k messages? That's a good question and main challenge. Perhaps the modules under active development and have the desire to track regressions will be more willing to add the tag. > + If I get it correctly, the motivation is to help normal users > to decide when a kernel bug should get reported. This might My purpose is mainly for using the kernel bug tag for auto regression tracking in test infrastructures. Including distro's dmesg monitoring and auto reporting tools which could pop up a window saying "I find a bug in your dmesg, shall I auto upload it for analyze?" But yeah, it'll be useful for end users too -- now they'll be easier to tell when there are kernel bugs. > heavily increase the number of poor bug reports. > > The recent discussion about kernel regression suggests > that there we are not ready for this, see > https://lwn.net/Articles/738216/ Agreed. It'll be good if we can have some user report aggregation service setup first. I'd imagine that service to start with a big search box where a user can paste his dmesg. The service then parses the dmesg and help check whether it's a new bug, or looks similar to some already reported one. Thanks, Fengguang ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-11-22 14:10 ` Fengguang Wu @ 2017-12-05 15:55 ` Petr Mladek 2017-12-05 16:13 ` Sergey Senozhatsky 2017-12-05 20:54 ` Steven Rostedt 0 siblings, 2 replies; 20+ messages in thread From: Petr Mladek @ 2017-12-05 15:55 UTC (permalink / raw) To: Fengguang Wu Cc: Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On Wed 2017-11-22 22:10:58, Fengguang Wu wrote: > On Wed, Nov 22, 2017 at 12:34:48PM +0100, Petr Mladek wrote: > > On Wed 2017-11-22 11:27:02, Fengguang Wu wrote: > > > [CC LKML for possible printk improvements] > > > > > > On Wed, Nov 22, 2017 at 09:56:10AM +0800, Fengguang Wu wrote: > > > > Hi Kevin, > > > > > > > > On Tue, Nov 21, 2017 at 12:27:48PM -0800, Kevin Hilman wrote: > > > > > Hi Fenguang, > > > > > > > > > > In automated testing, for ease of parsing kernel boot logs (especially > > > > > separating warnings and errors from debug, info etc.) > > > > Would be enough to use console_loglevel to filter out the non-interesting > > messages? > > Nah we do prefer and actually require keeping the full dmesg. For one > thing, it's good to have the context information. The 0day test farm > will dump logs and debug info to the serial console. And we'll parse > the normal dmesg lines to detect when the kernel boots up and > shutdown (to split the continuous serial console log into dmesg files > for each kernel boot), when the initramfs is loading (to judge whether > the kernel oops is related to the initrd), when the testing begins, > etc. For embedded systems people may even control machine boot and > collect artifacts via the serial console. OK, I understand why you would need this feature. > > > > > Right now we can get this info from a "dmesg --raw" after bootup, but > > > > > it would be really nice in certain automation frameworks to have a > > > > > kernel command-line option to enable printing of loglevels in default > > > > > boot log. > > > > > > > > Agreed. > > > > > > > > > This is espeically useful when ingesting kernel logs into advanced > > > > > search/analytics frameworks (I'm playing with and ELK stack: Elastic > > > > > Search, Logstash, Kibana). > > > > This suggests that people are already filtering the kernel log using > > external tools. Therefore there is a danger that the format might > > have already became the kernel ABI and we could not change it easily. > > > > A solution would be to printk the loglevel after the timestamp. > > But this would be ugly because dmesg --raw prints it at the beginning. > > > > To make it clear. I understand that "show_loglevel" command line argument > > would be useful for you. But I am afraid that it is not worth changing > > the format. There would need to be wide interest into the change. > > Also there would need to be evidence that the existing solutions > > (dmesg --raw, console_loglevel) are not enough in many real life > > scenarios. > > As Sergey said in the other thread, it should be safe because the > syslog format will be explicitly enabled by those who want and ready > to parse it. I would have agreed with this argument few weeks ago but I do not get it any longer. If there are tools depending on the message format and people enable the extended format, the tools might stop working. You might argue that these people are shooting into their legs. But other people might argue that kernel changed ABI and broke userspace tools. Note that these tools do not have any chance to force kernel using the old format, they are consumers. It might be especially annoying if these tools are part of some complex infrastructure that is used to access the boot logs and admins get blind. Then it might be hard for them to debug the problem, especially if they got this feature as part of a bigger system update. Am I too paranoid? Steven? Linus? > > > [...] > > > > > > > As you may know I'm currently reporting kernel oops in mainline > > > > kernel, hoping to clear noisy oops there -- they obviously hurt > > > > catching and bisecting new oops. > > > > > > > > I have to say the warnings are much more noisy than kernel oops > > > > in 2 ways: > > > > > > > > 1) It's common for a *normal* kernel boot to have a dozen of old > > > > warnings. > > > > I guess that you solve this by a "clever" diff between old and new > > log. > > That does help filtering out the majority old warnings. For example, > if it's an inherent HW issue, it may remain there forever and never > show up after diff. > > However some noises will still go through and the number is large > enough to make "tester side blacklist" a big maintenance headache. > As you may see from the listing of warning messages waiting for bisect > in 0day now. > > > The extra info suggested below might help to filter out > > some messages. But the problems will still be there. > > I am not sure that the categories are worth it, see below. > > > > > > > > 2) Many warnings do not necessarily mean something should or could be > > > > fixed -- they may well be mentioning some HW problem, or an alert > > > > message to the user. > > > > > > > > So there is a much bigger and messy problem than catching the warnings: > > > > ways to effectively mark or filter real warnings that automated testing > > > > should catch and report. > > > > > > > > For filtering, we currently accumulated the below blacklist: > > > > > > > > https://github.com/fengguang/lkp-tests/blob/master/etc/kmsg-blacklist > > > > > > > [...] > > > > > > For the marking part, I wonder if there can be a clear rule that > > > allows developers to distinguish 2 kind of information for users and > > > testers: > > > > > > - "bug" message: indicating a regression that should be reported and fixed > > > > > > - "fact" message: well there's a problem, but we kernel developers > > > probably can do nothing about it. It's just to let the user know > > > about the fact. The fix might be something like replacing a broken > > > disk drive. > > > > > > Those message types are orthogonal to severity of the problem (log > > > levels), so the current log levels are actually not sufficient for > > > distinguishing these kind of situations. > > > > It is interesting idea but I am afraid that it would not work > > in reality from several reasons: > > > > + It is yet another complexity. I am not sure that developers would > > want to think about it when creating new messages. > > [moved ahead the below 2 points] > > + The proposal it rather strigthforward. But does all messages > > clearly fit into one of the two categories? What about > > messages that are not about problems? For example, most > > of pr_info(), pr_notice() messages? > > > > > > + Are the two categories worth the effort? Especially when the > > second one might still eventually be a kernel bug? > > > > If we add more categories, it might get too complicated. > > For example, like setting severities and priorities > > in the classic bugzilla. > > > > Let me answer the 3 points together. Sorry if I was not clear -- I > merely give the 2 categories as example. However the key point is not > about creating categories. My proposal would be something like this: > > #define KERNEL_BUG "[kernel bug]" > > Then use that tag wherever maintainers want to track regressions: > > pr_err(KERNEL_BUG "alg: Unexpected test result for %s: %d\n", name, err); We already have something like this, see include/linux/printk.h #define FW_BUG "[Firmware Bug]: " #define FW_WARN "[Firmware Warn]: " #define FW_INFO "[Firmware Info]: " #define HW_ERR "[Hardware Error]: " IMHO, the last one is a good example. It was added 7 years ago by the commit c6de9f08912311ddc1 ("x86, mce: Add HW_ERR printk prefix for hardware error logging" and it is still used only in two mce-related files: arch/x86/kernel/cpu/mcheck/mce.c drivers/edac/mce_amd.c I guess that this prefix would make sense in many more messages but people do not use it. IMHO, they are either not aware of it or they simply do not care. IMHO, this is a good example that this approach does not work. It seems that Andrew shared my skepticism when HW_ERR was introduced. See the sentence "There have been various proposals to improve the usefulness and to rationally categorise things in way which are more useful to operators, but nothing seems to ever get over the line." at https://lkml.kernel.org/r/20100610164607.b2908334.akpm@linux-foundation.org IMHO, we would need to come up with another approach that would be easy to understand and could be "more" enforced. Best Regards, Petr ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-12-05 15:55 ` Petr Mladek @ 2017-12-05 16:13 ` Sergey Senozhatsky 2017-12-05 20:54 ` Steven Rostedt 1 sibling, 0 replies; 20+ messages in thread From: Sergey Senozhatsky @ 2017-12-05 16:13 UTC (permalink / raw) To: Petr Mladek Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds On (12/05/17 16:55), Petr Mladek wrote: [..] > > Am I too paranoid? Steven? Linus? well, if there is a tool that believels that there will always be [ (part of timestamp marker) as the first symbol then that tool has a problem. there is no "guaranteed" format of console messages. one can print pr_err("message\n message\n message\n); or one can compile the kernel with no CONFIG_PRINTK_TIME. or one can disable printk_time via printk_time module param. -ss ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-12-05 15:55 ` Petr Mladek 2017-12-05 16:13 ` Sergey Senozhatsky @ 2017-12-05 20:54 ` Steven Rostedt 2017-12-06 13:54 ` Petr Mladek 1 sibling, 1 reply; 20+ messages in thread From: Steven Rostedt @ 2017-12-05 20:54 UTC (permalink / raw) To: Petr Mladek Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Linus Torvalds On Tue, 5 Dec 2017 16:55:37 +0100 Petr Mladek <pmladek@suse.com> wrote: > I would have agreed with this argument few weeks ago but I do not get > it any longer. If there are tools depending on the message format > and people enable the extended format, the tools might stop working. > > You might argue that these people are shooting into their legs. > But other people might argue that kernel changed ABI and broke > userspace tools. Note that these tools do not have any chance > to force kernel using the old format, they are consumers. > > It might be especially annoying if these tools are part of some > complex infrastructure that is used to access the boot logs > and admins get blind. Then it might be hard for them to debug > the problem, especially if they got this feature as part of > a bigger system update. > > Am I too paranoid? Steven? Linus? Remember, an ABI is only broken if a tool breaks. If it proves to break userspace, then it can be reverted. But if there is no tool that depends on it, it's fine to change. That said, I haven't had a chance to fully read this thread. Mostly because of the Turkey holiday which put me behind in other areas, and also I currently traveling, and will be taking PTO for the rest of the week. -- Steve ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: kernel CI: printk loglevels in kernel boot logs? 2017-12-05 20:54 ` Steven Rostedt @ 2017-12-06 13:54 ` Petr Mladek 0 siblings, 0 replies; 20+ messages in thread From: Petr Mladek @ 2017-12-06 13:54 UTC (permalink / raw) To: Steven Rostedt Cc: Fengguang Wu, Kevin Hilman, Mark Brown, Greg Kroah-Hartman, LKML, Sergey Senozhatsky, Linus Torvalds On Tue 2017-12-05 15:54:09, Steven Rostedt wrote: > On Tue, 5 Dec 2017 16:55:37 +0100 > Petr Mladek <pmladek@suse.com> wrote: > > > I would have agreed with this argument few weeks ago but I do not get > > it any longer. If there are tools depending on the message format > > and people enable the extended format, the tools might stop working. > > > > You might argue that these people are shooting into their legs. > > But other people might argue that kernel changed ABI and broke > > userspace tools. Note that these tools do not have any chance > > to force kernel using the old format, they are consumers. > > > > It might be especially annoying if these tools are part of some > > complex infrastructure that is used to access the boot logs > > and admins get blind. Then it might be hard for them to debug > > the problem, especially if they got this feature as part of > > a bigger system update. > > > > Am I too paranoid? Steven? Linus? > > Remember, an ABI is only broken if a tool breaks. If it proves to break > userspace, then it can be reverted. But if there is no tool that > depends on it, it's fine to change. > > That said, I haven't had a chance to fully read this thread. Mostly > because of the Turkey holiday which put me behind in other areas, and > also I currently traveling, and will be taking PTO for the rest of the > week. OK, I feel relieved. I am not aware of any tool that depends on the format. I just could imagine that there might be some. Otherwise, I agree that showing log levels on the console might be useful. So I am going to review the RFC implementation, see https://lkml.kernel.org/r/20171201104404.1885-1-sergey.senozhatsky@gmail.com Best Regards, Petr ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-12-06 13:54 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAOi56cVORdyjTXK4QGYRfyD1Q=QBgsU3B__gZT0xj6OBKaasLQ@mail.gmail.com> [not found] ` <20171122015610.x3kgzqgtwywlurmz@wfg-t540p.sh.intel.com> 2017-11-22 3:27 ` kernel CI: printk loglevels in kernel boot logs? Fengguang Wu 2017-11-22 5:26 ` Sergey Senozhatsky 2017-11-22 10:42 ` Mark Brown 2017-11-22 11:34 ` Petr Mladek 2017-11-22 12:38 ` Sergey Senozhatsky 2017-11-22 12:52 ` Fengguang Wu 2017-11-23 2:59 ` Sergey Senozhatsky 2017-11-23 3:14 ` Fengguang Wu 2017-11-23 4:31 ` Sergey Senozhatsky 2017-11-29 0:13 ` Kevin Hilman 2017-11-29 7:25 ` Sergey Senozhatsky 2017-11-30 17:45 ` Kevin Hilman 2017-12-01 1:25 ` Sergey Senozhatsky 2017-11-23 10:04 ` Petr Mladek 2017-11-22 20:22 ` Kevin Hilman 2017-11-22 14:10 ` Fengguang Wu 2017-12-05 15:55 ` Petr Mladek 2017-12-05 16:13 ` Sergey Senozhatsky 2017-12-05 20:54 ` Steven Rostedt 2017-12-06 13:54 ` Petr Mladek
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.