lustre-devel-lustre.org archive mirror
* [lustre-devel] lctl debug_kernel: usability improvements?
@ 2023-04-10 14:19 Bertschinger, Thomas Andrew Hjorth via lustre-devel
From: Bertschinger, Thomas Andrew Hjorth via lustre-devel @ 2023-04-10 14:19 UTC
  To: lustre-devel



Hello,

Recently, when using "lctl dk", I have found myself wanting some of the "quality of life" features of the similar Linux tool dmesg. In particular, the ability to "follow" the debug log, as "dmesg -w" does, would be very handy IMO.

I've attempted to implement this in userspace with the existing tooling (using "lctl debug_daemon" to write the encoded log to a file, and "lctl debug_file" to decode it), but have run into problems. I first tried creating a FIFO, having debug_daemon write to it and debug_file read from it. Unfortunately, this fails because the kernel thread that writes the file (tracefiled in libcfs/libcfs/tracefile.c) repeatedly opens and closes it, and after the first close, reading from the FIFO fails.
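The FIFO failure is consistent with standard POSIX pipe semantics: once no process holds a FIFO open for writing, read() on the drained FIFO returns 0 (end-of-file), so a reader that stops at EOF stops after the writer's first close. The sketch below is a hypothetical standalone demonstration, not Lustre code; count_reader_eofs() and the /tmp path are made up, and the writer loop only mimics the open/write/close pattern described for tracefiled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo, not Lustre code. A writer opens the FIFO, writes one
 * record and closes it again, `cycles` times. After each close the reader
 * drains the FIFO and counts how often read() returns 0 (end-of-file).
 * With hold_write set, the reader keeps its own write descriptor open,
 * so the FIFO never loses its last writer. Returns the EOF count, or -1.
 */
int count_reader_eofs(const char *path, int cycles, int hold_write)
{
    char buf[64];
    int eofs = 0, rfd = -1, keep = -1;

    assert(path != NULL && cycles >= 0);
    unlink(path);                      /* stale FIFO from an earlier run */
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* O_NONBLOCK lets the read end open before any writer exists. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0)
        goto fail;
    /* A reader now exists, so opening for write does not block. */
    if (hold_write && (keep = open(path, O_WRONLY)) < 0)
        goto fail;

    for (int i = 0; i < cycles; i++) {
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            goto fail;
        ssize_t w = write(wfd, "rec\n", 4);
        close(wfd);                    /* the writer goes away again */
        if (w != 4)
            goto fail;

        ssize_t n;
        while ((n = read(rfd, buf, sizeof(buf))) > 0)
            ;                          /* consume the record */
        if (n == 0)
            eofs++;                    /* no writer left: end-of-file */
        else if (errno != EAGAIN && errno != EWOULDBLOCK)
            goto fail;                 /* EAGAIN: empty, but a writer remains */
    }
    close(rfd);
    if (keep >= 0)
        close(keep);
    unlink(path);
    return eofs;

fail:
    if (rfd >= 0)
        close(rfd);
    if (keep >= 0)
        close(keep);
    unlink(path);
    return -1;
}
```

With three writer cycles and hold_write = 0, the reader should see an EOF after every cycle; with hold_write = 1 it should see none, because the FIFO always has at least one writer. This only illustrates where the EOF comes from; whether debug_file could use a similar trick has not been tried.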

My next idea was to have debug_daemon write to a regular file and debug_file read it like "tail -f". This should work in theory, but it has disadvantages: the user must remember to delete the file when done (the tool could do this, but not if it exits uncleanly), and entries could be missed if the file is deleted while the tool is still running.
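The "tail -f" approach amounts to polling the file for growth. The minimal sketch below uses hypothetical helpers follow_open() and follow_step() (not part of lctl) and assumes debug_daemon only ever appends to the file. It also shows how a follower can at least detect the deletion problem: after unlink(), fstat() on the still-open descriptor reports st_nlink == 0, and records written to a new file at the same path are invisible to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a log file for following (hypothetical helper). If from_end is
 * nonzero, start at the current end of file and only report new data;
 * otherwise start from the beginning. Returns an fd or -1. */
int follow_open(const char *path, off_t *offset, int from_end)
{
    struct stat st;
    int fd;

    assert(path != NULL && offset != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *offset = from_end ? st.st_size : 0;
    return fd;
}

/* One polling step: read up to cap bytes appended after *offset and
 * advance *offset. Returns bytes read, 0 if nothing new, -1 on error, or
 * -2 once the file is fully drained and has been unlinked. */
ssize_t follow_step(int fd, off_t *offset, char *buf, size_t cap)
{
    struct stat st;

    assert(offset != NULL && buf != NULL);
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size > *offset) {        /* new data: drain it first */
        ssize_t n = pread(fd, buf, cap, *offset);
        if (n > 0)
            *offset += n;
        return n;
    }
    if (st.st_nlink == 0)              /* deleted: no more data will come */
        return -2;
    return 0;                          /* caller sleeps and polls again */
}
```

Detecting the unlink does not recover the missed entries. It only lets the tool report that it has lost track of the log instead of silently waiting forever.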

I think the cleanest solution is to rework the debug_kernel interface to be like Linux's /dev/kmsg. A character device, perhaps named /dev/lmsg, could be created that outputs the buffer contents when read. Implementing "follow" would be trivial with this interface. The existing userspace tools could also easily be updated to use it, and it would bring other benefits, for example "lctl dk" no longer needing to copy the message buffer to a tmp file. The disadvantage is that this could be a significant kernel-side refactor.
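For comparison, /dev/kmsg returns exactly one record per read(), and a reader that has consumed the newest record blocks until another arrives (unless it opened the device with O_NONBLOCK), so a follow mode is just a blocking read loop. Per the kernel's dev-kmsg ABI documentation, each record starts with a header "prefix,seq,timestamp,flags;" followed by the message, where prefix encodes facility and level. The sketch below parses that header. No /dev/lmsg record format exists, so reusing this layout is only an assumption, and struct kmsg_record and parse_kmsg_record() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct kmsg_record {
    unsigned int level;          /* syslog level: prefix & 7 */
    unsigned int facility;       /* syslog facility: prefix >> 3 */
    unsigned long long seq;      /* record sequence number */
    unsigned long long ts_usec;  /* timestamp in microseconds */
    const char *msg;             /* points into the input buffer */
    size_t msg_len;              /* message length, without newline */
};

/* Parse one record as returned by a single read() of /dev/kmsg:
 * "prefix,seq,timestamp,flags[,...];message\n". Returns 0 on success. */
int parse_kmsg_record(const char *rec, struct kmsg_record *out)
{
    unsigned int prefix;
    const char *semi, *end;

    assert(rec != NULL && out != NULL);
    if (sscanf(rec, "%u,%llu,%llu", &prefix, &out->seq, &out->ts_usec) != 3)
        return -1;
    semi = strchr(rec, ';');         /* header ends at the first ';' */
    if (semi == NULL)
        return -1;
    out->level = prefix & 7;
    out->facility = prefix >> 3;
    out->msg = semi + 1;
    end = strchr(out->msg, '\n');    /* message ends at the first newline */
    out->msg_len = end ? (size_t)(end - out->msg) : strlen(out->msg);
    return 0;
}
```

The sequence number in each header is what lets a reader notice that it has fallen behind and records were overwritten, which matters for any "follow" implementation.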

I feel the ability to follow Lustre's debug log would be useful to both sysadmins and developers, but I'd like some other input. Would this be valuable to anyone? If it would be useful (and feasible), I would be happy to submit a JIRA ticket and work on a patch. I'm not very familiar with the kernel-side code yet, so I'm not sure how complicated this would be.

- Thomas Bertschinger

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* Re: [lustre-devel] lctl debug_kernel: usability improvements?
  2023-04-10 14:19 [lustre-devel] lctl debug_kernel: usability improvements? Bertschinger, Thomas Andrew Hjorth via lustre-devel
@ 2023-04-17 14:55 ` Bertschinger, Thomas Andrew Hjorth via lustre-devel
From: Bertschinger, Thomas Andrew Hjorth via lustre-devel @ 2023-04-17 14:55 UTC
  To: lustre-devel



After looking into this more, I see a significant challenge with this idea. Because each CPU (and execution context) has its own message buffer (struct cfs_trace_cpu_data), the messages are not sorted chronologically in kernel memory. Instead, they are written to a regular file in CPU order and then sorted chronologically in userspace before printing.
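To illustrate the ordering problem, here is a hypothetical sketch (not the lctl code, whose sorting may work differently) of merging per-buffer message streams into one chronological stream. It assumes each buffer is already in timestamp order, so a k-way merge suffices; struct dbg_rec, MAX_BUFS and merge_by_time() are invented for illustration. A /dev/lmsg read handler built on the existing structures would have to perform an equivalent selection across all buffers in kernel space.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFS 64              /* hypothetical upper bound on buffers */

/* Hypothetical simplified record; the real per-CPU buffers
 * (struct cfs_trace_cpu_data) hold pages of encoded messages. */
struct dbg_rec {
    unsigned long long ts;       /* timestamp */
    int cpu;                     /* originating CPU */
};

/* Merge nbuf arrays, each already sorted by timestamp, into out in global
 * chronological order; ties go to the lower buffer index. Returns the
 * number of records written. This is an O(total * nbuf) selection merge;
 * a min-heap would scale better with many CPUs. */
size_t merge_by_time(const struct dbg_rec *const bufs[], const size_t lens[],
                     size_t nbuf, struct dbg_rec *out)
{
    size_t pos[MAX_BUFS] = {0};  /* read cursor into each buffer */
    size_t n = 0;

    assert(nbuf <= MAX_BUFS && out != NULL);
    for (;;) {
        size_t best = nbuf;      /* nbuf means "no candidate yet" */
        for (size_t b = 0; b < nbuf; b++) {
            if (pos[b] < lens[b] &&
                (best == nbuf ||
                 bufs[b][pos[b]].ts < bufs[best][pos[best]].ts))
                best = b;
        }
        if (best == nbuf)
            return n;            /* every buffer is drained */
        out[n++] = bufs[best][pos[best]++];
    }
}
```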

Implementing a "/dev/lmsg" device would be challenging with the existing data structures, because sorting would have to happen in kernel space as the device responds to reads.

I found LU-14428, "Convert tracefile to use ring_buffer from linux", which does not appear to be complete, since no ring_buffer is currently used here. If it is completed, implementing "/dev/lmsg" with the same interface as "/dev/kmsg" would be much simpler. (This assumes I understand correctly that the proposal is a single global ring buffer. Let me know if I am mistaken and the proposal is a set of per-CPU ring buffers, because then the sorting problem is not avoided.)
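The reason a single global buffer simplifies things can be sketched as follows, assuming every record receives a global, monotonically increasing sequence number: stored order is then already chronological, and each reader only needs its own cursor, roughly as /dev/kmsg readers do. All names here (struct lmsg_ring, ring_append(), ring_read()) are hypothetical, the ring is tiny for illustration, and there is no locking; a real kernel version would need cross-CPU synchronization on the write side, which is the usual cost of a global buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4             /* tiny, for illustration only */
#define MSG_MAX 64

/* Hypothetical single global ring (no locking). */
struct lmsg_ring {
    unsigned long long next_seq; /* sequence number of the next record */
    char msgs[RING_SLOTS][MSG_MAX];
};

/* Writer side: append, overwriting the oldest slot when full. */
void ring_append(struct lmsg_ring *r, const char *msg)
{
    char *slot;

    assert(r != NULL && msg != NULL);
    slot = r->msgs[r->next_seq % RING_SLOTS];
    strncpy(slot, msg, MSG_MAX - 1);
    slot[MSG_MAX - 1] = '\0';
    r->next_seq++;
}

/* Reader side: each reader keeps its own cursor (a sequence number).
 * Returns 1 with *msg set, 0 if caught up (a "follow" reader would sleep
 * here), or -1 if records were overwritten before being read; the cursor
 * then jumps to the oldest surviving record, similar to /dev/kmsg
 * returning EPIPE. */
int ring_read(const struct lmsg_ring *r, unsigned long long *cursor,
              const char **msg)
{
    unsigned long long oldest =
        r->next_seq > RING_SLOTS ? r->next_seq - RING_SLOTS : 0;

    assert(cursor != NULL && msg != NULL);
    if (*cursor < oldest) {
        *cursor = oldest;        /* report the gap, then resync */
        return -1;
    }
    if (*cursor == r->next_seq)
        return 0;
    *msg = r->msgs[*cursor % RING_SLOTS];
    (*cursor)++;
    return 1;
}
```

With per-CPU rings, ring_read() would instead have to compare the heads of every ring on each call, which brings back the merge problem described above.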

I filed a new issue, LU-16746 "Convert tracefile to export debug logs via character device", for this idea. It can be worked on after LU-14428 is completed. If I can help with LU-14428, for example with any sub-tasks, let me know; I am interested in helping with this area of Lustre.

Thanks,

Thomas Bertschinger
