[RFC,v2,0/4] make jbd2 debug switch per device

Message ID cover.1611402263.git.brookxu@tencent.com

Message

brookxu Jan. 23, 2021, noon UTC
On a multi-disk machine, the jbd2 debugging switch is global, so the
logs of multiple disks are interleaved: it is hard to attribute a
message to a particular disk, and the volume of generated logs is very
large. A separate debugging switch for each disk would make it easier
to isolate the logs of a single disk.

We can enable jbd2 debugging of a device in the following ways:
echo X > /proc/fs/jbd2/sdX/jbd2_debug

One small disadvantage remains: because the debugging switch lives in
the journal_t object, messages emitted before that object is
initialized are lost. In practice, this has little impact on
debugging.

Chunguang Xu (4):
  jbd2: make jbd2_debug module parameter per device
  jbd2: introduce some new log interfaces
  jbd2: replace jbd_debug with the new log interface
  ext4: replace jbd_debug with the new log interface

 fs/ext4/balloc.c      |   2 +-
 fs/ext4/ext4_jbd2.c   |   3 +-
 fs/ext4/fast_commit.c |  64 +++++++++++++-----------
 fs/ext4/indirect.c    |   4 +-
 fs/ext4/inode.c       |   3 +-
 fs/ext4/namei.c       |  10 ++--
 fs/ext4/super.c       |  16 +++---
 fs/jbd2/checkpoint.c  |   6 +--
 fs/jbd2/commit.c      |  36 ++++++-------
 fs/jbd2/journal.c     | 114 +++++++++++++++++++++++++++++-------------
 fs/jbd2/recovery.c    |  59 +++++++++++-----------
 fs/jbd2/revoke.c      |   8 +--
 fs/jbd2/transaction.c |  35 ++++++-------
 include/linux/jbd2.h  |  66 +++++++++++++++++-------
 14 files changed, 258 insertions(+), 168 deletions(-)

Comments

Theodore Ts'o Jan. 25, 2021, 9:50 p.m. UTC | #1
On Sat, Jan 23, 2021 at 08:00:42PM +0800, Chunguang Xu wrote:
> On a multi-disk machine, the jbd2 debugging switch is global, so the
> logs of multiple disks are interleaved: it is hard to attribute a
> message to a particular disk, and the volume of generated logs is very
> large. A separate debugging switch for each disk would make it easier
> to isolate the logs of a single disk.
> 
> We can enable jbd2 debugging of a device in the following ways:
> echo X > /proc/fs/jbd2/sdX/jbd2_debug
> 
> One small disadvantage remains: because the debugging switch lives in
> the journal_t object, messages emitted before that object is
> initialized are lost. In practice, this has little impact on
> debugging.

The jbd debugging infrastructure dates back to the very beginnings of
ext3, when Stephen Tweedie added them while he was first implementing
the jbd layer.  So this dates back to a time before we had other
schemes like dynamic debug or tracepoints or eBPF.

I wonder if instead of trying to enhance our own bespoke debugging
system, instead we set up something like tracepoints where they would
be useful.  I'm not proposing that we try to replace all jbd_debug()
statements with tracepoints but I think it would be useful to look at
what sort of information would actually be *useful* on a production
server, and add those tracepoints to the jbd2 layer.  What I like
about tracepoints is you can enable them on a much more fine-grained
fashion; information is sent to userspace in a much more efficient
manner than printk; you can filter tracepoint events in the kernel,
before sending them to userspace; and if you want more sophisticated
filtering or aggregation, you can use eBPF.
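For example, jbd2 already carries tracepoints that can be toggled per
event and filtered inside the kernel. A sketch of the workflow
(assuming root, tracefs mounted at /sys/kernel/tracing, and a target
device of 8:16, i.e. sdb; the filter value uses the kernel's dev_t
encoding):

```shell
TRACEFS=/sys/kernel/tracing

# Enable only the commit start/end events rather than every jbd2 message
echo 1 > $TRACEFS/events/jbd2/jbd2_start_commit/enable
echo 1 > $TRACEFS/events/jbd2/jbd2_end_commit/enable

# Filter in the kernel to a single device: dev_t is (major << 20) | minor
echo "dev == $(( (8 << 20) | 16 ))" > $TRACEFS/events/jbd2/jbd2_start_commit/filter

# Stream the already-filtered events to userspace
cat $TRACEFS/trace_pipe
```

Unlike a global debug level, this discards events for every other
device before they ever reach userspace.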

What was the original use case which inspired this?  Were you indeed
trying to debug some kind of problem on a production system?  (Why did
you have multiple disks active at the same time?)  Was there a
specific problem you were trying to debug?  What debug level were you
using?  Which jbd_debug statements were most useful to you?  Which
just got in the way (but which had to be enabled given the log level
you needed to get the debug messages that you needed)?

    	      	      	    	     	      - Ted
brookxu Jan. 26, 2021, 12:50 a.m. UTC | #2
Theodore Ts'o wrote on 2021/1/26 5:50:
> On Sat, Jan 23, 2021 at 08:00:42PM +0800, Chunguang Xu wrote:
>> On a multi-disk machine, the jbd2 debugging switch is global, so the
>> logs of multiple disks are interleaved: it is hard to attribute a
>> message to a particular disk, and the volume of generated logs is very
>> large. A separate debugging switch for each disk would make it easier
>> to isolate the logs of a single disk.
>>
>> We can enable jbd2 debugging of a device in the following ways:
>> echo X > /proc/fs/jbd2/sdX/jbd2_debug
>>
>> One small disadvantage remains: because the debugging switch lives in
>> the journal_t object, messages emitted before that object is
>> initialized are lost. In practice, this has little impact on
>> debugging.
> 
> The jbd debugging infrastructure dates back to the very beginnings of
> ext3, when Stephen Tweedie added them while he was first implementing
> the jbd layer.  So this dates back to a time before we had other
> schemes like dynamic debug or tracepoints or eBPF.
> I wonder if instead of trying to enhance our own bespoke debugging
> system, instead we set up something like tracepoints where they would
> be useful.  I'm not proposing that we try to replace all jbd_debug()
> statements with tracepoints but I think it would be useful to look at
> what sort of information would actually be *useful* on a production
> server, and add those tracepoints to the jbd2 layer.  What I like
> about tracepoints is you can enable them on a much more fine-grained
> fashion; information is sent to userspace in a much more efficient
> manner than printk; you can filter tracepoint events in the kernel,
> before sending them to userspace; and if you want more sophisticated
> filtering or aggregation, you can use eBPF.

Tracepoints, eBPF and other hook technologies are better suited to
production environments, but for pure debugging work, adding hook
points feels a bit heavyweight. Your suggestion is very valuable
though, thank you very much.

> What was the original use case which inspired this?  Were you indeed
> trying to debug some kind of problem on a production system?  (Why did
> you have multiple disks active at the same time?)  Was there a
> specific problem you were trying to debug?  What debug level were you
> using?  Which jbd_debug statements were most useful to you?  Which
> just got in the way (but which had to be enabled given the log level
> you needed to get the debug messages that you needed)?

We only do this in the test environment, mainly to facilitate
debugging. We adjust the log level dynamically; sometimes it is 1,
sometimes higher. There are two main reasons multiple disks are active
at the same time: first, the system management tool updates the system
disk; second, collaborative tasks update the other disks. During the
actual debugging we added more logs of our own. Some of the system's
original logs are useful, but others don't feel very meaningful.
Thanks.

>     	      	      	    	     	      - Ted
>
Theodore Ts'o Jan. 27, 2021, 4:21 p.m. UTC | #3
On Tue, Jan 26, 2021 at 08:50:02AM +0800, brookxu wrote:
> 
> Tracepoints, eBPF and other hook technologies are better suited to
> production environments, but for pure debugging work, adding hook
> points feels a bit heavyweight. Your suggestion is very valuable
> though, thank you very much.

What feels heavy?  The act of adding a new jbd_debug() statement to
the sources, versus adding a new tracepoint?  Or how to enable a set
of tracepoints versus setting a jbd_debug level (either globally, or
per mount point)?  Or something else?

If it's the latter (which is what I think it is), how often are you
needing to add a new jbd_debug() statement *and* needing to run in a
test environment where you have multiple disks?  How often is it
useful to have multiple disks when doing your debugging?

I'm trying to understand why this has been useful to you, since that
generally doesn't match with my development, testing, or debugging
experience.  In general I try to test with one file system at a time,
since I'm trying to find something reproducible.  Do you have cases
where you need multiple file systems in your test environment in order
to do your testing?  Why is that?  Is it because you're trying to use
your production server code as your test reproducers?  And if so, I
would have thought adding the jbd_debug() statements and sending lots
of console print messages would distort the timing enough to make it
hard to reproduce a problem found in your production environment.

It sounds like you have a very different set of test practices than
what I'm used to, and I'm trying to understand it better.

Cheers,

						- Ted
brookxu Jan. 28, 2021, 11:39 a.m. UTC | #4
Theodore Ts'o wrote on 2021/1/28 0:21:
> On Tue, Jan 26, 2021 at 08:50:02AM +0800, brookxu wrote:
>>
>> Tracepoints, eBPF and other hook technologies are better suited to
>> production environments, but for pure debugging work, adding hook
>> points feels a bit heavyweight. Your suggestion is very valuable
>> though, thank you very much.
> 
> What feels heavy?  The act of adding a new jbd_debug() statement to
> the sources, versus adding a new tracepoint?  Or how to enable a set
> of tracepoints versus setting a jbd_debug level (either globally, or
> per mount point)?  Or something else?

Sorry, I didn't make this clear. I mean the amount of code
modification and data analysis. Since we are mainly doing process
confirmation, adding tracepoints requires a relatively large amount of
code, whereas adding log statements is relatively simple, and the
changes to both the kernel and the analysis scripts stay small.

> If it's the latter (which is what I think it is), how often are you
> needing to add a new jbd_debug() statement *and* needing to run in a
> test environment where you have multiple disks?  How often is it
> useful to have multiple disks when doing your debugging?


We don't use JBD2_DEBUG much in our work; in most cases we prefer to
add hook points and analyze the data they produce. But since this was
process confirmation, the hook-point approach would have required many
hook points and a relatively large workload. Besides, these hook
points are not needed in the production environment, so the effort
might be wasted.

> I'm trying to understand why this has been useful to you, since that
> generally doesn't match with my development, testing, or debugging
> experience.  In general I try to test with one file system at a time,
> since I'm trying to find something reproducible.  Do you have cases
> where you need multiple file systems in your test environment in order
> to do your testing?  Why is that?  Is it because you're trying to use
> your production server code as your test reproducers?  And if so, I
> would have thought adding the jbd_debug() statements and sending lots
> of console print messages would distort the timing enough to make it
> hard to reproduce a problem found in your production environment.


In our mixed-deployment production environment, we occasionally find
that containers hit priority inversion problems, that is, low-priority
containers affect the QoS of high-priority containers. We are trying
to make ext4 work better in container scenarios. After a basic test we
test with the business program, because its IO behavior is relatively
more complicated. It is worth noting that here we are mainly concerned
with the correctness of the process, not particularly with
performance.

> It sounds like you have a very different set of test practices than
> what I'm used to, and I'm trying to understand it better.

:) Perhaps my verification method is not optimal, but jbd2 already has
a similar framework, so I tried to use it and then found that some
things could be optimized.

> Cheers,
> 
> 						- Ted
>