All of lore.kernel.org
 help / color / mirror / Atom feed
* [INFO] xfs: kernel debugging with trace_printk()
@ 2016-09-06  8:20 Dave Chinner
  2016-09-13 15:02 ` Joshua Kinard
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2016-09-06  8:20 UTC (permalink / raw)
  To: linux-xfs; +Cc: xfs

Hi folks,

I learnt something today that made my life immediately better:
trace_printk(). After I discovered it, Darrick mentioned on #xfs
that he didn't know about it either, hence I figured I should send
an email to the list so that everyone working with XFS knows about
it....

Yes, this community service starts with me admitting I am a heavy
user of printk() for debugging code, and I'm about to go cold
turkey.

"Me Ugg, Caveman. Big Club mine. Hit Stuff Hard. Fix It Good."

But, really, everyone does the same thing - anyone what says they
don't is lying. :P

You all know the drill: add a printk, recompile, test, see what gets
output on the console, go back to step one. It starts like this:

Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.

Then if there's lots of output, you turn off the console output with
'dmesg -n X' and look at the output in /var/log/messages instead, or
directly run dmesg to see it. You spend time wading through reams of
output.

Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.

Then you start going mad because the code runs so slow and it's
extremely hard to correlate with other diagnostic information (such
as trace events) that you're also using to try work out what is
going on. And then you end up adding so much debug that it turns
into a heisenbug and you think you're going insane...

Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.

And, finally, when you think you've fixed the problem, you remove
the debug printk()s and find the problem is still there....

Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.
Eat, sleep, printk(), repeat.

So, avoid the existential crisis and the hard liquor and drugs:
use trace_printk() and `trace-cmd record -e printk` to maintain your
sanity!

(You all know about trace-cmd, right?)

The name makes it obvious: it's printk() for trace events which lets
you add custom output to the trace event stream. It even
automatically adds the function the trace_printk is located in to
the output, so it's easy to identify and filter.

For example, I'm trying to debug a new extent list interval tree
index, and so I need to see all the existing extent list trace
points, and also see fine grained debug on the tree operations to
work out where it's going wrong. I'm running tests that push tens of
thousands of extents into the tree, so there's a /lot/ of debug.


I don't want to write tracepoints yet, because I don't know what I
need, and tracepoints are a pain to write. So, I've ended up with
code like this:

.....
	/* check the lookaside cache first */
	if (ext->last_hdr &&
	    idx >= ext->last_hdr->idx &&
	    idx < ext->last_hdr->idx + ext->last_hdr->count) {
		hdr = ext->last_hdr;
		trace_printk("lookaside found %p, idx %d, count %d, end %d",
			hdr, hdr->idx, hdr->count, hdr->end);
		return hdr;
	}

	trace_printk("find idx %d", idx);
	while (node) {
		struct xfs_iext_hdr *rbhdr =
			rb_entry(node, struct xfs_iext_hdr, rb);

	trace_printk("find rbhdr %p, idx %d, count %d, end %d",
		rbhdr, rbhdr->idx, rbhdr->count, rbhdr->end);

		if (node->rb_left &&
		    idx < interval_end(node->rb_left)) {
			/* continue searching left */
			node = node->rb_left;
.....

And when I run a workload and trace it via something like this:

# trace-cmd record -e xfs_iext\* -e xfs_alloc\* -e xfs_bun\* -e xfs_bmap\* -e xfs_iomap\* -e xfs_file\* -e printk

I end up with an event report that looks like this from generic/017.
It runs for *4-5 seconds* before assert failing, and this is at the
end of about /1.5GB/ of diagnostic trace report output:

          xfs_io-11186 [004]    97.303199: bprint:               xfs_iext_find_extbuf: lookaside found 0xffff880239cff000, idx 631, count 253, end 885
          xfs_io-11186 [004]    97.303203: bprint:               xfs_iext_find_extbuf: find idx 884
          xfs_io-11186 [004]    97.303203: bprint:               xfs_iext_find_extbuf: find rbhdr 0xffff8808395e4000, idx 1390, count 253, end 4740
          xfs_io-11186 [004]    97.303205: bprint:               xfs_iext_find_extbuf: find rbhdr 0xffff880239c6a000, idx 379, count 252, end 1390
       trace-cmd-4840  [004]    97.303227: bprint:               xfs_iext_find_extbuf: lookaside found 0xffff880139d94000, idx 0, count 2, end 2
       trace-cmd-4840  [004]    97.303233: bprint:               xfs_iext_find_extbuf: lookaside found 0xffff880139d94000, idx 0, count 2, end 2
       trace-cmd-4840  [004]    97.303234: bprint:               xfs_iext_find_extbuf: lookaside found 0xffff880139d94000, idx 0, count 2, end 2
       trace-cmd-4840  [004]    97.303237: xfs_file_buffered_write: dev 253:32 ino 0x1b2a8 size 0x51000 offset 0xf6c000 count 0x1000
       trace-cmd-4840  [004]    97.303238: bprint:               xfs_iext_bno_to_ext: lookaside found 0xffff880139d94000 idx 0, count 2, end 2 offset 3948
       trace-cmd-4840  [004]    97.303239: bprint:               xfs_iext_find_extbuf: lookaside found 0xffff880139d94000, idx 0, count 2, end 2
       trace-cmd-4840  [004]    97.303240: xfs_iomap_found:      dev 253:32 ino 0x1b2a8 size 0x51000 offset 0xf6c000 count 4096 type  startoff 0x70 startblock 4503599627239444 blockcount 0xf80
          xfs_io-11186 [004]    97.303254: bprint:               xfs_iext_find_extbuf: find rbhdr 0xffff880239cf8000, idx 884, count 253, end 1390
          xfs_io-11186 [004]    97.303255: bprint:               xfs_iext_find_extbuf: find rbhdr 0xffff880239cff000, idx 631, count 253, end 885
          xfs_io-11186 [004]    97.303256: bprint:               xfs_iext_find_extbuf: found 0xffff880239cff000, idx 631, count 253, end 885
          xfs_io-11186 [004]    97.303269: console:              [   97.303261] XFS: Assertion failed: idx - hdr->idx < hdr->count, file: fs/xfs/libxfs/xfs_inode_fork.c, line: 451

Oh, did I mention that kernel console output also shows up in
the trace? And that, by default, tracing stops when a oops or BUG()
is triggered? So the last event in the trace is the ASSERT() failure
that I'm trying to debug....

>From the above, I can clearly see that idx = 884, hdr->idx = 631,
and hdr->count = 253. And 884 - 631 < 253 is clearly false - there's
an off by one somewhere in the new code....

IOWs, I have all my debug printk statements, all the XFS events and
the assert failure all in the one event trace and so it's trivial to
work out why the assert failed. No more guessing!

But the best bit is still to come: When you are not tracing, the
debug trace_printk()s have zero overhead - the code runs like they
aren't there. No more recompiling to add/remove debug printks for
test runs. And with event triggers you can also control your debug
trace_printk output dynamically when specific tracepoints are
encountered.....

Enjoy!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [INFO] xfs: kernel debugging with trace_printk()
  2016-09-06  8:20 [INFO] xfs: kernel debugging with trace_printk() Dave Chinner
@ 2016-09-13 15:02 ` Joshua Kinard
  2016-09-13 21:20   ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Joshua Kinard @ 2016-09-13 15:02 UTC (permalink / raw)
  To: xfs

On 09/06/2016 04:20, Dave Chinner wrote:
> Hi folks,
> 
> I learnt something today that made my life immediately better:
> trace_printk(). After I discovered it, Darrick mentioned on #xfs
> that he didn't know about it either, hence I figured I should send
> an email to the list so that everyone working with XFS knows about
> it....
> 
> Yes, this community service starts with me admitting I am a heavy
> user of printk() for debugging code, and I'm about to go cold
> turkey.
> 
> "Me Ugg, Caveman. Big Club mine. Hit Stuff Hard. Fix It Good."
> 
> But, really, everyone does the same thing - anyone what says they
> don't is lying. :P
> 
> You all know the drill: add a printk, recompile, test, see what gets
> output on the console, go back to step one. It starts like this:
> 
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> 
> Then if there's lots of output, you turn off the console output with
> 'dmesg -n X' and look at the output in /var/log/messages instead, or
> directly run dmesg to see it. You spend time wading through reams of
> output.
> 
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> 
> Then you start going mad because the code runs so slow and it's
> extremely hard to correlate with other diagnostic information (such
> as trace events) that you're also using to try work out what is
> going on. And then you end up adding so much debug that it turns
> into a heisenbug and you think you're going insane...
> 
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> 
> And, finally, when you think you've fixed the problem, you remove
> the debug printk()s and find the problem is still there....
> 
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> Eat, sleep, printk(), repeat.
> 
> So, avoid the existential crisis and the hard liquor and drugs:
> use trace_printk() and `trace-cmd record -e printk` to maintain your
> sanity!
> 
> (You all know about trace-cmd, right?)
> 

[snip]

But this assumes one can even get to a working shell prompt from which one can
run commands, like 'trace-cmd'.  I've been trying to hunt down and fix various
bugs in running Linux on some of the old SGI machines (which is an appropriate
statement for this list), and some of them, you never even get as far as a
working shell prompt that accepts keyboard or serial input (thanks, IOC3!).

Usually, I'm not only doing the "Eat, sleep, printk(), repeat." dance.  In some
instances, my only workable debugging aid is printk() (or actually the pr_*()
versions), and a DSLR mounted on a tripod with a fast shutter speed, high ISO,
and a remote trigger to try and catch error messages as they zoom by on the
screen.  Luckily, I have a really old LCD flat panel that has a slow refresh
rate, so I can sometimes read the fading text of the errors as they're
overwritten by new output.

Now I just need to figure out how to write a kgdb shim driver for the IOC3 chip
so I can get GDB to do remote debugging over serial.  That will *really* help
hunt down some of the really screwy bugs I've run into (like Onyx2's mysterious
hard-lockup NUMA bug).

And yes, the SGI machines that do work well all use XFS as their Linux
filesystem of choice :)

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [INFO] xfs: kernel debugging with trace_printk()
  2016-09-13 15:02 ` Joshua Kinard
@ 2016-09-13 21:20   ` Dave Chinner
  0 siblings, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2016-09-13 21:20 UTC (permalink / raw)
  To: Joshua Kinard; +Cc: xfs

On Tue, Sep 13, 2016 at 11:02:01AM -0400, Joshua Kinard wrote:
> On 09/06/2016 04:20, Dave Chinner wrote:
> [snip]
> 
> But this assumes one can even get to a working shell prompt from which one can
> run commands, like 'trace-cmd'.

Well, yes. This is, after all, a filesystem development discussion
list, not a "I've got b0rken old hardware that won't boot" support
list, so the assumption that there is a working shell available for
debugging is a pretty good one to make...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-09-13 21:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-06  8:20 [INFO] xfs: kernel debugging with trace_printk() Dave Chinner
2016-09-13 15:02 ` Joshua Kinard
2016-09-13 21:20   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.