All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Miller <davem@davemloft.net>
To: sparclinux@vger.kernel.org
Subject: Re: [PATCH 7/7] sparc64: Add function graph tracer support.
Date: Fri, 16 Apr 2010 20:47:01 +0000	[thread overview]
Message-ID: <20100416.134701.106783257.davem@davemloft.net> (raw)
In-Reply-To: <20100412.234300.212396783.davem@davemloft.net>

From: Frederic Weisbecker <fweisbec@gmail.com>
Date: Fri, 16 Apr 2010 17:44:21 +0200

> """(note the hrtimer warnings are normals. This is a hanging prevention
> that has been added because of the function graph tracer first but
> eventually serves as a general protection for hrtimer. It's about
> similar to the balancing problem scheme: the time to service timers
> is so slow that timers re-expire before we exit the servicing loop,
> so we risk an endless loop)."""

I don't think it's normal in this case, I suspect we loop because
of some kind of corruption.

> That said it also means there is a problem I think. It's normal
> that it happens in a guest, but not a normal box. May be there
> a contention in the tracer fast path that slows down the machine.

I think it's looping not because of contention, but because of
corrupted memory/registers.

> Do you have CONFIG_DEBUG_LOCKDEP enabled? This was one of the
> sources of these contentions (fixed lately in -tip but for
> .35).

I'm using PROVE_LOCKING but not DEBUG_LOCKDEP.

Anyways, consistently my machine crashes with completely corrupted
registers in either irq_exit() or __do_softirq().  Usually we get an
unaligned access of some sort, either accessing the stack (because %fp
is garbage) or via an indirect call (usually because %i7 is garbage).

One thing that's interesting about the softirq path is that it uses
the softirq stack.  The only thing that guards us jumping onto the
softirq_stack are the tests done by do_softirq(), mainly
!in_interrupt() and we have softirqs pending.

What if preempt_count() got corrupted in such a way that we end up
evaluating in_interrupt() to zero when we shouldn't?

If that happens, and this makes us jump onto the top of softirq stack
of the current cpu multiple times, that could cause some wild
corruptions.

Another thing I've noticed is that there appears to be some kind of
pattern to many of the register corruptions I've seen.  There is
a pattern of 64-bit values that often looks like this (in memory
order):

0xffffffffc3300000
0xffffffffc33000cc
0xffffffffc3d00000
0xffffffffc3d000cc
0xffffffffc4000000
0xffffffffc40000cc

and, from another trace:

0xffffffffc6100000
0xffffffffc61000cc
0xffffffffc6a00000
0xffffffffc6a000cc
0xffffffffc6e00000
0xffffffffc6e000cc

They look like some kind of descriptor.  The closest thing I could
find were the scatter-gather descriptors used by the Fusion mptsas
driver, but I can't find a way that the descriptors would be formed
exactly like the above, but it does come close.

For example, drivers/message/fusion/mptscsih.c:mptscsih_qcmd()
has this call:

		ioc->add_sge((char *)&pScsiReq->SGL,
			MPT_SGE_FLAGS_SSIMPLE_READ | 0,
			(dma_addr_t) -1);

which puts -1 into the address field, but this doesn't exactly line up
because the 32-bit SGE descriptors are in the order "flags" then
"address" not the other way around.

Ho hum... anyways, just looking for clues.  If those are mptsas
descriptors, then it would be consistent with how I've found that the
block I/O path seems to invariably be involved during the crashes.
In another trace (that time with PROVE_LOCKING disabled) I saw
the host->host_lock passed down into spin_lock_irqsave() being NULL.
And this was in the software interrupt handler.

  parent reply	other threads:[~2010-04-16 20:47 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-13  6:43 [PATCH 7/7] sparc64: Add function graph tracer support David Miller
2010-04-13 19:18 ` Frederic Weisbecker
2010-04-13 19:39 ` Rostedt
2010-04-13 19:45 ` Frederic Weisbecker
2010-04-13 21:34 ` David Miller
2010-04-13 21:35 ` David Miller
2010-04-13 21:51 ` Frederic Weisbecker
2010-04-13 21:52 ` Steven Rostedt
2010-04-13 21:56 ` David Miller
2010-04-13 21:57 ` David Miller
2010-04-13 21:57 ` Frederic Weisbecker
2010-04-13 22:05 ` Frederic Weisbecker
2010-04-13 22:11 ` David Miller
2010-04-13 23:34 ` David Miller
2010-04-13 23:56 ` David Miller
2010-04-14  1:59 ` David Miller
2010-04-14  9:04 ` David Miller
2010-04-14 15:29 ` Frederic Weisbecker
2010-04-14 15:48 ` Frederic Weisbecker
2010-04-14 23:08 ` David Miller
2010-04-16  9:12 ` David Miller
2010-04-16 15:44 ` Frederic Weisbecker
2010-04-16 20:47 ` David Miller [this message]
2010-04-16 22:51 ` David Miller
2010-04-16 23:14 ` Frederic Weisbecker
2010-04-16 23:17 ` David Miller
2010-04-17  7:51 ` David Miller
2010-04-17 16:59 ` Frederic Weisbecker
2010-04-17 17:22 ` Frederic Weisbecker
2010-04-17 21:24 ` David Miller
2010-04-17 21:25 ` David Miller
2010-04-17 21:29 ` David Miller
2010-04-17 21:34 ` Frederic Weisbecker
2010-04-17 21:38 ` Frederic Weisbecker
2010-04-17 21:38 ` David Miller
2010-04-17 21:41 ` Frederic Weisbecker
2010-04-18 15:31 ` Frederic Weisbecker
2010-04-18 21:19 ` David Miller
2010-04-19  7:56 ` David Miller
2010-04-19  8:15 ` David Miller
2010-04-19 19:52 ` Frederic Weisbecker
2010-04-19 19:56 ` David Miller
2010-04-19 20:37 ` Frederic Weisbecker
2010-04-20  5:51 ` David Miller
2010-04-20  7:50 ` David Miller
2010-04-20 13:58 ` Frederic Weisbecker
2010-04-20 21:17 ` David Miller
2010-04-20 22:52 ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100416.134701.106783257.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=sparclinux@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.