All of lore.kernel.org
 help / color / mirror / Atom feed
* Debugging Thinkpad T430s occasional suspend failure.
@ 2013-02-12 19:39 Dave Jones
  2013-02-12 20:13 ` Linus Torvalds
  2013-02-13  0:26 ` Hugh Dickins
  0 siblings, 2 replies; 50+ messages in thread
From: Dave Jones @ 2013-02-12 19:39 UTC (permalink / raw)
  To: Linux Kernel; +Cc: Linus Torvalds

My Thinkpad T430s suspend/resumes fine most of the time. But every so often
(like one in ten times or so), as soon as I suspend, I get a black screen,
and a blinking power button.

(Note: Not the capslock lights like when we panic, this laptop 'conveniently
 doesn't have those. This is the light surrounding the power button, which afaik
 isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)

I tried debugging this with pm_trace, which told me..

[    4.576035]   Magic number: 0:455:740
[    4.576037]   hash matches drivers/base/power/main.c:645

Which points me at..

 642  Complete:
 643         complete_all(&dev->power.completion);
 644 
 645         TRACE_RESUME(error);
 646 
 647         return error;
 648 }

The only thing interesting here I think is that this is the resume path.
So perhaps something failed to suspend, and we tried to back out of suspending,
but something was too screwed up to abort cleanly ?

I've tried hooking up a serial console, and even tried console_noblank, 
which yielded no additional info at all. (I'm guessing the consoles are suspended
at the time of panic)

I also tried unloading all the modules I have loaded before the suspend, which
seemed to reduce the chances of it happening, but eventually it reoccurred.

Any ideas on how I can further debug this ?

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-12 19:39 Debugging Thinkpad T430s occasional suspend failure Dave Jones
@ 2013-02-12 20:13 ` Linus Torvalds
  2013-02-13  0:26 ` Hugh Dickins
  1 sibling, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-12 20:13 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Linus Torvalds

On Tue, Feb 12, 2013 at 11:39 AM, Dave Jones <davej@redhat.com> wrote:
> My Thinkpad T430s suspend/resumes fine most of the time. But every so often
> (like one in ten times or so), as soon as I suspend, I get a black screen,
> and a blinking power button.
>
> (Note: Not the capslock lights like when we panic, this laptop 'conveniently
>  doesn't have those. This is the light surrounding the power button, which afaik
>  isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)

Yeah, the blinking power light is a feature of the chipset, the SMI
code sets a magic bit in one the register and it will pulse a pin at a
given frequency so that you get the "power light blinking while
suspended" thing.

So the suspend finished, and

> I tried debugging this with pm_trace, which told me..
>
> [    4.576035]   Magic number: 0:455:740
> [    4.576037]   hash matches drivers/base/power/main.c:645
>
> Which points me at..
>
>  642  Complete:
>  643         complete_all(&dev->power.completion);
>  644
>  645         TRACE_RESUME(error);
>  646
>  647         return error;
>  648 }

I suspect it's the last tracepoint, and the kernel thinks it
sucessfully resumed all devices. You *should* be able to match the
magic number with the last device too, but that's only interesting if
you get the hash matching *before* the device is resumed (ie you can
try to figure out if the resume hung in the device resume list). And
it only works if it gets a matching name on the dpm_list (see
show_dev_hash), and it apparently didn't. I suspect it's some system
device and not interesting, and you really just hit the last entry in
the resume tree.

> The only thing interesting here I think is that this is the resume path.
> So perhaps something failed to suspend, and we tried to back out of suspending,
> but something was too screwed up to abort cleanly ?

Yes, the trace is definitely in the resume path. And maybe we have something

> I've tried hooking up a serial console, and even tried console_noblank,
> which yielded no additional info at all. (I'm guessing the consoles are suspended
> at the time of panic)

serial consoles and even nonblanking consoles seldom tend to work well
for suspend debugging. It *has* happened, but it's rare.

> I also tried unloading all the modules I have loaded before the suspend, which
> seemed to reduce the chances of it happening, but eventually it reoccurred.
>
> Any ideas on how I can further debug this ?

The design of the TRACE_RESUME() thing really is as a really poor mans
"printf()". IOW, the existing points are more "suggested starting
points" than anything else, and the idea is that you can start adding
more and more of them as you try to narrow down exactly where it
fails..

And it's painful has hell. Plus add too many of them, and you get hash
collisions etc. It's a last-ditch effort, but it exists mainly because
we have never really figured out anything better.

There's a reason I've asked Intel for better CPU lockup tracing
facilities for the last 10+ years ;)

                   Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-12 19:39 Debugging Thinkpad T430s occasional suspend failure Dave Jones
  2013-02-12 20:13 ` Linus Torvalds
@ 2013-02-13  0:26 ` Hugh Dickins
  2013-02-13  0:40   ` Dave Jones
  2013-02-13  2:17   ` Dave Jones
  1 sibling, 2 replies; 50+ messages in thread
From: Hugh Dickins @ 2013-02-13  0:26 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, linux-kernel

On Tue, 12 Feb 2013, Dave Jones wrote:

> My Thinkpad T430s suspend/resumes fine most of the time. But every so often
> (like one in ten times or so), as soon as I suspend, I get a black screen,
> and a blinking power button.
> 
> (Note: Not the capslock lights like when we panic, this laptop 'conveniently
>  doesn't have those. This is the light surrounding the power button, which afaik
>  isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)

Me too on T420s, except that is blessed with a blinking CapsLk.

It's so erratic (though I think I see more failures than you do: I'd say
a quick resume never fails, but an overnight resume fails half the time):
I'm afraid I didn't have the patience to embark on pm_trace at all.

I did try to bisect it during the -rc5 week.  I'm not sure, but I have
no record of seeing it on -rc1 or -rc2, but definitely saw it on -rc3.
So I tried bisecting between -rc2 and -rc3, persisting for a day if it
looked good; but the bisection didn't seem to be converging anywhere
likely when -rc6 came out, and I switched to see if -rc6 solved it.

I had no problem with -rc6; but with -rc7 it happens more than ever.
Though still not on "quick" resumes, the kind you want to do when
bisecting.

Sharing these anecdotes in case they match or diverge from your
experience and others, and might help towards finding the cause.
Not-to-be-trusted bisection log appended: of course the bads are
reliable, but perhaps none of the goods.

Hugh

> 
> I tried debugging this with pm_trace, which told me..
> 
> [    4.576035]   Magic number: 0:455:740
> [    4.576037]   hash matches drivers/base/power/main.c:645
> 
> Which points me at..
> 
>  642  Complete:
>  643         complete_all(&dev->power.completion);
>  644 
>  645         TRACE_RESUME(error);
>  646 
>  647         return error;
>  648 }
> 
> The only thing interesting here I think is that this is the resume path.
> So perhaps something failed to suspend, and we tried to back out of suspending,
> but something was too screwed up to abort cleanly ?
> 
> I've tried hooking up a serial console, and even tried console_noblank, 
> which yielded no additional info at all. (I'm guessing the consoles are suspended
> at the time of panic)
> 
> I also tried unloading all the modules I have loaded before the suspend, which
> seemed to reduce the chances of it happening, but eventually it reoccurred.
> 
> Any ideas on how I can further debug this ?
> 
> 	Dave

git bisect start
# good: [d1c3ed669a2d452cacfb48c2d171a1f364dae2ed] Linux 3.8-rc2
git bisect good d1c3ed669a2d452cacfb48c2d171a1f364dae2ed
# bad: [9931faca02c604c22335f5a935a501bb2ace6e20] Linux 3.8-rc3
git bisect bad 9931faca02c604c22335f5a935a501bb2ace6e20
# bad: [36a25de23359940b7713fc40cbcbb046b3797511] sctp: fix Kconfig bug in default cookie hmac selection
git bisect bad 36a25de23359940b7713fc40cbcbb046b3797511
# good: [49569646b2413ee1a4fb7c4537fca058ac22292e] Merge tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 49569646b2413ee1a4fb7c4537fca058ac22292e
# good: [251741b130c4396d5076f8e0e685cd8883ba39c0] MAINTAINERS: fix drivers/ieee802154/
git bisect good 251741b130c4396d5076f8e0e685cd8883ba39c0
# good: [d0631c6e09f51e094ae5aec1eabe81cc63d78178] Merge branch 'akpm' (fixes from Andrew)
git bisect good d0631c6e09f51e094ae5aec1eabe81cc63d78178
# good: [5ce2955e04a80da7287dc12f32da7f870039bf8f] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good 5ce2955e04a80da7287dc12f32da7f870039bf8f

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  0:26 ` Hugh Dickins
@ 2013-02-13  0:40   ` Dave Jones
  2013-02-13  0:56     ` Hugh Dickins
  2013-02-13  2:17   ` Dave Jones
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Jones @ 2013-02-13  0:40 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, linux-kernel

On Tue, Feb 12, 2013 at 04:26:22PM -0800, Hugh Dickins wrote:
 > On Tue, 12 Feb 2013, Dave Jones wrote:
 > 
 > > My Thinkpad T430s suspend/resumes fine most of the time. But every so often
 > > (like one in ten times or so), as soon as I suspend, I get a black screen,
 > > and a blinking power button.
 > > 
 > > (Note: Not the capslock lights like when we panic, this laptop 'conveniently
 > >  doesn't have those. This is the light surrounding the power button, which afaik
 > >  isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)
 > 
 > Me too on T420s, except that is blessed with a blinking CapsLk.
 > 
 > It's so erratic (though I think I see more failures than you do: I'd say
 > a quick resume never fails, but an overnight resume fails half the time):
 > I'm afraid I didn't have the patience to embark on pm_trace at all.

Sounds like yours successfully suspends all the time though ?
For me, as soon as I shut the lid, the moon light starts flashing indicating failure.

That said, we saw that I was failing on the resume path after a failed suspend,
so it could still be the same problem.

 > I did try to bisect it during the -rc5 week.  I'm not sure, but I have
 > no record of seeing it on -rc1 or -rc2, but definitely saw it on -rc3.
 > So I tried bisecting between -rc2 and -rc3, persisting for a day if it
 > looked good; but the bisection didn't seem to be converging anywhere
 > likely when -rc6 came out, and I switched to see if -rc6 solved it.
 > 
 > I had no problem with -rc6; but with -rc7 it happens more than ever.
 > Though still not on "quick" resumes, the kind you want to do when
 > bisecting.

so I can hit this pretty quickly with this..

while [ 1 ];
do
 rtcwake -s 5 -m no ; pm-suspend
done

Which should make it bisectable at least..

 > Sharing these anecdotes in case they match or diverge from your
 > experience and others, and might help towards finding the cause.
 > Not-to-be-trusted bisection log appended: of course the bads are
 > reliable, but perhaps none of the goods.

I didn't have this laptop pre 3.8rc, so I've not had a 'good' reference,
pretty sure I saw it during rc1 though, so I might have to go back further
a version or two.

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  0:40   ` Dave Jones
@ 2013-02-13  0:56     ` Hugh Dickins
  2013-02-13  4:16       ` Dave Jones
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-13  0:56 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, linux-kernel

On Tue, 12 Feb 2013, Dave Jones wrote:
> On Tue, Feb 12, 2013 at 04:26:22PM -0800, Hugh Dickins wrote:
>  > On Tue, 12 Feb 2013, Dave Jones wrote:
>  > 
>  > > My Thinkpad T430s suspend/resumes fine most of the time. But every so often
>  > > (like one in ten times or so), as soon as I suspend, I get a black screen,
>  > > and a blinking power button.
>  > > 
>  > > (Note: Not the capslock lights like when we panic, this laptop 'conveniently
>  > >  doesn't have those. This is the light surrounding the power button, which afaik
>  > >  isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)
>  > 
>  > Me too on T420s, except that is blessed with a blinking CapsLk.
>  > 
>  > It's so erratic (though I think I see more failures than you do: I'd say
>  > a quick resume never fails, but an overnight resume fails half the time):
>  > I'm afraid I didn't have the patience to embark on pm_trace at all.
> 
> Sounds like yours successfully suspends all the time though ?

So far as I can tell.

> For me, as soon as I shut the lid, the moon light starts flashing indicating failure.

My moon light shines steadily.

> 
> That said, we saw that I was failing on the resume path after a failed suspend,
> so it could still be the same problem.
> 
>  > I did try to bisect it during the -rc5 week.  I'm not sure, but I have
>  > no record of seeing it on -rc1 or -rc2, but definitely saw it on -rc3.
>  > So I tried bisecting between -rc2 and -rc3, persisting for a day if it
>  > looked good; but the bisection didn't seem to be converging anywhere
>  > likely when -rc6 came out, and I switched to see if -rc6 solved it.
>  > 
>  > I had no problem with -rc6; but with -rc7 it happens more than ever.
>  > Though still not on "quick" resumes, the kind you want to do when
>  > bisecting.
> 
> so I can hit this pretty quickly with this..
> 
> while [ 1 ];
> do
>  rtcwake -s 5 -m no ; pm-suspend
> done
> 
> Which should make it bisectable at least..

Thanks, I'll give that a try this evening.

> 
>  > Sharing these anecdotes in case they match or diverge from your
>  > experience and others, and might help towards finding the cause.
>  > Not-to-be-trusted bisection log appended: of course the bads are
>  > reliable, but perhaps none of the goods.
> 
> I didn't have this laptop pre 3.8rc, so I've not had a 'good' reference,
> pretty sure I saw it during rc1 though, so I might have to go back further
> a version or two.
> 
> 	Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  0:26 ` Hugh Dickins
  2013-02-13  0:40   ` Dave Jones
@ 2013-02-13  2:17   ` Dave Jones
  1 sibling, 0 replies; 50+ messages in thread
From: Dave Jones @ 2013-02-13  2:17 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, linux-kernel

On Tue, Feb 12, 2013 at 04:26:22PM -0800, Hugh Dickins wrote:
 
 > Me too on T420s, except that is blessed with a blinking CapsLk.
 > 
 > It's so erratic (though I think I see more failures than you do: I'd say
 > a quick resume never fails, but an overnight resume fails half the time):
 > I'm afraid I didn't have the patience to embark on pm_trace at all.
 > 
 > I did try to bisect it during the -rc5 week.  I'm not sure, but I have
 > no record of seeing it on -rc1 or -rc2, but definitely saw it on -rc3.
 > So I tried bisecting between -rc2 and -rc3, persisting for a day if it
 > looked good; but the bisection didn't seem to be converging anywhere
 > likely when -rc6 came out, and I switched to see if -rc6 solved it.
 > 
 > I had no problem with -rc6; but with -rc7 it happens more than ever.
 > Though still not on "quick" resumes, the kind you want to do when
 > bisecting.
 > 
 > Sharing these anecdotes in case they match or diverge from your
 > experience and others, and might help towards finding the cause.
 > Not-to-be-trusted bisection log appended: of course the bads are
 > reliable, but perhaps none of the goods.

Yeah, my first bisection results look totally different.
I can do 100 suspend/resumes fine on 3.6.0, but it fails after just 2-3
on 3.7.0, so it looks like the bug I'm chasing happened before whatever
is affecting you.  (Or maybe you just didn't get 'lucky' when testing rc1,
and it wasn't good after all).

Now that I have what appears to be a 'good' candidate, hopefully
a bisection can quickly get to the bottom of this.

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  0:56     ` Hugh Dickins
@ 2013-02-13  4:16       ` Dave Jones
  2013-02-13  5:37         ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Jones @ 2013-02-13  4:16 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, linux-kernel

On Tue, Feb 12, 2013 at 04:56:55PM -0800, Hugh Dickins wrote:
 
 > > so I can hit this pretty quickly with this..
 > > 
 > > while [ 1 ];
 > > do
 > >  rtcwake -s 5 -m no ; pm-suspend
 > > done
 > > 
 > > Which should make it bisectable at least..
 > 
 > Thanks, I'll give that a try this evening.
 
So uh, put a 'sleep 5' in that loop.

Turns out that the firmware on my Thinkpad craps itself when it tries to suspend while
the cpu fan is still spinning back up, and eventually it gets in a state (on a 'good' boot)
where the fan is stuck in the 'off' state.

Result: hot laptop, machine check exceptions, and then a nice
"CPU FAN ERROR" message from the BIOS on next reboot.

Full marks Lenovo.

	Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  4:16       ` Dave Jones
@ 2013-02-13  5:37         ` Hugh Dickins
  2013-02-13 19:34           ` Dave Jones
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-13  5:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linus Torvalds, linux-kernel

On Tue, 12 Feb 2013, Dave Jones wrote:
> On Tue, Feb 12, 2013 at 04:56:55PM -0800, Hugh Dickins wrote:
>  
>  > > so I can hit this pretty quickly with this..
>  > > 
>  > > while [ 1 ];
>  > > do
>  > >  rtcwake -s 5 -m no ; pm-suspend
>  > > done
>  > > 
>  > > Which should make it bisectable at least..
>  > 
>  > Thanks, I'll give that a try this evening.
>  
> So uh, put a 'sleep 5' in that loop.

I had already put exactly that, to help break out when I got bored.

I'm afraid it just confirms that we have different problems: that
script gives me no trouble.  Perhaps it would give me trouble if the
timeout went up from 5 seconds to 10 hours (it does not at 10 minutes),
but that doesn't feel like a very useful line of enquiry.

I think for the moment I'm going to hope that under the surface our
problems are related, and when your bisection points a finger, look
around to see if it suggests anything for my case.  Lazy, but...

Hugh

> 
> Turns out that the firmware on my Thinkpad craps itself when it tries to suspend while
> the cpu fan is still spinning back up, and eventually it gets in a state (on a 'good' boot)
> where the fan is stuck in the 'off' state.
> 
> Result: hot laptop, machine check exceptions, and then a nice
> "CPU FAN ERROR" message from the BIOS on next reboot.
> 
> Full marks Lenovo.
> 
> 	Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13  5:37         ` Hugh Dickins
@ 2013-02-13 19:34           ` Dave Jones
  2013-02-13 19:56             ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Jones @ 2013-02-13 19:34 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, linux-kernel

On Tue, Feb 12, 2013 at 09:37:27PM -0800, Hugh Dickins wrote:

 > I think for the moment I'm going to hope that under the surface our
 > problems are related, and when your bisection points a finger, look
 > around to see if it suggests anything for my case.  Lazy, but...

I have the worst luck with bisects.  Halfway through I started seeing
a *different* suspend failure, which led me down the wrong path
ultimately leading to 59a93c27c which was a red herring even though
it even talks about suspend (I had my hopes high when testing that one).

Started over, and ended up in a merge commit containing just a single
patch to slab (210ed9defff). Reverting that on top of 3.7-rc8 doesn't
fix the problem.  Backing up one level of bisect, the other alternative
path was 5b3900cd40, which also didn't help when reverting.

My test was a loop of 100 suspend/resume cycles before calling something
'good'. The 'bad' cases all failed within 10 cycles (usually 2-3).

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13 19:34           ` Dave Jones
@ 2013-02-13 19:56             ` Linus Torvalds
  2013-02-13 20:53               ` Dave Jones
  2013-02-15  1:15               ` Dave Jones
  0 siblings, 2 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-13 19:56 UTC (permalink / raw)
  To: Dave Jones, Hugh Dickins, Linus Torvalds, Linux Kernel Mailing List

On Wed, Feb 13, 2013 at 11:34 AM, Dave Jones <davej@redhat.com> wrote:
>
> My test was a loop of 100 suspend/resume cycles before calling something
> 'good'. The 'bad' cases all failed within 10 cycles (usually 2-3).

Considering that you apparently already found one case where the BIOS
crapped out due to effectively unrelated timing details (ie timing
triggered a temperature issue that then triggered behavioral changes),
I wonder if your more occasional problem might not be a sign of
something similar.

But since you seem to be able to automate it well, maybe one thing to
try is to change the timing a bit while testing. Maybe some failures
were hidden by the timing just happening to work out.

Also, as I suspect you're aware: since the "bad" markings for "git
bisect" are presumably reliable (with the caveat that you need to
worry about the exact symptoms and not mix it up with some other
independent bug, as you already found out), you can usually speed up
repeated bisects by using the last bad information from the previous
bisect.

Note that there is only ever one "bad" commit - since all the commits
you test while bisecting are by definition reachable from the previous
bad one and both contain the bug, picking a bad commit makes all other
previous bad commits uninteresting. So you just need to look at the
last bad commit, not the whole set of bad commits. So when re-doing
the bisect, and if you trust that your bad kernels really were bad and
had the *right* badness, you can just start with "git bisect bad
<last-bad-commit>"

(good commits, on the other hand, are independent of each other: "not
containing the bug" is not some kind of exclusivity test, so finding
one good kernel doesn't make the information about other good kernels
irrelevant)

                 Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13 19:56             ` Linus Torvalds
@ 2013-02-13 20:53               ` Dave Jones
  2013-02-16 20:54                 ` Paul E. McKenney
  2013-02-15  1:15               ` Dave Jones
  1 sibling, 1 reply; 50+ messages in thread
From: Dave Jones @ 2013-02-13 20:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Hugh Dickins, Linux Kernel Mailing List

On Wed, Feb 13, 2013 at 11:56:25AM -0800, Linus Torvalds wrote:

 > Note that there is only ever one "bad" commit - since all the commits
 > you test while bisecting are by definition reachable from the previous
 > bad one and both contain the bug, picking a bad commit makes all other
 > previous bad commits uninteresting. So you just need to look at the
 > last bad commit, not the whole set of bad commits. So when re-doing
 > the bisect, and if you trust that your bad kernels really were bad and
 > had the *right* badness, you can just start with "git bisect bad
 > <last-bad-commit>"
 > 
 > (good commits, on the other hand, are independent of each other: "not
 > containing the bug" is not some kind of exclusivity test, so finding
 > one good kernel doesn't make the information about other good kernels
 > irrelevant)

I've tried working my way back up the bisect and then retesting some
of those 'good' branches, but every time I feel like I'm getting somewhere,
I get another result that makes no sense.

The latest for example ends like this..

# bad: [ddffeb8c4d0331609ef2581d84de4d763607bd37] Linux 3.7-rc1
git bisect bad ddffeb8c4d0331609ef2581d84de4d763607bd37
# good: [a0d271cbfed1dd50278c6b06bead3d00ba0a88f9] Linux 3.6
git bisect good a0d271cbfed1dd50278c6b06bead3d00ba0a88f9
# good: [24d7b40a60cf19008334bcbcbd98da374d4d9c64] ARM: OMAP2+: PM: MPU DVFS: use generic CPU device for MPU-SS
git bisect good 24d7b40a60cf19008334bcbcbd98da374d4d9c64
# good: [21c8715f0a1f4df8bfa2bd6f3915e5e33c1c2e6e] ARM: integrator: use __iomem pointers for MMIO, part 2
git bisect good 21c8715f0a1f4df8bfa2bd6f3915e5e33c1c2e6e
# good: [23d5385f382a7c7d8b6bf19b0c2cfb3acbb12d31] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 23d5385f382a7c7d8b6bf19b0c2cfb3acbb12d31
# good: [35e9a274fdc9c8feb763e4970a32d7089f51393c] Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
git bisect good 35e9a274fdc9c8feb763e4970a32d7089f51393c
# bad: [e225ca27057e70e7cbb14ae4c1e5f758973af645] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect bad e225ca27057e70e7cbb14ae4c1e5f758973af645
# good: [4f1cd91497774488ed16119ec3f54b3daf1561de] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 4f1cd91497774488ed16119ec3f54b3daf1561de
# bad: [4e21fc138bfd7fe625ff5dc81541399aaf9d429b] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
git bisect bad 4e21fc138bfd7fe625ff5dc81541399aaf9d429b
# bad: [ccff9b1db693062b0a9c9070f4304deb47ef215c] Merge tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
git bisect bad ccff9b1db693062b0a9c9070f4304deb47ef215c
# good: [5b3900cd409466c0070b234d941650685ad0c791] timekeeping: Cast raw_interval to u64 to avoid shift overflow
git bisect good 5b3900cd409466c0070b234d941650685ad0c791
# bad: [0588f1f934791b79d0a1e9b327be9b6eb361d2b8] Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 0588f1f934791b79d0a1e9b327be9b6eb361d2b8
# bad: [9d55ab71b735621a63c8923ba934d87879847cef] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9d55ab71b735621a63c8923ba934d87879847cef
# bad: [d677124b1fc3b2706a13e92f1b329f1ac77e5f20] rcu: Advise most users not to enable RCU user mode
git bisect bad d677124b1fc3b2706a13e92f1b329f1ac77e5f20
# bad: [a4fbe35a124526e6759be07bd9c7ea796ba1e00d] rcu: Grace-period initialization excludes only RCU notifier
git bisect bad a4fbe35a124526e6759be07bd9c7ea796ba1e00d

Which points out something in tree-rcu.  Which I don't even have compiled in,
because I have CONFIG_PREEMPT set.

I'll keep trying some of the earlier good branches in case I missed something.

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13 19:56             ` Linus Torvalds
  2013-02-13 20:53               ` Dave Jones
@ 2013-02-15  1:15               ` Dave Jones
  2013-02-15  2:09                 ` Linus Torvalds
  2013-02-15  2:09                 ` Hugh Dickins
  1 sibling, 2 replies; 50+ messages in thread
From: Dave Jones @ 2013-02-15  1:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Hugh Dickins, Linux Kernel Mailing List, paul.mckenney

On Wed, Feb 13, 2013 at 11:56:25AM -0800, Linus Torvalds wrote:
 > On Wed, Feb 13, 2013 at 11:34 AM, Dave Jones <davej@redhat.com> wrote:
 > >
 > > My test was a loop of 100 suspend/resume cycles before calling something
 > > 'good'. The 'bad' cases all failed within 10 cycles (usually 2-3).
 > 
 > Considering that you apparently already found one case where the BIOS
 > crapped out due to effectively unrelated timing details (ie timing
 > triggered a temperature issue that then triggered behavioral changes),
 > I wonder if your more occasional problem might not be a sign of
 > something similar.
 > 
 > But since you seem to be able to automate it well, maybe one thing to
 > try is to change the timing a bit while testing. Maybe some failures
 > were hidden by the timing just happening to work out.

Given I never saw this on a Fedora kernel, just my self-built ones, I eventually
gave up on bisecting code, and switched to bisecting config options.
I should have started this way, as I figured it out within an hour.

3.7 merge window is when I started seeing this, and here's what got introduced
during that time..

commit e3ebfb96f396731ca2d0b108785d5da31b53ab00
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Mon Jul 2 14:42:01 2012 -0700

    rcu: Add PROVE_RCU_DELAY to provoke difficult races

'difficult' is an understatement.  This explains why some of those 'good'
bisects survived 100 suspends on one day, and failed the next.

Unfortunatly, I don't think there's any sane way to retrieve whatever debug
info might be getting spewed.  Perhaps when I reinstall, and switch to booting EFI
I'll be able to use pstore, but on a bios-based boot, all hope seems lost.
No netconsole, no usb-serial, even crippling i915's suspend routine doesn't help.

I'll just disable this option for now.

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  1:15               ` Dave Jones
@ 2013-02-15  2:09                 ` Linus Torvalds
  2013-02-15 15:50                   ` Ingo Molnar
  2013-02-15 17:44                   ` Paul E. McKenney
  2013-02-15  2:09                 ` Hugh Dickins
  1 sibling, 2 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-15  2:09 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On Thu, Feb 14, 2013 at 5:15 PM, Dave Jones <davej@redhat.com> wrote:
>
> Given I never saw this on a Fedora kernel, just my self-built ones, I eventually
> gave up on bisecting code, and switched to bisecting config options.
> I should have started this way, as I figured it out within an hour.
>
> 3.7 merge window is when I started seeing this, and here's what got introduced
> during that time..
>
> commit e3ebfb96f396731ca2d0b108785d5da31b53ab00
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Mon Jul 2 14:42:01 2012 -0700
>
>     rcu: Add PROVE_RCU_DELAY to provoke difficult races
>
> 'difficult' is an understatement.  This explains why some of those 'good'
> bisects survived 100 suspends on one day, and failed the next.
>
> Unfortunatly, I don't think there's any sane way to retrieve whatever debug
> info might be getting spewed.

Hmm. I have to say, that's a particularly unhelpful config option. It
may make races much easier to hit, but when you do hit them, what's
the symptoms of said race?

Paul? Apparently you end up with a dead machine at least during resume
and no oops. Which isn't very helpful. Maybe there is possibly some
BUG_ON() in the RCU code somewhere?

So Paul, if you know what the common symptoms of the bug that that
debug option helps trigger are, is there some way to make them less
lethal and still print out useful information?

                   Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  1:15               ` Dave Jones
  2013-02-15  2:09                 ` Linus Torvalds
@ 2013-02-15  2:09                 ` Hugh Dickins
  2013-02-15  2:15                   ` Linus Torvalds
  1 sibling, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-15  2:09 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, paul.mckenney

On Thu, 14 Feb 2013, Dave Jones wrote:
> On Wed, Feb 13, 2013 at 11:56:25AM -0800, Linus Torvalds wrote:
>  > On Wed, Feb 13, 2013 at 11:34 AM, Dave Jones <davej@redhat.com> wrote:
>  > >
>  > > My test was a loop of 100 suspend/resume cycles before calling something
>  > > 'good'. The 'bad' cases all failed within 10 cycles (usually 2-3).
>  > 
>  > Considering that you apparently already found one case where the BIOS
>  > crapped out due to effectively unrelated timing details (ie timing
>  > triggered a temperature issue that then triggered behavioral changes),
>  > I wonder if your more occasional problem might not be a sign of
>  > something similar.
>  > 
>  > But since you seem to be able to automate it well, maybe one thing to
>  > try is to change the timing a bit while testing. Maybe some failures
>  > were hidden by the timing just happening to work out.
> 
> Given I never saw this on a Fedora kernel, just my self-built ones, I eventually
> gave up on bisecting code, and switched to bisecting config options.
> I should have started this way, as I figured it out within an hour.
> 
> 3.7 merge window is when I started seeing this, and here's what got introduced
> during that time..
> 
> commit e3ebfb96f396731ca2d0b108785d5da31b53ab00
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Mon Jul 2 14:42:01 2012 -0700
> 
>     rcu: Add PROVE_RCU_DELAY to provoke difficult races
> 
> 'difficult' is an understatement.  This explains why some of those 'good'
> bisects survived 100 suspends on one day, and failed the next.
> 
> Unfortunatly, I don't think there's any sane way to retrieve whatever debug
> info might be getting spewed.  Perhaps when I reinstall, and switch to booting EFI
> I'll be able to use pstore, but on a bios-based boot, all hope seems lost.
> No netconsole, no usb-serial, even crippling i915's suspend routine doesn't help.
> 
> I'll just disable this option for now.

Which won't affect my case since I never enabled it.

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  2:09                 ` Hugh Dickins
@ 2013-02-15  2:15                   ` Linus Torvalds
  2013-02-16 21:45                     ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2013-02-15  2:15 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, Linux Kernel Mailing List, Paul McKenney

On Thu, Feb 14, 2013 at 6:09 PM, Hugh Dickins <hughd@google.com> wrote:
>
> Which won't affect my case since I never enabled it.

Well, in theory, you may have the same bug Dave just made it easier to
trigger for himself with the forced config option.

In reality, your bug behavior differences were already big enough that
it sounded likely that they were two different things to start with..

              Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  2:09                 ` Linus Torvalds
@ 2013-02-15 15:50                   ` Ingo Molnar
  2013-02-15 22:33                     ` Dave Jones
  2013-02-15 17:44                   ` Paul E. McKenney
  1 sibling, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2013-02-15 15:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Paul? Apparently you end up with a dead machine at least 
> during resume and no oops. Which isn't very helpful. Maybe 
> there is possibly some BUG_ON() in the RCU code somewhere?

AFAICS there's no BUG_ON() or panic() in the RCU code, outside 
early init code allocation failure paths where we genuinely 
can't continue.

( Unless it _is_ one of those BUG_ON()s triggering during early 
  resume, reconstructing kernel threads, or the BUG_ON() comes 
  in indirectly via some other way ... )

Dave, one thing that might be worth trying is to disable 
CONFIG_BUG and see whether the system manages to limp through to 
get console output?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  2:09                 ` Linus Torvalds
  2013-02-15 15:50                   ` Ingo Molnar
@ 2013-02-15 17:44                   ` Paul E. McKenney
  2013-02-15 18:34                     ` Linus Torvalds
  1 sibling, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2013-02-15 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney

On Thu, Feb 14, 2013 at 06:09:44PM -0800, Linus Torvalds wrote:
> On Thu, Feb 14, 2013 at 5:15 PM, Dave Jones <davej@redhat.com> wrote:
> >
> > Given I never saw this on a Fedora kernel, just my self-built ones, I eventually
> > gave up on bisecting code, and switched to bisecting config options.
> > I should have started this way, as I figured it out within an hour.
> >
> > 3.7 merge window is when I started seeing this, and here's what got introduced
> > during that time..
> >
> > commit e3ebfb96f396731ca2d0b108785d5da31b53ab00
> > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > Date:   Mon Jul 2 14:42:01 2012 -0700
> >
> >     rcu: Add PROVE_RCU_DELAY to provoke difficult races
> >
> > 'difficult' is an understatement.  This explains why some of those 'good'
> > bisects survived 100 suspends on one day, and failed the next.
> >
> > Unfortunatly, I don't think there's any sane way to retrieve whatever debug
> > info might be getting spewed.
> 
> Hmm. I have to say, that's a particularly unhelpful config option. It
> may make races much easier to hit, but when you do hit them, what's
> the symptoms of said race?
> 
> Paul? Apparently you end up with a dead machine at least during resume
> and no oops. Which isn't very helpful. Maybe there is possibly some
> BUG_ON() in the RCU code somewhere?
> 
> So Paul, if you know what the common symptoms of the bug that that
> debug option helps trigger are, is there some way to make them less
> lethal and still print out useful information?

This commit was designed to increase the probability of hitting the
races described in http://lwn.net/Articles/453002/.  These races result
in deadlocks involving the runqueue lock (and perhaps also the priority
inheritance locks).  And yes, I most certainly should have described
this in the commit message.  :-(

So it looks like Dave is hitting some other race/bug than the one that
this commit was designed to expose.

I must confess that I don't know how to proceed without any meaningful
debug information here.  Feel free to revert the above commit if you would
like -- I can of course always maintain it locally for my own testing.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 17:44                   ` Paul E. McKenney
@ 2013-02-15 18:34                     ` Linus Torvalds
  2013-02-15 18:35                       ` Linus Torvalds
  2013-02-17 15:11                       ` Frederic Weisbecker
  0 siblings, 2 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-15 18:34 UTC (permalink / raw)
  To: Paul McKenney, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra
  Cc: Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney

On Fri, Feb 15, 2013 at 9:44 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> This commit was designed to increase the probability of hitting the
> races described in http://lwn.net/Articles/453002/.  These races result
> in deadlocks involving the runqueue lock (and perhaps also the priority
> inheritance locks).  And yes, I most certainly should have described
> this in the commit message.  :-(

Ugh. That particular race seems to be because the softirq handling is
just crazy, and does the "wakeup_softirqd()" form interrupt context,
BUT HAS SPECIFICALLY BROKEN THE IRQ COUNTING!

Because it claims to do it from softirq context, which is pure
garbage. It's not actually in softirq context.

The whole hardirq -> softirq transition seems stupid. I'm sure I made
some serious mistake in cleaning it up, and there's probably some
missed tracepoint (or perhaps screwed-up lockdep annotation), but I
think the hardirq -> softirq preempt thing shoudl be done as an atomic
preempt downgrade, so that we never have a window of "uhhuh, another
interrupt can come in between and see us as being in neither). And the
wakeup_softirqd should be done without playing with preempt count at
all.

Something like this ENTIRELY UNTESTED patch.

Note: I doubt this patch affects Dave's issue at all, I just started
looking at that do_softirq code when I read your bug explanation.

Adding random people for kernel/softirq.c to the participants list.
Comments about the patch? Do note that it's entirely untested, so
consider it more a RFD than a real patch.. It looks like it adds a lot
of lines, but most of it is for comments and simplification of the
logic.

             Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 18:34                     ` Linus Torvalds
@ 2013-02-15 18:35                       ` Linus Torvalds
  2013-02-15 18:50                         ` Linus Torvalds
  2013-02-17 15:11                       ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2013-02-15 18:35 UTC (permalink / raw)
  To: Paul McKenney, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra
  Cc: Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney

On Fri, Feb 15, 2013 at 10:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Something like this ENTIRELY UNTESTED patch.

.. and let's actually attach the patch this time.

                      Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 18:35                       ` Linus Torvalds
@ 2013-02-15 18:50                         ` Linus Torvalds
  2013-02-16 19:25                           ` Paul E. McKenney
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2013-02-15 18:50 UTC (permalink / raw)
  To: Paul McKenney, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra
  Cc: Dave Jones, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney

[-- Attachment #1: Type: text/plain, Size: 395 bytes --]

On Fri, Feb 15, 2013 at 10:35 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Feb 15, 2013 at 10:34 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Something like this ENTIRELY UNTESTED patch.
>
> .. and let's actually attach the patch this time.

That's odd. I got a "server error" from gmail. And again no attachment.

Maybe THIS time.

            Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 2259 bytes --]

 kernel/softirq.c | 60 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567babe789..3736b509cad5 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -320,20 +320,52 @@ void irq_enter(void)
 	__irq_enter();
 }
 
-static inline void invoke_softirq(void)
+#define preempt_value_in_interrupt(val) \
+	((val) & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))
+
+/*
+ * Invoce softirq's from irq_exit().
+ *
+ * Return the preempt offset: either IRQ_EXIT_OFFSET (if we
+ * did nothing to the preemption count) or SOFTIRQ_OFFSET (in
+ * case we did softirq processing and changed the preemption
+ * count to account for that).
+ */
+static inline int invoke_softirq(void)
 {
-	if (!force_irqthreads) {
+	int offset = IRQ_EXIT_OFFSET;
+	int count = preempt_count() - offset;
+
+	/* Can we run softirq's at all? We migth be nesting interrupts */
+	if (preempt_value_in_interrupt(count))
+		return offset;
+
+	/* Are there any pending? */
+	if (!local_softirq_pending())
+		return offset;
+
+	/* Do we force irq threads? If so, just wake up the daemon */
+	if (force_irqthreads) {
+		wakeup_softirqd();
+		return offset;
+	}
+
+	/*
+	 * Ok, do actual softirq handling!
+	 *
+	 * This downgrades us from hardirq context to softirq context.
+	 */
+	offset = SOFTIRQ_OFFSET;
+	preempt_count() = count + offset;
+
+	trace_softirqs_off(__builtin_return_address(0));
 #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
-		__do_softirq();
+	__do_softirq();
 #else
-		do_softirq();
+	do_softirq();
 #endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
-		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
+	trace_softirqs_on((unsigned long)__builtin_return_address(0));
+	return offset;
 }
 
 /*
@@ -341,11 +373,13 @@ static inline void invoke_softirq(void)
  */
 void irq_exit(void)
 {
+	int offset;
+
 	vtime_account_irq_exit(current);
 	trace_hardirq_exit();
-	sub_preempt_count(IRQ_EXIT_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
-		invoke_softirq();
+
+	offset = invoke_softirq();
+	sub_preempt_count(offset);
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 15:50                   ` Ingo Molnar
@ 2013-02-15 22:33                     ` Dave Jones
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Jones @ 2013-02-15 22:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Hugh Dickins, Linux Kernel Mailing List, Paul McKenney

On Fri, Feb 15, 2013 at 04:50:28PM +0100, Ingo Molnar wrote:
 
 > Dave, one thing that might be worth trying is to disable 
 > CONFIG_BUG and see whether the system manages to limp through to 
 > get console output?
 
Didn't make any difference. Black screen, blinky power light.

	Dave


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 18:50                         ` Linus Torvalds
@ 2013-02-16 19:25                           ` Paul E. McKenney
  2013-02-16 19:46                             ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2013-02-16 19:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On Fri, Feb 15, 2013 at 10:50:31AM -0800, Linus Torvalds wrote:
> On Fri, Feb 15, 2013 at 10:35 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Fri, Feb 15, 2013 at 10:34 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> Something like this ENTIRELY UNTESTED patch.
> >
> > .. and let's actually attach the patch this time.
> 
> That's odd. I got a "server error" from gmail. And again no attachment.
> 
> Maybe THIS time.

Sorry for the delay in testing this, but there was a need to upgrade
my laptop, and bozo here figured "why not go to 64 bits while I am at
it?" -- and then proceeded to learn the hard way that it is necessary
to do "make mrproper" before doing a build in 64-bit mode.  :-/

The kernel build system's way of telling you this at the moment is:

arch/x86/built-in.o:(.rodata+0x4990): undefined reference to `stub32_sigaltstack'

Anyway, with this patch, I see CPU stall warnings when running rcutorture
as shown below.  This is not a hard failure:

[   32.553647] rcu-torture: rtc: ffffffff836a57a0 ver: 131 tfle: 0 rta: 131 rtaf: 0 rtf: 93 rtmbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 27 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=1000) barrier: 6/7:0

Note the "ver: 131", which means that rcutorture successfully did 131
grace periods before the failure.  It manages to offline a CPU subsequently,
but then appears to hang while onlining it.

							Thanx, Paul

	[   23.706051] INFO: rcu_preempt self-detected stall on CPU[   23.707053] INFO: rcu_preempt self-detected stall on CPU
	[   23.707053] 	1: (20976 ticks this GP) idle=883/140000000000001/0 drain=5 . timer not pending
	[   23.707053] 	 (t=21000 jiffies g=85 c=84 q=6756)
	[   23.707053] Pid: 606, comm: rcu_torture_rea Not tainted 3.8.0-rc7+ #2
	[   23.707053] Call Trace:
	[   23.707053]  <IRQ>  [<ffffffff810c12b2>] rcu_pending+0x192/0x7a0
	[   23.707053]  [<ffffffff810c590b>] rcu_check_callbacks+0xbb/0x220
	[   23.707053]  [<ffffffff8104c193>] update_process_times+0x43/0x80
	[   23.707053]  [<ffffffff8108f901>] tick_sched_handle.isra.8+0x31/0x40
	[   23.707053]  [<ffffffff8108f9f7>] tick_sched_timer+0x47/0x70
	[   23.707053]  [<ffffffff8106619c>] __run_hrtimer+0x7c/0x240
	[   23.707053]  [<ffffffff8108f9b0>] ? tick_nohz_handler+0xa0/0xa0
	[   23.707053]  [<ffffffff81089617>] ? ktime_get_update_offsets+0x47/0xd0
	[   23.707053]  [<ffffffff81066a37>] hrtimer_interrupt+0xf7/0x240
	[   23.707053]  [<ffffffff81025883>] smp_apic_timer_interrupt+0x63/0xa0
	[   23.707053]  [<ffffffff817b6e8a>] apic_timer_interrupt+0x6a/0x70
	[   23.707053]  <EOI>  [<ffffffff817af17d>] ? _raw_spin_unlock_irq+0xd/0x40
	[   23.707053]  [<ffffffff817ae2aa>] __schedule+0x68a/0x790
	[   23.707053]  [<ffffffff817ae4c4>] schedule+0x24/0x70
	[   23.707053]  [<ffffffff810c05b7>] rcu_torture_reader+0xd7/0x2d0
	[   23.707053]  [<ffffffff810c07b0>] ? rcu_torture_reader+0x2d0/0x2d0
	[   23.707053]  [<ffffffff810c04e0>] ? rcutorture_trace_dump+0x50/0x50
	[   23.707053]  [<ffffffff81061e6b>] kthread+0xbb/0xc0
	[   23.707053]  [<ffffffff817a0000>] ? timer_cpu_notify+0x1e9/0x300
	[   23.707053]  [<ffffffff81061db0>] ? flush_kthread_worker+0xa0/0xa0
	[   23.707053]  [<ffffffff817b626c>] ret_from_fork+0x7c/0xb0
	[   23.707053]  [<ffffffff81061db0>] ? flush_kthread_worker+0xa0/0xa0

	[   23.706051] 
	[   23.706051] 	0: (1 GPs behind) idle=429/140000000000001/0 drain=5 . timer not pending
	[   23.706051] 	 (t=21000 jiffies g=85 c=84 q=6756)
	[   23.706051] Pid: 603, comm: rcu_torture_rea Not tainted 3.8.0-rc7+ #2
	[   23.706051] Call Trace:
	[   23.706051]  <IRQ>  [<ffffffff810c12b2>] rcu_pending+0x192/0x7a0
	[   23.706051]  [<ffffffff810c590b>] rcu_check_callbacks+0xbb/0x220
	[   23.706051]  [<ffffffff8104c193>] update_process_times+0x43/0x80
	[   23.706051]  [<ffffffff8108f901>] tick_sched_handle.isra.8+0x31/0x40
	[   23.706051]  [<ffffffff8108f9f7>] tick_sched_timer+0x47/0x70
	[   23.706051]  [<ffffffff8106619c>] __run_hrtimer+0x7c/0x240
	[   23.706051]  [<ffffffff8108f9b0>] ? tick_nohz_handler+0xa0/0xa0
	[   23.706051]  [<ffffffff81089617>] ? ktime_get_update_offsets+0x47/0xd0
	[   23.706051]  [<ffffffff81066a37>] hrtimer_interrupt+0xf7/0x240
	[   23.706051]  [<ffffffff81025883>] smp_apic_timer_interrupt+0x63/0xa0
	[   23.706051]  [<ffffffff817b6e8a>] apic_timer_interrupt+0x6a/0x70
	[   23.706051]  <EOI>  [<ffffffff817b2ae0>] ? add_preempt_count+0x30/0x90
	[   23.706051]  [<ffffffff817adc5d>] __schedule+0x3d/0x790
	[   23.706051]  [<ffffffff817ae4c4>] schedule+0x24/0x70
	[   23.706051]  [<ffffffff810c05b7>] rcu_torture_reader+0xd7/0x2d0
	[   23.706051]  [<ffffffff810c07b0>] ? rcu_torture_reader+0x2d0/0x2d0
	[   23.706051]  [<ffffffff810c04e0>] ? rcutorture_trace_dump+0x50/0x50
	[   23.706051]  [<ffffffff81061e6b>] kthread+0xbb/0xc0
	[   23.706051]  [<ffffffff817a0000>] ? timer_cpu_notify+0x1e9/0x300
	[   23.706051]  [<ffffffff81061db0>] ? flush_kthread_worker+0xa0/0xa0
	[   23.706051]  [<ffffffff817b626c>] ret_from_fork+0x7c/0xb0
	[   23.706051]  [<ffffffff81061db0>] ? flush_kthread_worker+0xa0/0xa0


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 19:25                           ` Paul E. McKenney
@ 2013-02-16 19:46                             ` Linus Torvalds
  2013-02-16 21:07                               ` Paul E. McKenney
                                                 ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-16 19:46 UTC (permalink / raw)
  To: Paul McKenney, H. Peter Anvin
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On Sat, Feb 16, 2013 at 11:25 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> Sorry for the delay in testing this, but there was a need to upgrade
> my laptop, and bozo here figured "why not go to 64 bits while I am at
> it?" -- and then proceeded to learn the hard way that it is necessary
> to do "make mrproper" before doing a build in 64-bit mode.  :-/

Hmm. Our object file dependency check includes checking that the
compiler options are the same, but that's only true for normal C
files. Some of the other rules do *not* test the full range of config
options, so in general, if you change architecture etc models, you do
indeed want to make sure that you do a "make distclean" (aka "make
mrproper") or something like "git clean -dqfx".

For a number of other files, we just depend on the normal make
timestamp logic, which means that "if the object file is newer than
the sources", we'll trust it. Which obviously doesn't work for cases
where the object file may have been generated under totally different
architecture rules..

(That said, what kind of old environment did you do this in?
stub32_sigaltstack was removed during the merge window, so I'm
assuming you applied my patch on top of plain 3.7 or something?)

> The kernel build system's way of telling you this at the moment is:
>
> arch/x86/built-in.o:(.rodata+0x4990): undefined reference to `stub32_sigaltstack'

Adding Peter Anvin to the people, just in case he sees what's wrong
with the system call stub generation that keeps excessively old object
files around. If it's easy to fix, it might be worth trying to make it
ok to switch from i386 to x86-64 and back in the same tree.

Peter? Not a big deal, but if you see something obvious, let's just
try to fix it, ok?

> Anyway, with this patch, I see CPU stall warnings when running rcutorture
> as shown below.  This is not a hard failure:

Yeah, there's something wrong with the patch, I didn't bother trying
to figure it out for now. It also causes a hard failure with lockdep
(or lock proving/debugging, I'm not sure which one triggered it) - and
it happens too early to even see anything on the screen.

So I'd like to make that "downgrade from hardirq to softirq" atomic,
and I think it would clean up the crazy code too (currently it does a
*lot* of back-and-forth on the preempt flags), but I clearly missed
some case where we used a wrapper or two to add some tracepoint or a
RCU scheduling point. And I'm not going to worry about it right now,
since I'm preparing to make v3.8 soon.

But if somebody spots the bug, holler.

                     Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-13 20:53               ` Dave Jones
@ 2013-02-16 20:54                 ` Paul E. McKenney
  0 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2013-02-16 20:54 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Hugh Dickins, Linux Kernel Mailing List

On Wed, Feb 13, 2013 at 03:53:13PM -0500, Dave Jones wrote:
> On Wed, Feb 13, 2013 at 11:56:25AM -0800, Linus Torvalds wrote:

[ . . . ]

> Which points out something in tree-rcu.  Which I don't even have compiled in,
> because I have CONFIG_PREEMPT set.

FWIW, the code in kernel/rcutree.c is shared between CONFIG_TREE_RCU
and CONFIG_TREE_PREEMPT_RCU.  The differences between these two options
is mostly in kernel/rcutree_plugin.h under #ifdef, with some others in
include/linux/rcudate.h and kernel/rcutree.h.

							Thanx, Paul

> I'll keep trying some of the earlier good branches in case I missed something.
> 
> 	Dave
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 19:46                             ` Linus Torvalds
@ 2013-02-16 21:07                               ` Paul E. McKenney
  2013-02-16 21:53                               ` H. Peter Anvin
  2013-02-17 22:49                               ` H. Peter Anvin
  2 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2013-02-16 21:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On Sat, Feb 16, 2013 at 11:46:59AM -0800, Linus Torvalds wrote:
> On Sat, Feb 16, 2013 at 11:25 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Sorry for the delay in testing this, but there was a need to upgrade
> > my laptop, and bozo here figured "why not go to 64 bits while I am at
> > it?" -- and then proceeded to learn the hard way that it is necessary
> > to do "make mrproper" before doing a build in 64-bit mode.  :-/
> 
> Hmm. Our object file dependency check includes checking that the
> compiler options are the same, but that's only true for normal C
> files. Some of the other rules do *not* test the full range of config
> options, so in general, if you change architecture etc models, you do
> indeed want to make sure that you do a "make distclean" (aka "make
> mrproper") or something like "git clean -dqfx".
> 
> For a number of other files, we just depend on the normal make
> timestamp logic, which means that "if the object file is newer than
> the sources", we'll trust it. Which obviously doesn't work for cases
> where the object file may have been generated under totally different
> architecture rules..
> 
> (That said, what kind of old environment did you do this in?
> stub32_sigaltstack was removed during the merge window, so I'm
> assuming you applied my patch on top of plain 3.7 or something?)

This was in a git tree 3.7-rc7.  And stub32_sigaltstack is now gone,
but perhaps I did something stupid that made it persist.

Ah, the previous time I did a build directly out of this git tree
might well have been before 3.8-rc, so maybe the .o file was from before?

> > The kernel build system's way of telling you this at the moment is:
> >
> > arch/x86/built-in.o:(.rodata+0x4990): undefined reference to `stub32_sigaltstack'
> 
> Adding Peter Anvin to the people, just in case he sees what's wrong
> with the system call stub generation that keeps excessively old object
> files around. If it's easy to fix, it might be worth trying to make it
> ok to switch from i386 to x86-64 and back in the same tree.
> 
> Peter? Not a big deal, but if you see something obvious, let's just
> try to fix it, ok?
> 
> > Anyway, with this patch, I see CPU stall warnings when running rcutorture
> > as shown below.  This is not a hard failure:
> 
> Yeah, there's something wrong with the patch, I didn't bother trying
> to figure it out for now. It also causes a hard failure with lockdep
> (or lock proving/debugging, I'm not sure which one triggered it) - and
> it happens too early to even see anything on the screen.

Glad that it is not just me, then.  ;-)

> So I'd like to make that "downgrade from hardirq to softirq" atomic,
> and I think it would clean up the crazy code too (currently it does a
> *lot* of back-and-forth on the preempt flags), but I clearly missed
> some case where we used a wrapper or two to add some tracepoint or a
> RCU scheduling point. And I'm not going to worry about it right now,
> since I'm preparing to make v3.8 soon.
> 
> But if somebody spots the bug, holler.

I must confess that your patch looked OK to me...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15  2:15                   ` Linus Torvalds
@ 2013-02-16 21:45                     ` Hugh Dickins
  2013-02-16 23:02                       ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-16 21:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel Mailing List, Paul McKenney

On Thu, 14 Feb 2013, Linus Torvalds wrote:
> On Thu, Feb 14, 2013 at 6:09 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > Which won't affect my case since I never enabled it.
> 
> Well, in theory, you may have the same bug Dave just made it easier to
> trigger for himself with the forced config option.
> 
> In reality, your bug behavior differences were already big enough that
> it sounded likely that they were two different things to start with..

Yes.

I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
yesterday, to save an oopsing core kernel ip there, instead of hashed
pm trace info (it makes sense in this case to invert your sequence,
putting the high order into years and the low order into minutes).

Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6

/home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
	 * enable dithering as needed, but that costs bandwidth.  So choose
	 * the minimum value that expresses the full color range of the fb but
	 * also stays within the max display bpc discovered above.
	 */

	switch (fb->depth) {
ffffffff812d60e9:	48 8b 55 c0          	mov    -0x40(%rbp),%rdx
ffffffff812d60ed:	8b 02                	mov    (%rdx),%eax

(gcc chose to pass a pointer to fb->depth down to the function,
instead of fb itself, since that is the only use of it there.)

I expect that fb is NULL; but with an average of one failure to resume
per day, and ~26 bits of info per crash, this is not a fast procedure!

I notice that intel_pipe_set_base() allows for NULL fb,
so I'm currently running with the oops-in-rtc hackery, plus
-	switch (fb->depth) {
+	if (WARN_ON(!fb))
+		bpc = 8;
+	else switch (fb->depth) {

There's been a fair bit of change to intel_display.c since 3.7 (if
my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
ironlake_, but I've not yet spotted anything obvious; nor yet looked
to see where fb would originate from anyway.

Once I've got just a little more info out of it, I'll start another
thread addressed principally to the drm/gpu/i915 guys.

Anyway, nothing I've found yet is worth delaying v3.8.

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 19:46                             ` Linus Torvalds
  2013-02-16 21:07                               ` Paul E. McKenney
@ 2013-02-16 21:53                               ` H. Peter Anvin
  2013-02-17 22:49                               ` H. Peter Anvin
  2 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2013-02-16 21:53 UTC (permalink / raw)
  To: Linus Torvalds, Paul McKenney
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

The syscall generation *should* make files with different names only, but I'll look.

Linus Torvalds <torvalds@linux-foundation.org> wrote:

>On Sat, Feb 16, 2013 at 11:25 AM, Paul E. McKenney
><paulmck@linux.vnet.ibm.com> wrote:
>>
>> Sorry for the delay in testing this, but there was a need to upgrade
>> my laptop, and bozo here figured "why not go to 64 bits while I am at
>> it?" -- and then proceeded to learn the hard way that it is necessary
>> to do "make mrproper" before doing a build in 64-bit mode.  :-/
>
>Hmm. Our object file dependency check includes checking that the
>compiler options are the same, but that's only true for normal C
>files. Some of the other rules do *not* test the full range of config
>options, so in general, if you change architecture etc models, you do
>indeed want to make sure that you do a "make distclean" (aka "make
>mrproper") or something like "git clean -dqfx".
>
>For a number of other files, we just depend on the normal make
>timestamp logic, which means that "if the object file is newer than
>the sources", we'll trust it. Which obviously doesn't work for cases
>where the object file may have been generated under totally different
>architecture rules..
>
>(That said, what kind of old environment did you do this in?
>stub32_sigaltstack was removed during the merge window, so I'm
>assuming you applied my patch on top of plain 3.7 or something?)
>
>> The kernel build system's way of telling you this at the moment is:
>>
>> arch/x86/built-in.o:(.rodata+0x4990): undefined reference to
>`stub32_sigaltstack'
>
>Adding Peter Anvin to the people, just in case he sees what's wrong
>with the system call stub generation that keeps excessively old object
>files around. If it's easy to fix, it might be worth trying to make it
>ok to switch from i386 to x86-64 and back in the same tree.
>
>Peter? Not a big deal, but if you see something obvious, let's just
>try to fix it, ok?
>
>> Anyway, with this patch, I see CPU stall warnings when running
>rcutorture
>> as shown below.  This is not a hard failure:
>
>Yeah, there's something wrong with the patch, I didn't bother trying
>to figure it out for now. It also causes a hard failure with lockdep
>(or lock proving/debugging, I'm not sure which one triggered it) - and
>it happens too early to even see anything on the screen.
>
>So I'd like to make that "downgrade from hardirq to softirq" atomic,
>and I think it would clean up the crazy code too (currently it does a
>*lot* of back-and-forth on the preempt flags), but I clearly missed
>some case where we used a wrapper or two to add some tracepoint or a
>RCU scheduling point. And I'm not going to worry about it right now,
>since I'm preparing to make v3.8 soon.
>
>But if somebody spots the bug, holler.
>
>                     Linus

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 21:45                     ` Hugh Dickins
@ 2013-02-16 23:02                       ` Linus Torvalds
  2013-02-17  0:01                         ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2013-02-16 23:02 UTC (permalink / raw)
  To: Hugh Dickins, Daniel Vetter, David Airlie
  Cc: Dave Jones, Linux Kernel Mailing List, Paul McKenney, DRI

On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@google.com> wrote:
>
> I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> yesterday, to save an oopsing core kernel ip there, instead of hashed
> pm trace info (it makes sense in this case to invert your sequence,
> putting the high order into years and the low order into minutes).

That sounds like a good idea in general. The PM_TRACE() thing was done
to figure out things that locked up the PCI bus etc, but encoding the
oopses during suspend sounds like a really good idea too.

Is your patch clean enough to just be made part of the standard
PM_TRACE infrastructure, or was it something really hacky and one-off?

> Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
>
> /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
>          * enable dithering as needed, but that costs bandwidth.  So choose
>          * the minimum value that expresses the full color range of the fb but
>          * also stays within the max display bpc discovered above.
>          */
>
>         switch (fb->depth) {
> ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
>
> (gcc chose to pass a pointer to fb->depth down to the function,
> instead of fb itself, since that is the only use of it there.)
>
> I expect that fb is NULL; but with an average of one failure to resume
> per day, and ~26 bits of info per crash, this is not a fast procedure!
>
> I notice that intel_pipe_set_base() allows for NULL fb,
> so I'm currently running with the oops-in-rtc hackery, plus
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {
>
> There's been a fair bit of change to intel_display.c since 3.7 (if
> my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> ironlake_, but I've not yet spotted anything obvious; nor yet looked
> to see where fb would originate from anyway.
>
> Once I've got just a little more info out of it, I'll start another
> thread addressed principally to the drm/gpu/i915 guys.

I think it's worth it to give them a heads-up already. So I've cc'd
the main suspects here..

Daniel, Dave - any comments about a NULL fb in
intel_choose_pipe_bpp_dither() during either suspend or resume? Some
googling shows this:

    https://bugzilla.redhat.com/show_bug.cgi?id=895123

which sounds remarkably similar, and is also during a suspend attempt
(but apparently Satish got a full oops out).. Some timing race with a
worker entry?

                        Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 23:02                       ` Linus Torvalds
@ 2013-02-17  0:01                         ` Hugh Dickins
  2013-02-17  2:21                           ` Hugh Dickins
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-17  0:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Vetter, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sat, 16 Feb 2013, Linus Torvalds wrote:
> On Sat, Feb 16, 2013 at 1:45 PM, Hugh Dickins <hughd@google.com> wrote:
> >
> > I hacked around on your PM_TRACE set_magic_time() / read_magic_time()
> > yesterday, to save an oopsing core kernel ip there, instead of hashed
> > pm trace info (it makes sense in this case to invert your sequence,
> > putting the high order into years and the low order into minutes).
> 
> That sounds like a good idea in general. The PM_TRACE() thing was done
> to figure out things that locked up the PCI bus etc, but encoding the
> oopses during suspend sounds like a really good idea too.

Yes, it can and should be generalized, but needs more than I gave it.

> 
> Is your patch clean enough to just be made part of the standard
> PM_TRACE infrastructure, or was it something really hacky and one-off?

Not really hacky, but three reasons why it cannot be standard without
more work (I am supposing that it should not be tied to suspend/resume,
but could be used for any oops which gets hidden by graphic screen,
and also fails to reach /var/log/messages).

1. Because I usually have a version of KDB patched into my kernels
("forked" many years ago, never time to integrate with that subset
which eventually went into 2.6.??), it was easiest to implement as
a pair of KDB commands (needing keyboard input: I already knew I
could "reboot" from KDB in this state).  (Sidenote: using KDB can
prevent getting a trace into /var/log/messages: so I had tried
without it, but still failed to get one.)

2. I don't use initrd, have very little in modules, and a pared down
kernel: so for me it was not a serious restriction to limit to core
kernel addresses, which easily fitted within the limit; but we ought
to put thought into handling module addresses too.

3. Needs care on the interface: a debug config option presumably,
but perhaps also needs to tie in with panic_on_oops or something -
I don't like my date getting messed up by surprise, and there's no
point in messing it up unless there's reason to believe the machine
will very quickly be rebooted.  Since I had to open the lid to resume
to hit the oops, in this case we could rely on me quickly rebooting.

But I've appended what I had below, so it's on the record, and can
be taken further if (likely) someone gets there sooner than I do.

> 
> > Rewarded last night by reboot to Feb 21 14:45:53 2006.  Which is
> > ffffffff812d60ed intel_choose_pipe_bpp_dither.isra.13+0x216/0x2d6
> >
> > /home/hugh/3087X/drivers/gpu/drm/i915/intel_display.c:4159
> >          * enable dithering as needed, but that costs bandwidth.  So choose
> >          * the minimum value that expresses the full color range of the fb but
> >          * also stays within the max display bpc discovered above.
> >          */
> >
> >         switch (fb->depth) {
> > ffffffff812d60e9:       48 8b 55 c0             mov    -0x40(%rbp),%rdx
> > ffffffff812d60ed:       8b 02                   mov    (%rdx),%eax
> >
> > (gcc chose to pass a pointer to fb->depth down to the function,
> > instead of fb itself, since that is the only use of it there.)
> >
> > I expect that fb is NULL; but with an average of one failure to resume
> > per day, and ~26 bits of info per crash, this is not a fast procedure!
> >
> > I notice that intel_pipe_set_base() allows for NULL fb,
> > so I'm currently running with the oops-in-rtc hackery, plus
> > -       switch (fb->depth) {
> > +       if (WARN_ON(!fb))
> > +               bpc = 8;
> > +       else switch (fb->depth) {
> >
> > There's been a fair bit of change to intel_display.c since 3.7 (if
> > my 3.7 was indeed good), mainly splitting intel_ into haswell_ versus
> > ironlake_, but I've not yet spotted anything obvious; nor yet looked
> > to see where fb would originate from anyway.
> >
> > Once I've got just a little more info out of it, I'll start another
> > thread addressed principally to the drm/gpu/i915 guys.
> 
> I think it's worth it to give them a heads-up already. So I've cc'd
> the main suspects here..

Okay, thanks.

> 
> Daniel, Dave - any comments about a NULL fb in
> intel_choose_pipe_bpp_dither() during either suspend or resume? Some
> googling shows this:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=895123

Great, yes, I'm sure that's the same (though it says "suspend"
and I say "resume").

> 
> which sounds remarkably similar, and is also during a suspend attempt
> (but apparently Satish got a full oops out).. Some timing race with a
> worker entry?
> 
>                         Linus

#include <linux/rtc.h>
#include <asm/rtc.h>
/*
 * HughD adapted the following from drivers/base/power/trace.c:
 *
 * Copyright (C) 2006 Linus Torvalds
 *
 * Trace facility for suspend/resume problems, when none of the
 * devices may be working.
 *
 * Horrid, horrid, horrid.
 *
 * It turns out that the _only_ piece of hardware that actually
 * keeps its value across a hard boot (and, more importantly, the
 * POST init sequence) is literally the realtime clock.
 *
 * Never mind that an RTC chip has 114 bytes (and often a whole
 * other bank of an additional 128 bytes) of nice SRAM that is
 * _designed_ to keep data - the POST will clear it. So we literally
 * can just use the few bytes of actual time data, which means that
 * we're really limited.
 *
 * It means, for example, that we can't use the seconds at all
 * (since the time between the hang and the boot might be more
 * than a minute).
 *
 * There are the wday fields etc, but I wouldn't guarantee those
 * are dependable either. And if the date isn't valid, either the
 * hw or POST will do strange things.
 *
 * So we're left with:
 *  - year: 0-99
 *  - month: 0-11
 *  - day-of-month: 1-28
 *  - hour: 0-23
 *  - min: 0-59
 *
 * Giving us a total range of 0-48384000 (0x2e24800), ie less
 * than 26 bits of actual data we can save across reboots:
 * but that should be enough to save one core kernel address.
 */

#define LIMITING_VALUE (100*12*28*24*60)
static unsigned int saved_value;

static void set_magic_time(unsigned int n)
{
	// June 7th, 2006
	static struct rtc_time time = {
		.tm_sec = 0,
		.tm_min = 0,
		.tm_hour = 0,
		.tm_mday = 7,
		.tm_mon = 5,	// June - counting from zero
		.tm_year = 106,
		.tm_wday = 3,
		.tm_yday = 160,
		.tm_isdst = 1
	};

	time.tm_min = (n % 60);
	n /= 60;
	time.tm_hour = (n % 24);
	n /= 24;
	time.tm_mday = (n % 28) + 1;
	n /= 28;
	time.tm_mon = (n % 12);
	n /= 12;
	time.tm_year = (n % 100);
	set_rtc_time(&time);
}

static unsigned int __init read_magic_time(void)
{
	struct rtc_time time;
	unsigned int n;

	get_rtc_time(&time);
	n = time.tm_year;
	if (n >= 100)
		n -= 100;
	n *= 12;
	n += time.tm_mon;
	n *= 28;
	n += time.tm_mday - 1;
	n *= 24;
	n += time.tm_hour;
	n *= 60;
	n += time.tm_min;
	return n;
}

static int kdba_rtcw(int argc, const char **argv)
{
	unsigned long addr;

	if (!kdb_current_regs)
		return KDB_BADADDR;
	addr = kdb_current_regs->ip;
	if (addr < (unsigned long)_stext ||
	    addr > (unsigned long)_etext)
		return KDB_BADADDR;
	if (addr - (unsigned long)_stext >= LIMITING_VALUE)
		return KDB_BADADDR;
	saved_value = addr - (unsigned long)_stext;
	set_magic_time(saved_value);
	return 0;
}

static int kdba_rtcr(int argc, const char **argv)
{
	int diag = KDB_BADADDR;
	unsigned long addr;

	addr = saved_value + (unsigned long)_stext;
	if (addr < (unsigned long)_stext ||
	    addr > (unsigned long)_etext)
		return diag;
	kdb_printf("%*p %pB\n",
			2 * (int)sizeof(long), (void *)addr, (void *)addr);
	return 0;
}

void __init kdba_rtcinit(void)
{
	saved_value = read_magic_time();
	kdb_register("rtcw", kdba_rtcw, "", "Write ip to rtc", 0);
	kdb_register("rtcr", kdba_rtcr, "", "Read ip from rtc", 0);
}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17  0:01                         ` Hugh Dickins
@ 2013-02-17  2:21                           ` Hugh Dickins
  2013-02-17 13:38                             ` Daniel Vetter
  0 siblings, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-17  2:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Vetter, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sat, 16 Feb 2013, Hugh Dickins wrote:
> On Sat, 16 Feb 2013, Linus Torvalds wrote:
> > 
> > I think it's worth it to give them a heads-up already. So I've cc'd
> > the main suspects here..
> 
> Okay, thanks.
> 
> > 
> > Daniel, Dave - any comments about a NULL fb in
> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some
> > googling shows this:
> > 
> >     https://bugzilla.redhat.com/show_bug.cgi?id=895123
> 
> Great, yes, I'm sure that's the same (though it says "suspend"
> and I say "resume").
> 
> > 
> > which sounds remarkably similar, and is also during a suspend attempt
> > (but apparently Satish got a full oops out).. Some timing race with a
> > worker entry?

Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that
the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore
on lid open", whose force_restore case now passes down crtc->base.fb.  But
I wouldn't have a clue why that's usually non-NULL but occasionally NULL:
your timing race with a worker entry, perhaps.

And 45e2b5f640b3 contains a fine history of going back and forth, so I
wouldn't want to play further with it out of ignorance - though tempted
to replace the "if (force_restore) {" by an interim safe-seeming
compromise of "if (force_restore && crtc->base.fb) {".

Hugh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17  2:21                           ` Hugh Dickins
@ 2013-02-17 13:38                             ` Daniel Vetter
  2013-02-17 14:54                               ` Daniel Vetter
  2013-02-17 16:31                               ` Hugh Dickins
  0 siblings, 2 replies; 50+ messages in thread
From: Daniel Vetter @ 2013-02-17 13:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sun, Feb 17, 2013 at 3:21 AM, Hugh Dickins <hughd@google.com> wrote:
> On Sat, 16 Feb 2013, Hugh Dickins wrote:
>> On Sat, 16 Feb 2013, Linus Torvalds wrote:
>> >
>> > I think it's worth it to give them a heads-up already. So I've cc'd
>> > the main suspects here..
>>
>> Okay, thanks.
>>
>> >
>> > Daniel, Dave - any comments about a NULL fb in
>> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some
>> > googling shows this:
>> >
>> >     https://bugzilla.redhat.com/show_bug.cgi?id=895123
>>
>> Great, yes, I'm sure that's the same (though it says "suspend"
>> and I say "resume").
>>
>> >
>> > which sounds remarkably similar, and is also during a suspend attempt
>> > (but apparently Satish got a full oops out).. Some timing race with a
>> > worker entry?
>
> Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that
> the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore
> on lid open", whose force_restore case now passes down crtc->base.fb.  But
> I wouldn't have a clue why that's usually non-NULL but occasionally NULL:
> your timing race with a worker entry, perhaps.
>
> And 45e2b5f640b3 contains a fine history of going back and forth, so I
> wouldn't want to play further with it out of ignorance - though tempted
> to replace the "if (force_restore) {" by an interim safe-seeming
> compromise of "if (force_restore && crtc->base.fb) {".

Two things to try while I try to grow a clue on what exactly is going on:

1. Related to new ACPI sleep states we've recently made the lid_notifier
locking more sound, maybe that helps:

http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=b8efb17b3d687695b81485f606fc4e6c35a50f9a

2. The new i915 force restore code is indeed missing a safety check
compared to the old crtc helper based one:

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6eb3882..095094c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
 
 	if (force_restore) {
 		for_each_pipe(pipe) {
-			intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]);
+			struct drm_crtc * crtc =
+				dev_priv->pipe_to_crtc_mapping[pipe];
+
+			if (!crtc->enabled)
+				continue;
+
+			intel_crtc_restore_mode(crtc);
 		}
 
 		i915_redisable_vga(dev);

The issue is that we save a pointer to the fb (since those are refcounted)
but copy the mode into the crtc struct (since modes are not refcounted).
So on restore the mode will always be non-NULL, which is wrong if the crtc
is off (and so also has a NULL fb).

The problem I have with that patch is figuring out how this ever worked. I
think I need more coffee ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 13:38                             ` Daniel Vetter
@ 2013-02-17 14:54                               ` Daniel Vetter
  2013-02-17 16:31                               ` Hugh Dickins
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel Vetter @ 2013-02-17 14:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sun, Feb 17, 2013 at 2:38 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 2. The new i915 force restore code is indeed missing a safety check
> compared to the old crtc helper based one:
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 6eb3882..095094c 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
>
>         if (force_restore) {
>                 for_each_pipe(pipe) {
> -                       intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]);
> +                       struct drm_crtc * crtc =
> +                               dev_priv->pipe_to_crtc_mapping[pipe];
> +
> +                       if (!crtc->enabled)
> +                               continue;
> +
> +                       intel_crtc_restore_mode(crtc);
>                 }
>
>                 i915_redisable_vga(dev);
>
> The issue is that we save a pointer to the fb (since those are refcounted)
> but copy the mode into the crtc struct (since modes are not refcounted).
> So on restore the mode will always be non-NULL, which is wrong if the crtc
> is off (and so also has a NULL fb).
>
> The problem I have with that patch is figuring out how this ever worked. I
> think I need more coffee ;-)

Ok, coffee seems to be working now. I think the above diff shouldn't
change anything, since we already have a crtc->enabled check in
intel_modeset_affected_pipes in intel_display.c. Still would be good
if you can prove this one way or the other.

For those wondering why this check is buried this deeply: We're in the
middle of a massive rework of our modeset code, moving from
one-crtc-at-a-time to global modeset. We need that to implement some
fancy features like fastboot or better handling of global resource
constraints (shared clocks, bw limits, ...). In the new world we set
up the desired state in staging pointers/data structures. Then the
modeset code diffs that with the current state and computes the best
way to do the transition. But since we're still converting code some
pieces pass in the new state explicitly, but lower levels then ignore
some pieces when not required to reach the desired state.

The new lid restore code relies on that by updating the tracked hw
state from the real hw one and restoring the last desired state (which
is still around from the last modeset call).

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-15 18:34                     ` Linus Torvalds
  2013-02-15 18:35                       ` Linus Torvalds
@ 2013-02-17 15:11                       ` Frederic Weisbecker
  2013-02-17 17:32                         ` Linus Torvalds
  1 sibling, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-17 15:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

2013/2/15 Linus Torvalds <torvalds@linux-foundation.org>:
> On Fri, Feb 15, 2013 at 9:44 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>>
>> This commit was designed to increase the probability of hitting the
>> races described in http://lwn.net/Articles/453002/.  These races result
>> in deadlocks involving the runqueue lock (and perhaps also the priority
>> inheritance locks).  And yes, I most certainly should have described
>> this in the commit message.  :-(
>
> Ugh. That particular race seems to be because the softirq handling is
> just crazy, and does the "wakeup_softirqd()" form interrupt context,
> BUT HAS SPECIFICALLY BROKEN THE IRQ COUNTING!
>
> Because it claims to do it from softirq context, which is pure
> garbage. It's not actually in softirq context.
>
> The whole hardirq -> softirq transition seems stupid. I'm sure I made
> some serious mistake in cleaning it up, and there's probably some
> missed tracepoint (or perhaps screwed-up lockdep annotation), but I
> think the hardirq -> softirq preempt thing shoudl be done as an atomic
> preempt downgrade, so that we never have a window of "uhhuh, another
> interrupt can come in between and see us as being in neither). And the
> wakeup_softirqd should be done without playing with preempt count at
> all.
>
> Something like this ENTIRELY UNTESTED patch.
>
> Note: I doubt this patch affects Dave's issue at all, I just started
> looking at that do_softirq code when I read your bug explanation.
>
> Adding random people for kernel/softirq.c to the participants list.
> Comments about the patch? Do note that it's entirely untested, so
> consider it more a RFD than a real patch.. It looks like it adds a lot
> of lines, but most of it is for comments and simplification of the
> logic.

preempt_value_in_interrupt() looks buggy in your patch: it makes
invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
true since you have moved further the sub_preempt_count(IRQ_EXIT)
further.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 13:38                             ` Daniel Vetter
  2013-02-17 14:54                               ` Daniel Vetter
@ 2013-02-17 16:31                               ` Hugh Dickins
  2013-02-17 17:28                                   ` Daniel Vetter
  1 sibling, 1 reply; 50+ messages in thread
From: Hugh Dickins @ 2013-02-17 16:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Linus Torvalds, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sun, 17 Feb 2013, Daniel Vetter wrote:
> On Sun, Feb 17, 2013 at 3:21 AM, Hugh Dickins <hughd@google.com> wrote:
> > On Sat, 16 Feb 2013, Hugh Dickins wrote:
> >> On Sat, 16 Feb 2013, Linus Torvalds wrote:
> >> >
> >> > I think it's worth it to give them a heads-up already. So I've cc'd
> >> > the main suspects here..
> >>
> >> Okay, thanks.
> >>
> >> >
> >> > Daniel, Dave - any comments about a NULL fb in
> >> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some
> >> > googling shows this:
> >> >
> >> >     https://bugzilla.redhat.com/show_bug.cgi?id=895123
> >>
> >> Great, yes, I'm sure that's the same (though it says "suspend"
> >> and I say "resume").
> >>
> >> >
> >> > which sounds remarkably similar, and is also during a suspend attempt
> >> > (but apparently Satish got a full oops out).. Some timing race with a
> >> > worker entry?
> >
> > Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that
> > the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore
> > on lid open", whose force_restore case now passes down crtc->base.fb.  But
> > I wouldn't have a clue why that's usually non-NULL but occasionally NULL:
> > your timing race with a worker entry, perhaps.
> >
> > And 45e2b5f640b3 contains a fine history of going back and forth, so I
> > wouldn't want to play further with it out of ignorance - though tempted
> > to replace the "if (force_restore) {" by an interim safe-seeming
> > compromise of "if (force_restore && crtc->base.fb) {".

My suggestion there in the line above was stupidly wrong :(

> 
> Two things to try while I try to grow a clue on what exactly is going on:

Thank you.

By the way, I hope you've looked back through this thread, and realize
that Dave and I both had ThinkPad T4?0s suspend/resume display problems,
but they've gone off in different directions: so a lot of the discussion
comes from Dave having CONFIG_PROVE_RCU_DELAY, and has nothing to do with
what we now know to be my oops in i915/intel_display.c.

> 
> 1. Related to new ACPI sleep states we've recently made the lid_notifier
> locking more sound, maybe that helps:
> 
> http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=b8efb17b3d687695b81485f606fc4e6c35a50f9a

Looks like it may be relevant, but I'll ignore it for now:
preferring first to test the more minimal safety you suggest below.

> 
> 2. The new i915 force restore code is indeed missing a safety check
> compared to the old crtc helper based one:
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 6eb3882..095094c 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
>  
>  	if (force_restore) {
>  		for_each_pipe(pipe) {
> -			intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]);
> +			struct drm_crtc * crtc =
> +				dev_priv->pipe_to_crtc_mapping[pipe];
> +
> +			if (!crtc->enabled)
> +				continue;
> +
> +			intel_crtc_restore_mode(crtc);
>  		}
>  
>  		i915_redisable_vga(dev);

I see your followup, where you observe that intel_modeset_affected_pipes()
should already have made this check; but you do say it would still be good
to prove one way or the other, so I'll run from now with the patch below.

Note that we haven't got to worrying about what will be in 3.9 yet
(linux-next tells me to expect hangcheck timer problems): it's Linus's
current git for 3.8 final that we're working on at present.

And since quick resumes have always worked for me, it's only occasionally
that a long suspend (e.g. overnight) fails for me in this way, so I won't
be able to report success for several days - but failure may come sooner.

And, it being very tiresome to debug when it does fail, I have inserted
WARN_ONs and more safety: here's what I'm running with now.

--- 3.8-rc7/drivers/gpu/drm/i915/intel_display.c	2013-01-17 20:06:11.384002362 -0800
+++ linux/drivers/gpu/drm/i915/intel_display.c	2013-02-17 07:50:28.012075150 -0800
@@ -4156,7 +4156,9 @@ static bool intel_choose_pipe_bpp_dither
 	 * also stays within the max display bpc discovered above.
 	 */
 
-	switch (fb->depth) {
+	if (WARN_ON(!fb))
+		bpc = 8;
+	else switch (fb->depth) {
 	case 8:
 		bpc = 8; /* since we go through a colormap */
 		break;
@@ -9302,6 +9304,10 @@ void intel_modeset_setup_hw_state(struct
 	if (force_restore) {
 		for_each_pipe(pipe) {
 			crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
+			if (WARN_ON(!crtc->base.enabled))
+				continue;
+			if (WARN_ON(!crtc->base.fb))
+				continue;
 			intel_set_mode(&crtc->base, &crtc->base.mode,
 				       crtc->base.x, crtc->base.y, crtc->base.fb);
 		}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 16:31                               ` Hugh Dickins
@ 2013-02-17 17:28                                   ` Daniel Vetter
  0 siblings, 0 replies; 50+ messages in thread
From: Daniel Vetter @ 2013-02-17 17:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, David Airlie, Dave Jones,
	Linux Kernel Mailing List, Paul McKenney, DRI

On Sun, Feb 17, 2013 at 5:31 PM, Hugh Dickins <hughd@google.com> wrote:
> On Sun, 17 Feb 2013, Daniel Vetter wrote:
>> On Sun, Feb 17, 2013 at 3:21 AM, Hugh Dickins <hughd@google.com> wrote:
>> > On Sat, 16 Feb 2013, Hugh Dickins wrote:
>> >> On Sat, 16 Feb 2013, Linus Torvalds wrote:
>> >> >
>> >> > I think it's worth it to give them a heads-up already. So I've cc'd
>> >> > the main suspects here..
>> >>
>> >> Okay, thanks.
>> >>
>> >> >
>> >> > Daniel, Dave - any comments about a NULL fb in
>> >> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some
>> >> > googling shows this:
>> >> >
>> >> >     https://bugzilla.redhat.com/show_bug.cgi?id=895123
>> >>
>> >> Great, yes, I'm sure that's the same (though it says "suspend"
>> >> and I say "resume").
>> >>
>> >> >
>> >> > which sounds remarkably similar, and is also during a suspend attempt
>> >> > (but apparently Satish got a full oops out).. Some timing race with a
>> >> > worker entry?
>> >
>> > Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that
>> > the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore
>> > on lid open", whose force_restore case now passes down crtc->base.fb.  But
>> > I wouldn't have a clue why that's usually non-NULL but occasionally NULL:
>> > your timing race with a worker entry, perhaps.
>> >
>> > And 45e2b5f640b3 contains a fine history of going back and forth, so I
>> > wouldn't want to play further with it out of ignorance - though tempted
>> > to replace the "if (force_restore) {" by an interim safe-seeming
>> > compromise of "if (force_restore && crtc->base.fb) {".
>
> My suggestion there in the line above was stupidly wrong :(
>
>>
>> Two things to try while I try to grow a clue on what exactly is going on:
>
> Thank you.
>
> By the way, I hope you've looked back through this thread, and realize
> that Dave and I both had ThinkPad T4?0s suspend/resume display problems,
> but they've gone off in different directions: so a lot of the discussion
> comes from Dave having CONFIG_PROVE_RCU_DELAY, and has nothing to do with
> what we now know to be my oops in i915/intel_display.c.

Oh, I haven't read the earlier parts of the thread, but agree that
it's a completely different bug: Different chipset (this matters
massively for gpus usually), Dave's issue happens on -rc1 (which
doesn't contain the offending lid_notifier patch yet) and seems to die
someplace completely else than your box.

>> 1. Related to new ACPI sleep states we've recently made the lid_notifier
>> locking more sound, maybe that helps:
>>
>> http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=b8efb17b3d687695b81485f606fc4e6c35a50f9a
>
> Looks like it may be relevant, but I'll ignore it for now:
> preferring first to test the more minimal safety you suggest below.
>
>>
>> 2. The new i915 force restore code is indeed missing a safety check
>> compared to the old crtc helper based one:
>>
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 6eb3882..095094c 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
>>
>>       if (force_restore) {
>>               for_each_pipe(pipe) {
>> -                     intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]);
>> +                     struct drm_crtc * crtc =
>> +                             dev_priv->pipe_to_crtc_mapping[pipe];
>> +
>> +                     if (!crtc->enabled)
>> +                             continue;
>> +
>> +                     intel_crtc_restore_mode(crtc);
>>               }
>>
>>               i915_redisable_vga(dev);
>
> I see your followup, where you observe that intel_modeset_affected_pipes()
> should already have made this check; but you do say it would still be good
> to prove one way or the other, so I'll run from now with the patch below.
>
> Note that we haven't got to worrying about what will be in 3.9 yet
> (linux-next tells me to expect hangcheck timer problems): it's Linus's
> current git for 3.8 final that we're working on at present.

Right, patch was on top of -next, but there shouldn't be any
(functional) differences in this area compared to 3.8. The first part
of the big rework landed in 3.7 and contained the crtc->enabled check
from day one.

For the hangcheck issue in -next, that might be a new one. If you
catch it again, can you please grab the i915_error_state from debugfs
and throw it over to me? That should be enough for basic analysis.

> And since quick resumes have always worked for me, it's only occasionally
> that a long suspend (e.g. overnight) fails for me in this way, so I won't
> be able to report success for several days - but failure may come sooner.
>
> And, it being very tiresome to debug when it does fail, I have inserted
> WARN_ONs and more safety: here's what I'm running with now.

Yep, that should be interesting once it catches something. For more
debug noise can you set drm.debug=0xe (should work even when setting
it in /sys/modules/drm/parameters at runtime). That spills tons of
stuff into dmesg which will come handy in reconstructing how things
fell apart. Presuming your machines survives a bad resume and you can
grab dmesg, ofc.

Thanks, Daniel

> --- 3.8-rc7/drivers/gpu/drm/i915/intel_display.c        2013-01-17 20:06:11.384002362 -0800
> +++ linux/drivers/gpu/drm/i915/intel_display.c  2013-02-17 07:50:28.012075150 -0800
> @@ -4156,7 +4156,9 @@ static bool intel_choose_pipe_bpp_dither
>          * also stays within the max display bpc discovered above.
>          */
>
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {
>         case 8:
>                 bpc = 8; /* since we go through a colormap */
>                 break;
> @@ -9302,6 +9304,10 @@ void intel_modeset_setup_hw_state(struct
>         if (force_restore) {
>                 for_each_pipe(pipe) {
>                         crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
> +                       if (WARN_ON(!crtc->base.enabled))
> +                               continue;
> +                       if (WARN_ON(!crtc->base.fb))
> +                               continue;
>                         intel_set_mode(&crtc->base, &crtc->base.mode,
>                                        crtc->base.x, crtc->base.y, crtc->base.fb);
>                 }



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
@ 2013-02-17 17:28                                   ` Daniel Vetter
  0 siblings, 0 replies; 50+ messages in thread
From: Daniel Vetter @ 2013-02-17 17:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linux Kernel Mailing List, DRI, Paul McKenney, Dave Jones,
	Linus Torvalds

On Sun, Feb 17, 2013 at 5:31 PM, Hugh Dickins <hughd@google.com> wrote:
> On Sun, 17 Feb 2013, Daniel Vetter wrote:
>> On Sun, Feb 17, 2013 at 3:21 AM, Hugh Dickins <hughd@google.com> wrote:
>> > On Sat, 16 Feb 2013, Hugh Dickins wrote:
>> >> On Sat, 16 Feb 2013, Linus Torvalds wrote:
>> >> >
>> >> > I think it's worth it to give them a heads-up already. So I've cc'd
>> >> > the main suspects here..
>> >>
>> >> Okay, thanks.
>> >>
>> >> >
>> >> > Daniel, Dave - any comments about a NULL fb in
>> >> > intel_choose_pipe_bpp_dither() during either suspend or resume? Some
>> >> > googling shows this:
>> >> >
>> >> >     https://bugzilla.redhat.com/show_bug.cgi?id=895123
>> >>
>> >> Great, yes, I'm sure that's the same (though it says "suspend"
>> >> and I say "resume").
>> >>
>> >> >
>> >> > which sounds remarkably similar, and is also during a suspend attempt
>> >> > (but apparently Satish got a full oops out).. Some timing race with a
>> >> > worker entry?
>> >
>> > Comparing Satish's backtrace and drivers/gpu/drm history, it's clear that
>> > the oops comes from Daniel's 3.8-rc2 45e2b5f640b3 "drm/i915: force restore
>> > on lid open", whose force_restore case now passes down crtc->base.fb.  But
>> > I wouldn't have a clue why that's usually non-NULL but occasionally NULL:
>> > your timing race with a worker entry, perhaps.
>> >
>> > And 45e2b5f640b3 contains a fine history of going back and forth, so I
>> > wouldn't want to play further with it out of ignorance - though tempted
>> > to replace the "if (force_restore) {" by an interim safe-seeming
>> > compromise of "if (force_restore && crtc->base.fb) {".
>
> My suggestion there in the line above was stupidly wrong :(
>
>>
>> Two things to try while I try to grow a clue on what exactly is going on:
>
> Thank you.
>
> By the way, I hope you've looked back through this thread, and realize
> that Dave and I both had ThinkPad T4?0s suspend/resume display problems,
> but they've gone off in different directions: so a lot of the discussion
> comes from Dave having CONFIG_PROVE_RCU_DELAY, and has nothing to do with
> what we now know to be my oops in i915/intel_display.c.

Oh, I haven't read the earlier parts of the thread, but agree that
it's a completely different bug: Different chipset (this matters
massively for gpus usually), Dave's issue happens on -rc1 (which
doesn't contain the offending lid_notifier patch yet) and seems to die
someplace completely else than your box.

>> 1. Related to new ACPI sleep states we've recently made the lid_notifier
>> locking more sound, maybe that helps:
>>
>> http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=b8efb17b3d687695b81485f606fc4e6c35a50f9a
>
> Looks like it may be relevant, but I'll ignore it for now:
> preferring first to test the more minimal safety you suggest below.
>
>>
>> 2. The new i915 force restore code is indeed missing a safety check
>> compared to the old crtc helper based one:
>>
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 6eb3882..095094c 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -9153,7 +9153,13 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
>>
>>       if (force_restore) {
>>               for_each_pipe(pipe) {
>> -                     intel_crtc_restore_mode(dev_priv->pipe_to_crtc_mapping[pipe]);
>> +                     struct drm_crtc * crtc =
>> +                             dev_priv->pipe_to_crtc_mapping[pipe];
>> +
>> +                     if (!crtc->enabled)
>> +                             continue;
>> +
>> +                     intel_crtc_restore_mode(crtc);
>>               }
>>
>>               i915_redisable_vga(dev);
>
> I see your followup, where you observe that intel_modeset_affected_pipes()
> should already have made this check; but you do say it would still be good
> to prove one way or the other, so I'll run from now with the patch below.
>
> Note that we haven't got to worrying about what will be in 3.9 yet
> (linux-next tells me to expect hangcheck timer problems): it's Linus's
> current git for 3.8 final that we're working on at present.

Right, patch was on top of -next, but there shouldn't be any
(functional) differences in this area compared to 3.8. The first part
of the big rework landed in 3.7 and contained the crtc->enabled check
from day one.

For the hangcheck issue in -next, that might be a new one. If you
catch it again, can you please grab the i915_error_state from debugfs
and throw it over to me? That should be enough for basic analysis.

> And since quick resumes have always worked for me, it's only occasionally
> that a long suspend (e.g. overnight) fails for me in this way, so I won't
> be able to report success for several days - but failure may come sooner.
>
> And, it being very tiresome to debug when it does fail, I have inserted
> WARN_ONs and more safety: here's what I'm running with now.

Yep, that should be interesting once it catches something. For more
debug noise can you set drm.debug=0xe (should work even when setting
it in /sys/modules/drm/parameters at runtime). That spills tons of
stuff into dmesg which will come handy in reconstructing how things
fell apart. Presuming your machines survives a bad resume and you can
grab dmesg, ofc.

Thanks, Daniel

> --- 3.8-rc7/drivers/gpu/drm/i915/intel_display.c        2013-01-17 20:06:11.384002362 -0800
> +++ linux/drivers/gpu/drm/i915/intel_display.c  2013-02-17 07:50:28.012075150 -0800
> @@ -4156,7 +4156,9 @@ static bool intel_choose_pipe_bpp_dither
>          * also stays within the max display bpc discovered above.
>          */
>
> -       switch (fb->depth) {
> +       if (WARN_ON(!fb))
> +               bpc = 8;
> +       else switch (fb->depth) {
>         case 8:
>                 bpc = 8; /* since we go through a colormap */
>                 break;
> @@ -9302,6 +9304,10 @@ void intel_modeset_setup_hw_state(struct
>         if (force_restore) {
>                 for_each_pipe(pipe) {
>                         crtc = to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
> +                       if (WARN_ON(!crtc->base.enabled))
> +                               continue;
> +                       if (WARN_ON(!crtc->base.fb))
> +                               continue;
>                         intel_set_mode(&crtc->base, &crtc->base.mode,
>                                        crtc->base.x, crtc->base.y, crtc->base.fb);
>                 }



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 15:11                       ` Frederic Weisbecker
@ 2013-02-17 17:32                         ` Linus Torvalds
  2013-02-17 18:17                           ` Frederic Weisbecker
  2013-02-17 20:58                           ` Frederic Weisbecker
  0 siblings, 2 replies; 50+ messages in thread
From: Linus Torvalds @ 2013-02-17 17:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> preempt_value_in_interrupt() looks buggy in your patch: it makes
> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
> true since you have moved further the sub_preempt_count(IRQ_EXIT)
> further.

No, that's not it. The value hasn't been written back yet, but it already did:

+       int offset = IRQ_EXIT_OFFSET;
+       int count = preempt_count() - offset;

so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
HARDIRQ_MASK is *not* always set.

                   Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 17:32                         ` Linus Torvalds
@ 2013-02-17 18:17                           ` Frederic Weisbecker
  2013-02-17 20:58                           ` Frederic Weisbecker
  1 sibling, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-17 18:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

2013/2/17 Linus Torvalds <torvalds@linux-foundation.org>:
> On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>> preempt_value_in_interrupt() looks buggy in your patch: it makes
>> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
>> true since you have moved further the sub_preempt_count(IRQ_EXIT)
>> further.
>
> No, that's not it. The value hasn't been written back yet, but it already did:
>
> +       int offset = IRQ_EXIT_OFFSET;
> +       int count = preempt_count() - offset;
>
> so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
> HARDIRQ_MASK is *not* always set.

Ah right. I was confused.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 17:32                         ` Linus Torvalds
  2013-02-17 18:17                           ` Frederic Weisbecker
@ 2013-02-17 20:58                           ` Frederic Weisbecker
  2013-02-17 21:02                             ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-17 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

2013/2/17 Linus Torvalds <torvalds@linux-foundation.org>:
> On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>> preempt_value_in_interrupt() looks buggy in your patch: it makes
>> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
>> true since you have moved further the sub_preempt_count(IRQ_EXIT)
>> further.
>
> No, that's not it. The value hasn't been written back yet, but it already did:
>
> +       int offset = IRQ_EXIT_OFFSET;
> +       int count = preempt_count() - offset;
>
> so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
> HARDIRQ_MASK is *not* always set.

Another thing. Perhaps we can push the idea of your patch a little
further by re-entering HARDIRQ_OFFSET at the end of the softirq
processing and do the final sub_preempt_count(HARDIRQ_OFFSET) at the
very end of irq_exit().

This way irq_exit() looks a bit more simple to me: everything there
becomes considered as in hardirq: (ie: rcu_irq_exit() and
tick_nohz_irq_exit() won't appear anymore as weird special cases) and
we get rid of that IRQ_EXIT_OFFSET hack that fixes the CONFIG_PREEMPT
case.

I'm attaching an untested patch that modify yours. It's probably
missing some corner cases that rely on in_interrupt() value in
rcu_irq_exit() and tick_nohz_irq_exit() and may be other things.

[-- Attachment #2: patch2.diff --]
[-- Type: application/octet-stream, Size: 3107 bytes --]

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 624ef3f..9e36680 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -107,7 +107,8 @@
  * used in the general case to determine whether sleeping is possible.
  * Do not use in_atomic() in driver code.
  */
-#define in_atomic()	((preempt_count() & ~PREEMPT_ACTIVE) != 0)
+#define preempt_offset_in_atomic(offset)	((offset & ~PREEMPT_ACTIVE) != 0)
+#define in_atomic()	preempt_offset_in_atomic(preempt_count())
 
 /*
  * Check whether we were atomic before we did preempt_disable():
@@ -118,10 +119,8 @@
 
 #ifdef CONFIG_PREEMPT_COUNT
 # define preemptible()	(preempt_count() == 0 && !irqs_disabled())
-# define IRQ_EXIT_OFFSET (HARDIRQ_OFFSET-1)
 #else
 # define preemptible()	0
-# define IRQ_EXIT_OFFSET HARDIRQ_OFFSET
 #endif
 
 #if defined(CONFIG_SMP) || defined(CONFIG_GENERIC_HARDIRQS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..b14e40e 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -320,20 +320,45 @@ void irq_enter(void)
 	__irq_enter();
 }
 
+/*
+ * Invoce softirq's from irq_exit().
+ *
+ * Return the preempt offset: either IRQ_EXIT_OFFSET (if we
+ * did nothing to the preemption count) or SOFTIRQ_OFFSET (in
+ * case we did softirq processing and changed the preemption
+ * count to account for that).
+ */
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
+	/* Can we run softirq's at all? We migth be nesting interrupts */
+	if (preempt_offset_in_interrupt(preempt_count() - HARDIRQ_OFFSET))
+		return;
+
+	/* Are there any pending? */
+	if (!local_softirq_pending())
+		return;
+
+	/* Do we force irq threads? If so, just wake up the daemon */
+	if (force_irqthreads) {
+		wakeup_softirqd();
+		return;
+	}
+
+	/*
+	 * Ok, do actual softirq handling!
+	 *
+	 * This downgrades us from hardirq context to softirq context.
+	 */
+	preempt_count() += SOFTIRQ_OFFSET - HARDIRQ_OFFSET;
+
+	trace_softirqs_off(__builtin_return_address(0));
 #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
-		__do_softirq();
+	__do_softirq();
 #else
-		do_softirq();
+	do_softirq();
 #endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
-		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
+	preempt_count() += HARDIRQ_OFFSET - SOFTIRQ_OFFSET;
+	trace_softirqs_on((unsigned long)__builtin_return_address(0));
 }
 
 /*
@@ -343,17 +368,19 @@ void irq_exit(void)
 {
 	vtime_account_irq_exit(current);
 	trace_hardirq_exit();
-	sub_preempt_count(IRQ_EXIT_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
-		invoke_softirq();
+	invoke_softirq();
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
-		tick_nohz_irq_exit();
+	if (idle_cpu(smp_processor_id())) {
+		int offset = preempt_count() - HARDIRQ_OFFSET;
+
+		if (!preempt_offset_in_interrupt(offset) && !need_resched())
+			tick_nohz_irq_exit();
+	}
 #endif
 	rcu_irq_exit();
-	sched_preempt_enable_no_resched();
+	sub_preempt_count(HARDIRQ_OFFSET);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 20:58                           ` Frederic Weisbecker
@ 2013-02-17 21:02                             ` Frederic Weisbecker
  2013-02-18 15:53                               ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-17 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

2013/2/17 Frederic Weisbecker <fweisbec@gmail.com>:
> 2013/2/17 Linus Torvalds <torvalds@linux-foundation.org>:
>> On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>>
>>> preempt_value_in_interrupt() looks buggy in your patch: it makes
>>> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
>>> true since you have moved further the sub_preempt_count(IRQ_EXIT)
>>> further.
>>
>> No, that's not it. The value hasn't been written back yet, but it already did:
>>
>> +       int offset = IRQ_EXIT_OFFSET;
>> +       int count = preempt_count() - offset;
>>
>> so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
>> HARDIRQ_MASK is *not* always set.
>
> Another thing. Perhaps we can push the idea of your patch a little
> further by re-entering HARDIRQ_OFFSET at the end of the softirq
> processing and do the final sub_preempt_count(HARDIRQ_OFFSET) at the
> very end of irq_exit().
>
> This way irq_exit() looks a bit more simple to me: everything there
> becomes considered as in hardirq: (ie: rcu_irq_exit() and
> tick_nohz_irq_exit() won't appear anymore as weird special cases) and
> we get rid of that IRQ_EXIT_OFFSET hack that fixes the CONFIG_PREEMPT
> case.
>
> I'm attaching an untested patch that modify yours. It's probably
> missing some corner cases that rely on in_interrupt() value in
> rcu_irq_exit() and tick_nohz_irq_exit() and may be other things.

I messed up preempt_offset_in_interrupt() with in_atomic() code
instead of in_interrupt(). Anyway the patch is untested and is more
there to get your opinion for now. I'll put some more care on it if
people like it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-16 19:46                             ` Linus Torvalds
  2013-02-16 21:07                               ` Paul E. McKenney
  2013-02-16 21:53                               ` H. Peter Anvin
@ 2013-02-17 22:49                               ` H. Peter Anvin
  2013-02-18  8:41                                 ` Ingo Molnar
  2 siblings, 1 reply; 50+ messages in thread
From: H. Peter Anvin @ 2013-02-17 22:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On 02/16/2013 11:46 AM, Linus Torvalds wrote:
> Adding Peter Anvin to the people, just in case he sees what's wrong
> with the system call stub generation that keeps excessively old object
> files around. If it's easy to fix, it might be worth trying to make it
> ok to switch from i386 to x86-64 and back in the same tree.

I have not been able to reproduce this; it seems to Just Work[TM].

The syscall header stuff is definitely not to blame: it doesn't even 
*see* the CONFIG_ settings; instead they are only used to determine 
which subset of files to create, but the files themselves are 
configuration-independent.  This is by design.

As such, without an actual known to fail test case there isn't much I 
can do.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 22:49                               ` H. Peter Anvin
@ 2013-02-18  8:41                                 ` Ingo Molnar
  2013-02-19  8:50                                   ` Paul E. McKenney
  0 siblings, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2013-02-18  8:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Paul McKenney, Frederic Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 02/16/2013 11:46 AM, Linus Torvalds wrote:
> >Adding Peter Anvin to the people, just in case he sees what's wrong
> >with the system call stub generation that keeps excessively old object
> >files around. If it's easy to fix, it might be worth trying to make it
> >ok to switch from i386 to x86-64 and back in the same tree.
> 
> I have not been able to reproduce this; it seems to Just 
> Work[TM].
> 
> The syscall header stuff is definitely not to blame: it 
> doesn't even *see* the CONFIG_ settings; instead they are only 
> used to determine which subset of files to create, but the 
> files themselves are configuration-independent.  This is by 
> design.
> 
> As such, without an actual known to fail test case there isn't 
> much I can do.

I tried (based on the versions given by Paul):

  git checkout v3.7-rc7
  make ARCH=i386 defconfig
  make -j64 bzImage

  git checkout v3.8-rc7
  make ARCH=x86_64 olddefconfig
  make -j64 bzImage

but it built just fine. The build bug might depend on the 
specific config file, or might depend on tooling details?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-17 21:02                             ` Frederic Weisbecker
@ 2013-02-18 15:53                               ` Frederic Weisbecker
  2013-02-18 18:12                                 ` Linus Torvalds
  2013-02-18 19:58                                 ` Thomas Gleixner
  0 siblings, 2 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-18 15:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

On Sun, Feb 17, 2013 at 10:02:11PM +0100, Frederic Weisbecker wrote:
> 2013/2/17 Frederic Weisbecker <fweisbec@gmail.com>:
> > 2013/2/17 Linus Torvalds <torvalds@linux-foundation.org>:
> >> On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >>>
> >>> preempt_value_in_interrupt() looks buggy in your patch: it makes
> >>> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
> >>> true since you have moved further the sub_preempt_count(IRQ_EXIT)
> >>> further.
> >>
> >> No, that's not it. The value hasn't been written back yet, but it already did:
> >>
> >> +       int offset = IRQ_EXIT_OFFSET;
> >> +       int count = preempt_count() - offset;
> >>
> >> so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
> >> HARDIRQ_MASK is *not* always set.
> >
> > Another thing. Perhaps we can push the idea of your patch a little
> > further by re-entering HARDIRQ_OFFSET at the end of the softirq
> > processing and do the final sub_preempt_count(HARDIRQ_OFFSET) at the
> > very end of irq_exit().
> >
> > This way irq_exit() looks a bit more simple to me: everything there
> > becomes considered as in hardirq: (ie: rcu_irq_exit() and
> > tick_nohz_irq_exit() won't appear anymore as weird special cases) and
> > we get rid of that IRQ_EXIT_OFFSET hack that fixes the CONFIG_PREEMPT
> > case.
> >
> > I'm attaching an untested patch that modify yours. It's probably
> > missing some corner cases that rely on in_interrupt() value in
> > rcu_irq_exit() and tick_nohz_irq_exit() and may be other things.
> 
> I messed up preempt_offset_in_interrupt() with in_atomic() code
> instead of in_interrupt(). Anyway the patch is untested and is more
> there to get your opinion for now. I'll put some more care on it if
> people like it.

Here is an updated version. It cleans up things a bit and boots fine with my usual
config. There might be still some small details to work on but here is the big picture.

What do you think?

---
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 624ef3f..d1ab5b4 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -87,6 +87,7 @@
 #define in_irq()		(hardirq_count())
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
+#define in_nesting_interrupt()  (irq_count() - HARDIRQ_OFFSET)
 #define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)
 
 /*
@@ -118,10 +119,8 @@
 
 #ifdef CONFIG_PREEMPT_COUNT
 # define preemptible()	(preempt_count() == 0 && !irqs_disabled())
-# define IRQ_EXIT_OFFSET (HARDIRQ_OFFSET-1)
 #else
 # define preemptible()	0
-# define IRQ_EXIT_OFFSET HARDIRQ_OFFSET
 #endif
 
 #if defined(CONFIG_SMP) || defined(CONFIG_GENERIC_HARDIRQS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..69fbefd 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -277,6 +277,17 @@ restart:
 	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
 }
 
+#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
+static void __do_softirq_irq_unsafe(void)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__do_softirq();
+	local_irq_restore(flags);
+}
+#endif
+
 #ifndef __ARCH_HAS_DO_SOFTIRQ
 
 asmlinkage void do_softirq(void)
@@ -320,20 +331,45 @@ void irq_enter(void)
 	__irq_enter();
 }
 
+/*
+ * Invoce softirq's from irq_exit().
+ *
+ * Return the preempt offset: either IRQ_EXIT_OFFSET (if we
+ * did nothing to the preemption count) or SOFTIRQ_OFFSET (in
+ * case we did softirq processing and changed the preemption
+ * count to account for that).
+ */
 static inline void invoke_softirq(void)
 {
-	if (!force_irqthreads) {
+	/* Can we run softirq's at all? We migth be nesting interrupts */
+	if (in_nesting_interrupt())
+		return;
+
+	/* Are there any pending? */
+	if (!local_softirq_pending())
+		return;
+
+	/* Do we force irq threads? If so, just wake up the daemon */
+	if (force_irqthreads) {
+		wakeup_softirqd();
+		return;
+	}
+
+	/*
+	 * Ok, do actual softirq handling!
+	 *
+	 * This downgrades us from hardirq context to softirq context.
+	 */
+	preempt_count() += SOFTIRQ_OFFSET - HARDIRQ_OFFSET;
+
+	trace_softirqs_off(__builtin_return_address(0));
 #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
-		__do_softirq();
+	__do_softirq();
 #else
-		do_softirq();
+	__do_softirq_irq_unsafe();
 #endif
-	} else {
-		__local_bh_disable((unsigned long)__builtin_return_address(0),
-				SOFTIRQ_OFFSET);
-		wakeup_softirqd();
-		__local_bh_enable(SOFTIRQ_OFFSET);
-	}
+	preempt_count() += HARDIRQ_OFFSET - SOFTIRQ_OFFSET;
+	trace_softirqs_on((unsigned long)__builtin_return_address(0));
 }
 
 /*
@@ -343,17 +379,17 @@ void irq_exit(void)
 {
 	vtime_account_irq_exit(current);
 	trace_hardirq_exit();
-	sub_preempt_count(IRQ_EXIT_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
-		invoke_softirq();
+	invoke_softirq();
 
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
-		tick_nohz_irq_exit();
+	if (idle_cpu(smp_processor_id())) {
+		if (!in_nesting_interrupt() && !need_resched())
+			tick_nohz_irq_exit();
+	}
 #endif
 	rcu_irq_exit();
-	sched_preempt_enable_no_resched();
+	sub_preempt_count(HARDIRQ_OFFSET);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-18 15:53                               ` Frederic Weisbecker
@ 2013-02-18 18:12                                 ` Linus Torvalds
  2013-02-19 10:08                                   ` Frederic Weisbecker
  2013-02-18 19:58                                 ` Thomas Gleixner
  1 sibling, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2013-02-18 18:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

On Mon, Feb 18, 2013 at 7:53 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> Here is an updated version. It cleans up things a bit and boots fine with my usual
> config. There might be still some small details to work on but here is the big picture.
>
> What do you think?

Looks fine to me. Did this get tested with lockdep and spinlock
debugging etc? Those are the ones that are the primary users of all
the irq state and preempt tracing..

             Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-18 15:53                               ` Frederic Weisbecker
  2013-02-18 18:12                                 ` Linus Torvalds
@ 2013-02-18 19:58                                 ` Thomas Gleixner
  2013-02-19 10:38                                   ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2013-02-18 19:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Paul McKenney, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

On Mon, 18 Feb 2013, Frederic Weisbecker wrote:
> On Sun, Feb 17, 2013 at 10:02:11PM +0100, Frederic Weisbecker wrote:
> > 2013/2/17 Frederic Weisbecker <fweisbec@gmail.com>:
> > > 2013/2/17 Linus Torvalds <torvalds@linux-foundation.org>:
> > >> On Sun, Feb 17, 2013 at 7:11 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > >>>
> > >>> preempt_value_in_interrupt() looks buggy in your patch: it makes
> > >>> invoke_softirq() returning if (val & HARDIRQ_MASK). But that's always
> > >>> true since you have moved further the sub_preempt_count(IRQ_EXIT)
> > >>> further.
> > >>
> > >> No, that's not it. The value hasn't been written back yet, but it already did:
> > >>
> > >> +       int offset = IRQ_EXIT_OFFSET;
> > >> +       int count = preempt_count() - offset;
> > >>
> > >> so the 'count' has the IRQ_EXIT_OFFSET already subtracted. So no,
> > >> HARDIRQ_MASK is *not* always set.
> > >
> > > Another thing. Perhaps we can push the idea of your patch a little
> > > further by re-entering HARDIRQ_OFFSET at the end of the softirq
> > > processing and do the final sub_preempt_count(HARDIRQ_OFFSET) at the
> > > very end of irq_exit().
> > >
> > > This way irq_exit() looks a bit more simple to me: everything there
> > > becomes considered as in hardirq: (ie: rcu_irq_exit() and
> > > tick_nohz_irq_exit() won't appear anymore as weird special cases) and
> > > we get rid of that IRQ_EXIT_OFFSET hack that fixes the CONFIG_PREEMPT
> > > case.
> > >
> > > I'm attaching an untested patch that modify yours. It's probably
> > > missing some corner cases that rely on in_interrupt() value in
> > > rcu_irq_exit() and tick_nohz_irq_exit() and may be other things.
> > 
> > I messed up preempt_offset_in_interrupt() with in_atomic() code
> > instead of in_interrupt(). Anyway the patch is untested and is more
> > there to get your opinion for now. I'll put some more care on it if
> > people like it.
> 
> Here is an updated version. It cleans up things a bit and boots fine with my usual
> config. There might be still some small details to work on but here is the big picture.
> 
> What do you think?
> 
> ---
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index 624ef3f..d1ab5b4 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -87,6 +87,7 @@
>  #define in_irq()		(hardirq_count())
>  #define in_softirq()		(softirq_count())
>  #define in_interrupt()		(irq_count())
> +#define in_nesting_interrupt()  (irq_count() - HARDIRQ_OFFSET)
>  #define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)
>  
>  /*
> @@ -118,10 +119,8 @@
>  
>  #ifdef CONFIG_PREEMPT_COUNT
>  # define preemptible()	(preempt_count() == 0 && !irqs_disabled())
> -# define IRQ_EXIT_OFFSET (HARDIRQ_OFFSET-1)
>  #else
>  # define preemptible()	0
> -# define IRQ_EXIT_OFFSET HARDIRQ_OFFSET
>  #endif
>  
>  #if defined(CONFIG_SMP) || defined(CONFIG_GENERIC_HARDIRQS)
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index ed567ba..69fbefd 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -277,6 +277,17 @@ restart:
>  	tsk_restore_flags(current, old_flags, PF_MEMALLOC);
>  }
>  
> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED

We really should get rid of that and make irqs disabled on irq exit
mandatory.

> +static void __do_softirq_irq_unsafe(void)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	__do_softirq();
> +	local_irq_restore(flags);
> +}
> +#endif
> +
>  #ifndef __ARCH_HAS_DO_SOFTIRQ
>  
>  asmlinkage void do_softirq(void)
> @@ -320,20 +331,45 @@ void irq_enter(void)
>  	__irq_enter();
>  }
>  
> +/*
> + * Invoce softirq's from irq_exit().
> + *
> + * Return the preempt offset: either IRQ_EXIT_OFFSET (if we
> + * did nothing to the preemption count) or SOFTIRQ_OFFSET (in
> + * case we did softirq processing and changed the preemption
> + * count to account for that).
> + */
>  static inline void invoke_softirq(void)
>  {
> -	if (!force_irqthreads) {
> +	/* Can we run softirq's at all? We migth be nesting interrupts */
> +	if (in_nesting_interrupt())
> +		return;

We might also be in a softirq disabled region where it makes no sense
to continue. So this should be

   in_nesting_interrupt() || in_softirq()

> +	/* Are there any pending? */
> +	if (!local_softirq_pending())
> +		return;
> +
> +	/* Do we force irq threads? If so, just wake up the daemon */
> +	if (force_irqthreads) {
> +		wakeup_softirqd();
> +		return;
> +	}
> +
> +	/*
> +	 * Ok, do actual softirq handling!
> +	 *
> +	 * This downgrades us from hardirq context to softirq context.
> +	 */
> +	preempt_count() += SOFTIRQ_OFFSET - HARDIRQ_OFFSET;
> +
> +	trace_softirqs_off(__builtin_return_address(0));
>  #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
> -		__do_softirq();
> +	__do_softirq();
>  #else
> -		do_softirq();
> +	__do_softirq_irq_unsafe();
>  #endif
> -	} else {
> -		__local_bh_disable((unsigned long)__builtin_return_address(0),
> -				SOFTIRQ_OFFSET);
> -		wakeup_softirqd();
> -		__local_bh_enable(SOFTIRQ_OFFSET);
> -	}
> +	preempt_count() += HARDIRQ_OFFSET - SOFTIRQ_OFFSET;
> +	trace_softirqs_on((unsigned long)__builtin_return_address(0));
>  }
>  
>  /*
> @@ -343,17 +379,17 @@ void irq_exit(void)
>  {
>  	vtime_account_irq_exit(current);
>  	trace_hardirq_exit();
> -	sub_preempt_count(IRQ_EXIT_OFFSET);
> -	if (!in_interrupt() && local_softirq_pending())
> -		invoke_softirq();
> +	invoke_softirq();
>  
>  #ifdef CONFIG_NO_HZ
>  	/* Make sure that timer wheel updates are propagated */
> -	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
> -		tick_nohz_irq_exit();
> +	if (idle_cpu(smp_processor_id())) {
> +		if (!in_nesting_interrupt() && !need_resched())
> +			tick_nohz_irq_exit();
> +	}
>  #endif
>  	rcu_irq_exit();
> -	sched_preempt_enable_no_resched();
> +	sub_preempt_count(HARDIRQ_OFFSET);
>  }

Otherwise this is a nice improvement!

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-18  8:41                                 ` Ingo Molnar
@ 2013-02-19  8:50                                   ` Paul E. McKenney
  2013-02-19  8:56                                     ` Ingo Molnar
  0 siblings, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2013-02-19  8:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Linus Torvalds, Frederic Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney

On Mon, Feb 18, 2013 at 09:41:30AM +0100, Ingo Molnar wrote:
> 
> * H. Peter Anvin <hpa@zytor.com> wrote:
> 
> > On 02/16/2013 11:46 AM, Linus Torvalds wrote:
> > >Adding Peter Anvin to the people, just in case he sees what's wrong
> > >with the system call stub generation that keeps excessively old object
> > >files around. If it's easy to fix, it might be worth trying to make it
> > >ok to switch from i386 to x86-64 and back in the same tree.
> > 
> > I have not been able to reproduce this; it seems to Just 
> > Work[TM].
> > 
> > The syscall header stuff is definitely not to blame: it 
> > doesn't even *see* the CONFIG_ settings; instead they are only 
> > used to determine which subset of files to create, but the 
> > files themselves are configuration-independent.  This is by 
> > design.
> > 
> > As such, without an actual known to fail test case there isn't 
> > much I can do.
> 
> I tried (based on the versions given by Paul):
> 
>   git checkout v3.7-rc7
>   make ARCH=i386 defconfig
>   make -j64 bzImage
> 
>   git checkout v3.8-rc7
>   make ARCH=x86_64 olddefconfig
>   make -j64 bzImage
> 
> but it built just fine. The build bug might depend on the 
> specific config file, or might depend on tooling details?

The problem is that the git tree I used is one that I build out of
very infrequently, and so all I really know about the previous
build was that it was done a long time ago.  :-(

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-19  8:50                                   ` Paul E. McKenney
@ 2013-02-19  8:56                                     ` Ingo Molnar
  0 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2013-02-19  8:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: H. Peter Anvin, Linus Torvalds, Frederic Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Dave Jones, Hugh Dickins,
	Linux Kernel Mailing List, Paul McKenney


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, Feb 18, 2013 at 09:41:30AM +0100, Ingo Molnar wrote:
> > 
> > * H. Peter Anvin <hpa@zytor.com> wrote:
> > 
> > > On 02/16/2013 11:46 AM, Linus Torvalds wrote:
> > > >Adding Peter Anvin to the people, just in case he sees what's wrong
> > > >with the system call stub generation that keeps excessively old object
> > > >files around. If it's easy to fix, it might be worth trying to make it
> > > >ok to switch from i386 to x86-64 and back in the same tree.
> > > 
> > > I have not been able to reproduce this; it seems to Just 
> > > Work[TM].
> > > 
> > > The syscall header stuff is definitely not to blame: it 
> > > doesn't even *see* the CONFIG_ settings; instead they are only 
> > > used to determine which subset of files to create, but the 
> > > files themselves are configuration-independent.  This is by 
> > > design.
> > > 
> > > As such, without an actual known to fail test case there isn't 
> > > much I can do.
> > 
> > I tried (based on the versions given by Paul):
> > 
> >   git checkout v3.7-rc7
> >   make ARCH=i386 defconfig
> >   make -j64 bzImage
> > 
> >   git checkout v3.8-rc7
> >   make ARCH=x86_64 olddefconfig
> >   make -j64 bzImage
> > 
> > but it built just fine. The build bug might depend on the 
> > specific config file, or might depend on tooling details?
> 
> The problem is that the git tree I used is one that I build out of
> very infrequently, and so all I really know about the previous
> build was that it was done a long time ago.  :-(

Unless you recreated the Git repo from scratch, you could find 
clues about past versions used in .git/logs/HEAD.

Except if you used it to pull bits frequently - but built rarely 
out of it. In that case the information is probably lost.

(Don't worry about it too much - someone will eventually hit it 
again during bisection or so.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-18 18:12                                 ` Linus Torvalds
@ 2013-02-19 10:08                                   ` Frederic Weisbecker
  0 siblings, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-19 10:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

2013/2/18 Linus Torvalds <torvalds@linux-foundation.org>:
> On Mon, Feb 18, 2013 at 7:53 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>>
>> Here is an updated version. It cleans up things a bit and boots fine with my usual
>> config. There might be still some small details to work on but here is the big picture.
>>
>> What do you think?
>
> Looks fine to me. Did this get tested with lockdep and spinlock
> debugging etc? Those are the ones that are the primary users of all
> the irq state and preempt tracing..

Yeah I always keep those options turned on to test my patches. But I
need to test with and without preempt, check if I can reuse more
local_bh_disable, etc... So there will be more iterations.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-18 19:58                                 ` Thomas Gleixner
@ 2013-02-19 10:38                                   ` Frederic Weisbecker
  2013-02-19 10:44                                     ` Thomas Gleixner
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2013-02-19 10:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Paul McKenney, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney

2013/2/18 Thomas Gleixner <tglx@linutronix.de>:
> On Mon, 18 Feb 2013, Frederic Weisbecker wrote:
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index ed567ba..69fbefd 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -277,6 +277,17 @@ restart:
>>       tsk_restore_flags(current, old_flags, PF_MEMALLOC);
>>  }
>>
>> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
>
> We really should get rid of that and make irqs disabled on irq exit
> mandatory.

Until now I even thought that having irqs disabled in irq_exit() was
mandatory. I need to have a second look at tick_nohz_irq_exit() as I
modified it with this requirement in mind...

Also with this patch, if a nesting interrupt happens right after
tick_nohz_irq_exit(), this new irq will call tick_nohz_stop_idle() but
not tick_nohz_start_idle() on the end. Leaving that unbalanced. I
could work that around but I don't want to spaghettize further around
that ifdef.

So this patch actually does depend on __ARCH_IRQ_EXIT_IRQS_DISABLED. I
need to convert all archs to force this first. Then think about the
softirq.c cleanup once its done.

I wonder what's the right path to this, may be loop over archs until
they all have __ARCH_IRQ_EXIT_IRQS_DISABLED, then remove that? Sounds
like non-sense but the safest way at the same time.

>> @@ -320,20 +331,45 @@ void irq_enter(void)
>>       __irq_enter();
>>  }
>>
>> +/*
>> + * Invoce softirq's from irq_exit().
>> + *
>> + * Return the preempt offset: either IRQ_EXIT_OFFSET (if we
>> + * did nothing to the preemption count) or SOFTIRQ_OFFSET (in
>> + * case we did softirq processing and changed the preemption
>> + * count to account for that).
>> + */
>>  static inline void invoke_softirq(void)
>>  {
>> -     if (!force_irqthreads) {
>> +     /* Can we run softirq's at all? We migth be nesting interrupts */
>> +     if (in_nesting_interrupt())
>> +             return;
>
> We might also be in a softirq disabled region where it makes no sense
> to continue. So this should be
>
>    in_nesting_interrupt() || in_softirq()

in_nesting_interrupt() takes care of that as well.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Debugging Thinkpad T430s occasional suspend failure.
  2013-02-19 10:38                                   ` Frederic Weisbecker
@ 2013-02-19 10:44                                     ` Thomas Gleixner
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Gleixner @ 2013-02-19 10:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Paul McKenney, Ingo Molnar, Peter Zijlstra,
	Dave Jones, Hugh Dickins, Linux Kernel Mailing List,
	Paul McKenney, linux-arch

CC'ing linux-arch, so the arch folks can scream murder if they have
any objections.

On Tue, 19 Feb 2013, Frederic Weisbecker wrote:
> 2013/2/18 Thomas Gleixner <tglx@linutronix.de>:
> > On Mon, 18 Feb 2013, Frederic Weisbecker wrote:
> >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> >> index ed567ba..69fbefd 100644
> >> --- a/kernel/softirq.c
> >> +++ b/kernel/softirq.c
> >> @@ -277,6 +277,17 @@ restart:
> >>       tsk_restore_flags(current, old_flags, PF_MEMALLOC);
> >>  }
> >>
> >> +#ifndef __ARCH_IRQ_EXIT_IRQS_DISABLED
> >
> > We really should get rid of that and make irqs disabled on irq exit
> > mandatory.
> 
> Until now I even thought that having irqs disabled in irq_exit() was
> mandatory. I need to have a second look at tick_nohz_irq_exit() as I
> modified it with this requirement in mind...

Duh.
 
> Also with this patch, if a nesting interrupt happens right after
> tick_nohz_irq_exit(), this new irq will call tick_nohz_stop_idle() but
> not tick_nohz_start_idle() on the end. Leaving that unbalanced. I
> could work that around but I don't want to spaghettize further around
> that ifdef.
> 
> So this patch actually does depend on __ARCH_IRQ_EXIT_IRQS_DISABLED. I
> need to convert all archs to force this first. Then think about the
> softirq.c cleanup once its done.
> 
> I wonder what's the right path to this, may be loop over archs until
> they all have __ARCH_IRQ_EXIT_IRQS_DISABLED, then remove that? Sounds
> like non-sense but the safest way at the same time.

Yeah.
 
> >>  static inline void invoke_softirq(void)
> >>  {
> >> -     if (!force_irqthreads) {
> >> +     /* Can we run softirq's at all? We migth be nesting interrupts */
> >> +     if (in_nesting_interrupt())
> >> +             return;
> >
> > We might also be in a softirq disabled region where it makes no sense
> > to continue. So this should be
> >
> >    in_nesting_interrupt() || in_softirq()
> 
> in_nesting_interrupt() takes care of that as well.
 
Gah, misread the macro maze :(

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2013-02-19 10:44 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-12 19:39 Debugging Thinkpad T430s occasional suspend failure Dave Jones
2013-02-12 20:13 ` Linus Torvalds
2013-02-13  0:26 ` Hugh Dickins
2013-02-13  0:40   ` Dave Jones
2013-02-13  0:56     ` Hugh Dickins
2013-02-13  4:16       ` Dave Jones
2013-02-13  5:37         ` Hugh Dickins
2013-02-13 19:34           ` Dave Jones
2013-02-13 19:56             ` Linus Torvalds
2013-02-13 20:53               ` Dave Jones
2013-02-16 20:54                 ` Paul E. McKenney
2013-02-15  1:15               ` Dave Jones
2013-02-15  2:09                 ` Linus Torvalds
2013-02-15 15:50                   ` Ingo Molnar
2013-02-15 22:33                     ` Dave Jones
2013-02-15 17:44                   ` Paul E. McKenney
2013-02-15 18:34                     ` Linus Torvalds
2013-02-15 18:35                       ` Linus Torvalds
2013-02-15 18:50                         ` Linus Torvalds
2013-02-16 19:25                           ` Paul E. McKenney
2013-02-16 19:46                             ` Linus Torvalds
2013-02-16 21:07                               ` Paul E. McKenney
2013-02-16 21:53                               ` H. Peter Anvin
2013-02-17 22:49                               ` H. Peter Anvin
2013-02-18  8:41                                 ` Ingo Molnar
2013-02-19  8:50                                   ` Paul E. McKenney
2013-02-19  8:56                                     ` Ingo Molnar
2013-02-17 15:11                       ` Frederic Weisbecker
2013-02-17 17:32                         ` Linus Torvalds
2013-02-17 18:17                           ` Frederic Weisbecker
2013-02-17 20:58                           ` Frederic Weisbecker
2013-02-17 21:02                             ` Frederic Weisbecker
2013-02-18 15:53                               ` Frederic Weisbecker
2013-02-18 18:12                                 ` Linus Torvalds
2013-02-19 10:08                                   ` Frederic Weisbecker
2013-02-18 19:58                                 ` Thomas Gleixner
2013-02-19 10:38                                   ` Frederic Weisbecker
2013-02-19 10:44                                     ` Thomas Gleixner
2013-02-15  2:09                 ` Hugh Dickins
2013-02-15  2:15                   ` Linus Torvalds
2013-02-16 21:45                     ` Hugh Dickins
2013-02-16 23:02                       ` Linus Torvalds
2013-02-17  0:01                         ` Hugh Dickins
2013-02-17  2:21                           ` Hugh Dickins
2013-02-17 13:38                             ` Daniel Vetter
2013-02-17 14:54                               ` Daniel Vetter
2013-02-17 16:31                               ` Hugh Dickins
2013-02-17 17:28                                 ` Daniel Vetter
2013-02-17 17:28                                   ` Daniel Vetter
2013-02-13  2:17   ` Dave Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.