From: Takashi Iwai <tiwai@suse.de>
To: Lyude Paul <lyude@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>, intel-gfx@lists.freedesktop.org
Subject: Re: Panic after S3 resume and modeset with MST
Date: Thu, 30 Mar 2017 22:27:12 +0200
Message-ID: <s5hvaqqh5nj.wl-tiwai@suse.de>
In-Reply-To: <1490904091.19826.3.camel@redhat.com>

On Thu, 30 Mar 2017 22:01:31 +0200,
Lyude Paul wrote:
> 
> On Thu, 2017-03-30 at 20:50 +0200, Takashi Iwai wrote:
> <snip>
> > 
> > Sure, if we get a proper stack dump, we can analyze it somehow.  You
> > can use addr2line, or even check objdump output manually.
> > But in this case, as already mentioned, it was impossible to get any
> > sensible stack trace on my machine with 4.11-rc, so far,
> > unfortunately.  So no material to read.
> 
> huh? I thought that was what the file called "screenshot showing kernel
> panic trace" on the bugzilla was (although that backtrace definitely
> didn't look too relevant)...

It's not from my machine, and it's not from 4.11-rc.  It's a
screenshot taken on 4.4.x openSUSE kernel with the backport of your
fix.  So it might be some help, but the stack trace there is merely a
red herring.

The reason I couldn't get such a screenshot is that VT switching is
broken on 4.11 in multiple ways.  One VT bug got fixed in 4.11-rc4,
but another still remains....

> anyway if you are having trouble getting
> just a stack trace though, one of my coworkers here has taught me a
> trick called divide and conquer.
> 
> The idea is pretty simple. Let's say we have a block of code like this
> in the kernel
> 
> void some_resume_func() {
> 	cool_function_call();
> 	this_is_neat_too();
> 
> 	foo();
> 	bar();
> 	death();
> 	baz();
> 	zab();
> }
> 
> And you know it's crashing inside this function on resume (e.g. it
> could be in foo(), bar(), or that suspicious death() function) but you
> have no way of getting a back trace.
> 
> This is where the trick comes in: while you might not be able to get a
> stack trace, you can probably at least tell the difference between when
> the machine reboots immediately as a result of calling
> emergency_restart(), and whether it's just hanging due to the bug.
> 
> So what you do is kind of like bisecting, except instead of testing
> different commits you see what happens when you insert a call to
> emergency_restart() and move it around:
> 
> - Try #1:
> 
> void some_resume_func() {
> 	cool_function_call();
> 	this_is_neat_too();
> 
> 	foo();
> 	emergency_restart();
> 	bar();
> 	death();
> 	baz();
> 	zab();
> }
> 
> The machine immediately reboots, so the problem is below where we
> inserted the emergency_restart() call
> 
> - Try #2:
> 
> void some_resume_func() {
> 	cool_function_call();
> 	this_is_neat_too();
> 
> 	foo();
> 	bar();
> 	death();
> 	emergency_restart();
> 	baz();
> 	zab();
> }
> 
> The machine hangs, so we know the problem's either in the call to bar()
> or death().
> 
> - Try #3:
> 
> void some_resume_func() {
> 	cool_function_call();
> 	this_is_neat_too();
> 
> 	foo();
> 	bar();
> 	emergency_restart();
> 	death();
> 	baz();
> 	zab();
> }
> 
> The machine reboots immediately this time, which means that the problem
> has to be occurring inside the suspicious death() function. Of course,
> if we want to keep debugging further we can go into the death()
> function itself and try the same thing to figure out which line inside
> it is causing the issue.

Heh, divide-and-conquer is also how I arrived at my patch :)  I took
the suspected cause (the call of intel_dp_mst_resume()), split it up,
and luckily it worked on the first try.
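
The probe-and-move procedure quoted above amounts to a binary search
over the position of the restart call.  Below is a minimal user-space
sketch of that search, assuming exactly one step hangs; the names
find_hanging_step(), probe_fn and simulated_probe() are made up for
illustration, and on real hardware each probe would be one manual
patch / rebuild / reproduce cycle rather than a function call.

```c
#include <assert.h>
#include <stddef.h>

/* What we observe after one run with the restart placed before step `pos`. */
enum probe_result { PROBE_REBOOTED, PROBE_HUNG };

/*
 * Hypothetical stand-in for one test cycle: emergency_restart() is placed
 * before step `pos` (pos == n means "after the last step").
 */
typedef enum probe_result (*probe_fn)(size_t pos, void *ctx);

/*
 * Return the index of the hanging step among n steps.
 * Invariant: a restart before step `lo` reboots, one before step `hi` hangs.
 */
static size_t find_hanging_step(size_t n, probe_fn probe, void *ctx)
{
	size_t lo = 0, hi = n;

	assert(n > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (probe(mid, ctx) == PROBE_REBOOTED)
			lo = mid;	/* culprit is at or after mid */
		else
			hi = mid;	/* culprit ran before mid */
	}
	return lo;
}

/* Simulation only: ctx points at the index of the step that hangs. */
static enum probe_result simulated_probe(size_t pos, void *ctx)
{
	size_t culprit = *(size_t *)ctx;

	return pos <= culprit ? PROBE_REBOOTED : PROBE_HUNG;
}
```

With the five steps foo(), bar(), death(), baz(), zab() numbered 0 to 4
and death() as the culprit, the search needs two probes here instead
of up to five.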

> So you'd do the same thing, except around wherever it looks like this
> crash might be happening. From:
> 
> https://bugzilla.suse.com/show_bug.cgi?id=1029634#c5
> 
> It sounds like this happens on hotplugging, so the place to start this
> would probably be i915_hotplug_work_func(). Keep going down the call
> stack there and you should eventually find the culprit.
> 
> The only complication I foresee here is that you'll have to write a
> little bit of additional debugging code so that
> i915_hotplug_work_func() doesn't actually call emergency_restart()
> until right before the moment where the crash happens. This shouldn't
> be too difficult, you could do something like add a module parameter to
> i915 that you change right before the final step of reproducing the bug
> that enables the calls to emergency_restart(). If you have any trouble
> with this part, feel free to let me know and I'll hack together a quick
> patch you can use.

Right, that's the most difficult part: reproducing the crash takes
multiple suspend/resume and dock/undock cycles, so the code path may
be executed multiple times.  And tracking i915_hotplug_work_func() in
the way you suggested isn't trivial, as it's full of indirect
calls...
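
For reference, the arming knob suggested in the quoted text might look
roughly like the sketch below.  The parameter name debug_restart_armed
and the helper debug_restart_checkpoint() are hypothetical and don't
exist in i915; the non-kernel branch only provides stand-ins so the
logic can be compiled and tried in user space.

```c
#ifdef __KERNEL__
#include <linux/types.h>	/* bool */
#include <linux/reboot.h>	/* emergency_restart() */
#include <linux/moduleparam.h>	/* module_param() */
#else
#include <stdbool.h>
/* User-space stand-in: count calls instead of rebooting the machine. */
static int restart_calls;
static void emergency_restart(void) { restart_calls++; }
#endif

/* Hypothetical knob: left off until just before the final reproduction step. */
static bool debug_restart_armed;
#ifdef __KERNEL__
module_param(debug_restart_armed, bool, 0644);
#endif

/* Drop this at the probe position; it does nothing until the knob is set. */
static inline void debug_restart_checkpoint(void)
{
	if (debug_restart_armed)
		emergency_restart();
}
```

With mode 0644 the knob should be writable at runtime through
/sys/module/<module>/parameters/debug_restart_armed, so the earlier
passes through the code path run unaffected.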

A trick I've often used instead is to put additional (very long)
delays between the suspected code lines, mark each point via trace_
or normal printk, and track which point we manage to reach.  Then you
don't need frequent reboots but just a few long runs.  Of course, it
can't be used in irq context, but for the work, it's OK.
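
A rough sketch of the delay-and-mark trick described above, assuming
process context (msleep() sleeps, so it's not for irq context).  The
DBG_MARK() macro, the dbg_mark_delay_ms variable and suspected_work()
are made up for illustration; the non-kernel branch only stubs out
msleep() and pr_info() so the sketch can be compiled in user space.

```c
#ifdef __KERNEL__
#include <linux/delay.h>	/* msleep() */
#include <linux/printk.h>	/* pr_info() */
#else
#include <stdio.h>
/* User-space stand-ins: record the requested sleep instead of waiting. */
static unsigned int slept_ms;
static void msleep(unsigned int ms) { slept_ms += ms; }
#define pr_info(...) printf(__VA_ARGS__)
#endif

/* Hypothetical gap between markers, long enough to tell them apart. */
static unsigned int dbg_mark_delay_ms = 30000;

/* Log the point that was reached, then stall before moving on. */
#define DBG_MARK(tag)						\
	do {							\
		pr_info("dbg-mark: reached %s\n", (tag));	\
		msleep(dbg_mark_delay_ms);			\
	} while (0)

/* Placeholder for the suspected work function; the steps are elided. */
static void suspected_work(void)
{
	DBG_MARK("before step A");
	/* step A would run here */
	DBG_MARK("before step B");
	/* step B would run here */
	DBG_MARK("done");
}
```

The last marker that gets logged before the crash shows how far the
work got in that run.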

Maybe I'll give it a try, but likely later next week; I'll be very
busy with other tasks tomorrow, sorry.


thanks,

Takashi

> 
> Lemme know if this helps at all :).
> 
> > 
> > That is, the problem isn't how to translate it, but how to get it.
> > Normal ways didn't work.  Maybe I can try AMT, but I doubt that it'll
> > give any output since kdump already failed...
> > 
> > 
> > thanks,
> > 
> > Takashi
> 