All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ahmed S. Darwish" <darwish.07@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: John Ogness <john.ogness@linutronix.de>,
	daniel.vetter@ffwll.ch,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2 1/3] drm: Add support for panic message output
Date: Thu, 14 Mar 2019 03:51:13 +0100	[thread overview]
Message-ID: <20190314025113.GA17300@darwi-home-pc> (raw)
In-Reply-To: <20190313083710.GI2665@phenom.ffwll.local>


[[ Adding Sebastian, who is quite experienced in intricate
   locking situations due to daily PREEMPT_RT work.. ]]

On Wed, Mar 13, 2019 at 09:37:10AM +0100, Daniel Vetter wrote:
> On Wed, Mar 13, 2019 at 08:49:17AM +0100, John Ogness wrote:
> > On 2019-03-12, Ahmed S. Darwish <darwish.07@gmail.com> wrote:
> > > On Wed, Mar 13, 2019 at 09:37:10AM +0100, Daniel Vetter wrote:
[……]
> > > >
> > > > If we aim for perfect this should be a trylock still, maybe using
> > > > our own device list.
> > > >
> > >
> > > I definitely agree here.
> > >
> > > The lock may already be locked either by a stopped CPU, or by the
> > > very same CPU we execute panic() on (e.g. NMI panic() on the
> > > printing CPU).
> > >
> > > This is why it's very common for example in serial consoles, which
> > > are usually careful about re-entrance and panic contexts, to do:
> > >
> > >   xxxxxx_console_write(...) {
> > > 	if (oops_in_progress)
> > > 		locked = spin_trylock_irqsave(&port->lock, flags);
> > > 	else
> > > 		spin_lock_irqsave(&port->lock, flags);
> > >   }
> > >
> > > I'm quite positive we should do the same for panic drm drivers.
> >
> > This construction will continue, even if the trylock fails. It only
> > makes sense to do this if the driver has a chance of being
> > successful. Ignoring locks is a serious issue. I personally am quite
> > unhappy that the serial drivers do this, which was part of my motivation
> > for the new printk design I'm working on.
> >
> > If the driver is not capable of doing something useful on a failed
> > trylock, then I recommend just skipping that device. Maybe trying it
> > again later after trying all the devices?
>
> Ah yes missed that. If the trylock fails anywhere, we must bail out.
>
> Not sure retrying is useful, my experience from at least drm is that
> either you're lucky, and drm wasn't doing anything right when the machine
> blew up, and then the trylocks will all go through. Or you're unlucky, and
> most likely that means drm itself blew up, and no amount of retrying is
> going to help. I wouldn't bother.
>

Ack, retrying most probably won't help.

I see that, at least in x86:

  kernel/panic.c::panic()
    => kernel/panic.c::smp_send_stop()
      => arch/x86/incluse/smp.h::smp_send_stop(0)
        => arch/x86/kernel/smp.c::native_stop_other_cpus(wait)
          /*
           * Don't wait longer than a second if the caller
           * didn't ask us to wait.
           */
           timeout = USEC_PER_SEC;
           while (num_online_cpus() > 1 && (wait || timeout--))
                   udelay(1);

So the panic()-ing core wait by default for _a full second_ until
all other cores finish their non-preemptible regions. If that did
not work out, it even tries more aggressively with NMIs.

So if one of the other non panic()-ing cores was holding a DRM
related lock through spin_lock_irqsave() for more than one second,
well, it's a bug in the DRM layer code anyway..      [A]

Problem is, what will happen if the non panic()-ing core was
holding a DRM lock through barebone spin_lock()? Interrupts are
enabled there, so the IPI will be handled, and thus that core will
effectively die while the lock is forever held :-(   [B]

But, well, yeah, I don't think there's a solution for the [B] part
except by spin_trylock() the fewest amount of locks possible, and
bailing out if anyone of them is held.

For reference, I asked over IRC why not just resetting GPU state,
but it's not easy as it sounds to people outside of the gfx world:

  [abridged]
  <darwi>  So it seems Windows is forcing driver writers to
           implement a gpu reset method [*]
  <darwi>  since Windows vista
  <danvet> yeah
  <danvet> not going to do that from panic
  <danvet> there's a non-zero chance that gpu reset takes down
           the system with at least lots of hw/driver combos
  <danvet> I think all our drivers also have gpu reset support
  <danvet> if something is still rendering while we panic, mea culpa
  <danvet> only way around that is switching planes
  <danvet> which we tried with the old approach, and that's just
           too hard
  <danvet> you end up running a few 100kloc of code worst case
  <danvet> no way to audit/validate that for panic context
  <danvet> but the rendering shouldn't really matter
  <anholt> yeah, actually modesetting in a panic seems hopeless.
  <danvet> at least for modern compositors
  <danvet> anholt, yeah I killed that idea
  <danvet> so they only render to the backbuffer
  <danvet> and we won't show that with a panic
  <anholt> yep.  just draw on a plane.  for everything but
           uncomposited x11, it'll already be idle.
  <danvet> so only really an issue on boot splash and frontbuffer x11
  <danvet> and there a few cursors/characters drawn won't really
           matter in most cases
  <danvet> anholt, yeah that's the idea, we pick the currently
           showing plane and have a simple draw_xy functions which
	   knows how to draw bright/dark pixels for any drm_fourcc
  <danvet> so you can see the oops even if it's yuv or whatever
  <anholt> nice

Thanks!

[*] DXGKDDI_RESETFROMTIMEOUT callback function:
    https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/d3dkmddi/nc-d3dkmddi-dxgkddi_resetfromtimeout

--
darwi
http://darwish.chasingpointers.com
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2019-03-14  2:51 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-11 17:42 [PATCH v2 0/3] drm: Add panic handling Noralf Trønnes
2019-03-11 17:42 ` [PATCH v2 1/3] drm: Add support for panic message output Noralf Trønnes
2019-03-11 19:23   ` Daniel Vetter
2019-03-11 19:29     ` Daniel Vetter
2019-03-11 22:40       ` Noralf Trønnes
2019-03-12  9:53         ` Daniel Vetter
2019-03-12  9:59           ` Daniel Vetter
2019-03-11 22:33     ` Noralf Trønnes
2019-03-12 10:58       ` Daniel Vetter
2019-03-12 13:29         ` Noralf Trønnes
2019-03-13  3:53         ` Ahmed S. Darwish
2019-03-12 22:13       ` Ahmed S. Darwish
2019-03-13  7:49         ` John Ogness
2019-03-13  8:37           ` Daniel Vetter
2019-03-14  2:51             ` Ahmed S. Darwish [this message]
2019-03-14  9:32               ` Daniel Vetter
2019-03-14  9:43                 ` John Ogness
2019-03-14  9:52                   ` John Ogness
2019-03-15 10:56                     ` Daniel Vetter
2019-03-13  8:35         ` Daniel Vetter
2019-03-14  4:45           ` Ahmed S. Darwish
2019-03-14  9:35             ` Daniel Vetter
2019-03-13 10:24         ` Noralf Trønnes
2019-03-13  4:05       ` Ahmed S. Darwish
2019-03-11 19:55   ` Sam Ravnborg
2019-03-12 10:47   ` Michel Dänzer
2019-03-12 16:17     ` [Intel-gfx] " Ville Syrjälä
2019-03-12 17:15       ` Noralf Trønnes
2019-03-12 17:25         ` Ville Syrjälä
2019-03-12 17:37           ` Noralf Trønnes
2019-03-12 17:44             ` Noralf Trønnes
2019-03-12 18:02             ` [Intel-gfx] " Ville Syrjälä
2019-03-13  8:29               ` Christian König
2019-03-13  8:43               ` [Intel-gfx] " Daniel Vetter
2019-03-13  9:35         ` Michel Dänzer
2019-03-13 13:31           ` [Intel-gfx] " Ville Syrjälä
2019-03-13 13:37             ` Christian König
2019-03-13 15:38               ` Michel Dänzer
2019-03-13 15:54                 ` [Intel-gfx] " Christian König
2019-03-13 16:16                   ` Kazlauskas, Nicholas
2019-03-13 17:30                     ` Koenig, Christian
2019-03-13 17:33                     ` Michel Dänzer
2019-03-13 17:41                       ` Kazlauskas, Nicholas
2019-03-14  9:50                         ` Daniel Vetter
2019-03-14 12:44                           ` Kazlauskas, Nicholas
2019-03-15 10:58                             ` [Intel-gfx] " Daniel Vetter
2019-03-13 17:52                       ` Koenig, Christian
2019-03-14  9:40                   ` [Intel-gfx] " Daniel Vetter
2019-03-11 17:42 ` [PATCH v2 2/3] drm/cma-helper: Add support for panic screen Noralf Trønnes
2019-03-11 17:42 ` [PATCH v2 3/3] drm/vc4: Support " Noralf Trønnes
2019-03-11 18:53 ` [PATCH v2 0/3] drm: Add panic handling Daniel Vetter
2019-03-12  9:09 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2019-03-12  9:12 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-03-12  9:50 ` ✓ Fi.CI.BAT: success " Patchwork
2019-03-12 10:12 ` ✗ Fi.CI.BAT: failure for drm: Add panic handling (rev2) Patchwork
2019-03-17 23:06 ` [PATCH v2 0/3] drm: Add panic handling Ahmed S. Darwish
2019-03-25  8:42   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190314025113.GA17300@darwi-home-pc \
    --to=darwish.07@gmail.com \
    --cc=bigeasy@linutronix.de \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=john.ogness@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.