All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Jocelyn Falempe <jfalempe@redhat.com>
Cc: bluescreen_avenger@verizon.net, Daniel Vetter <daniel@ffwll.ch>,
	javierm@redhat.com, mripard@kernel.org, gpiccoli@igalia.com,
	noralf@tronnes.org, dri-devel@lists.freedesktop.org,
	tzimmermann@suse.de, airlied@redhat.com
Subject: Re: [PATCH v7 5/9] drm/fb_dma: Add generic get_scanout_buffer() for drm_panic
Date: Thu, 18 Jan 2024 14:51:48 +0100	[thread overview]
Message-ID: <Zaks9AYf05XIARdd@phenom.ffwll.local> (raw)
In-Reply-To: <e4f14ab3-cb16-43fa-9201-16e132871225@redhat.com>

On Wed, Jan 17, 2024 at 03:28:09PM +0100, Jocelyn Falempe wrote:
> 
> 
> On 12/01/2024 14:41, Daniel Vetter wrote:
> > On Thu, Jan 04, 2024 at 05:00:49PM +0100, Jocelyn Falempe wrote:
> > > This was initialy done for imx6, but should work on most drivers
> > > using drm_fb_dma_helper.
> > > 
> > > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
> > > ---
> > >   drivers/gpu/drm/drm_fb_dma_helper.c | 55 +++++++++++++++++++++++++++++
> > >   include/drm/drm_fb_dma_helper.h     |  4 +++
> > >   2 files changed, 59 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_fb_dma_helper.c b/drivers/gpu/drm/drm_fb_dma_helper.c
> > > index 3b535ad1b07c..caed2935df4f 100644
> > > --- a/drivers/gpu/drm/drm_fb_dma_helper.c
> > > +++ b/drivers/gpu/drm/drm_fb_dma_helper.c
> > > @@ -15,6 +15,7 @@
> > >   #include <drm/drm_framebuffer.h>
> > >   #include <drm/drm_gem_dma_helper.h>
> > >   #include <drm/drm_gem_framebuffer_helper.h>
> > > +#include <drm/drm_panic.h>
> > >   #include <drm/drm_plane.h>
> > >   #include <linux/dma-mapping.h>
> > >   #include <linux/module.h>
> > > @@ -148,3 +149,57 @@ void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
> > >   	}
> > >   }
> > >   EXPORT_SYMBOL_GPL(drm_fb_dma_sync_non_coherent);
> > > +
> > > +#if defined(CONFIG_DRM_PANIC)
> > > +/**
> > > + * @dev: DRM device
> > > + * @drm_scanout_buffer: scanout buffer for the panic handler
> > > + * Returns: 0 or negative error code
> > > + *
> > > + * Generic get_scanout_buffer() implementation, for drivers that uses the
> > > + * drm_fb_dma_helper.
> > > + */
> > > +int drm_panic_gem_get_scanout_buffer(struct drm_device *dev,
> > > +				     struct drm_scanout_buffer *sb)
> > > +{
> > > +	struct drm_plane *plane;
> > > +	struct drm_gem_dma_object *dma_obj;
> > > +	struct drm_framebuffer *fb;
> > > +
> > > +	drm_for_each_primary_visible_plane(plane, dev) {
> > 
> > Ok that's not enough locking by far. You can't just hope that nothing
> > disappears while you're in a panic handler. We've been there and ended up
> > reliably oopsing in the panic handler itself. So you _have_ to follow the
> > full set of locking rules for all drm structures, or things will just get
> > worse at the worst possible moment.
> > 
> > But also, you're not allowed to do anything else than trylock, because a
> > panic handler might run from nmi context, and so you cannot even acquire
> > irq-safe spinlocks reliably.
> > 
> > Which means:
> > 
> > - You need to be safe against concurrent drm_dev_unregister. Using the
> >    atomic panic notifier directly for each device should take care of that
> >    (but maybe that stuff is still not nmi safe, not sure).
> > 
> > - You _have_ to use all the locks. Luckily iterating over the plane list
> >    doesn't need one, but you have to trylock the plane's modeset lock.
> >    Which means your nice iterator macro is already toast, because that
> >    already looks at state it's not allowed to look at without a lock. Or
> >    well, the plane->state pointer is no-go already.
> 
> mutex_trylock() shouldn't be called from interrupt context, and as the panic
> may occurs in irq, I can't use that.

Yeah I learned that later that day too, disappointing :-/

What's worse, spin_trylock is also no-go, because on r-t kernel they're
essentially sleeping locks. And so spin_trylock can retry in a loop, and
if we interrupt another thread at /just/ the right moment where the
spinlock is in an inconsistent state, the spin_trylock in the panic
handler will loop forever.

> But the panic context should guarantee that only one CPU is still running:
> https://elixir.bootlin.com/linux/latest/source/kernel/panic.c#L310

Uh no, panics can nest so you might panic while a panic is going on and
interrupt it in the middle. So just making sure no other code is running
isn't enough unfortunately.

Also the ipi stuff that should ensure the other cpu is stopped might not
work, afaiui in nmi panic context you cannot send around ipi.

> So I think using mutex_is_locked() should be safe:
> https://elixir.bootlin.com/linux/latest/source/include/linux/mutex.h#L128
> 
> This will only check if the lock is not taken, but as it's not possible for
> another task to run at the same time, I think that should be good enough ?
> 
> The drawback, is if we want to test without crashing the kernel, then we
> need to take the locks with trylock(), (and it's safe this time), but the
> code path would be slightly different.

So I think this predates some of irc chat, and I haven't typed up what I
designed together with John Ogness. Haven't done the patch yet, but rought
sketch:

- we use a raw_spin_lock. That's the _only_ lock type where trylock is ok

- we protect the drm_plane->state update with that lock. That's an
  extremely tiny section of code, so should be fine.

- For just that pointer update we could also use rcu, but using a spinlock
  has the advantage that drivers with peek/poke mmio support (for
  accessing unmapping vram or similar) could wrap these mmio access with
  that spinlock, and so would be able to use peek/poke in their panic
  handler. As soon as we need to protect against mmio we really need a
  lock.

- Because the lock also provides a barrier there's a lot of things you can
  safely assume, and so don't need additional locking for that in the
  panic handler:

  - Anything that's invariant over the lifetime of a drm_device (as
    delineated by drm_dev_register/unregister) is safe to access, like the
    plane list or plane structure

  - Anything in drm_plane_state that's only computed in atomic_check and
    not touched in atomic_commit code anymore is safe to access. This
    means we can get at the bo and everything else without any locks, and
    anything that is invariant for the lifetime of the bo like vaddr for
    dma-helper bo is also fine.

  - Furthermore because ->prepare_fb is called _before_ this point, and we
    stall at the right places before we call ->cleanup_fb. Unfortunately
    the same isn't true for ->begin/end_fb_access (by design), so we
    cannot rely on everything, but we can rely on the fact that the
    framebuffer is correctly pinned for drivers using dynamic buffer
    managers. The kernel vmap might not be around though, I need to think
    more how we best solve that issue. But that's a problem for when we
    add panic support to the first such driver.

I think with this design we'd cover 95% of all cases by simply doing the
raw_spin_trylock before we call into the driver's callback.

Cheers, Sima
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2024-01-18 13:53 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-04 16:00 [RFC][PATCH v7 0/9] drm/panic: Add a drm panic handler Jocelyn Falempe
2024-01-04 16:00 ` [PATCH v7 1/9] drm/format-helper: Add drm_fb_blit_from_r1 and drm_fb_fill Jocelyn Falempe
2024-01-17 15:06   ` Thomas Zimmermann
2024-01-17 16:40     ` Jocelyn Falempe
2024-01-19 10:58       ` Thomas Zimmermann
2024-01-19 12:25         ` Pekka Paalanen
2024-01-19 13:31           ` Thomas Zimmermann
2024-01-23 12:56         ` Thomas Zimmermann
2024-01-23 14:56           ` Jocelyn Falempe
2024-01-30 11:20             ` Thomas Zimmermann
2024-01-17 15:26   ` Jani Nikula
2024-01-17 16:43     ` Jocelyn Falempe
2024-01-04 16:00 ` [PATCH v7 2/9] drm/panic: Add a drm panic handler Jocelyn Falempe
2024-01-12 13:31   ` Daniel Vetter
2024-01-16 10:54     ` Jocelyn Falempe
2024-01-18 13:36       ` Daniel Vetter
2024-01-12 13:50   ` Daniel Vetter
2024-01-19 17:20     ` Jocelyn Falempe
2024-01-17 15:49   ` Thomas Zimmermann
2024-01-18 10:17     ` Jocelyn Falempe
2024-01-19  9:46       ` Thomas Zimmermann
2024-01-04 16:00 ` [PATCH v7 3/9] drm/plane: Add drm_for_each_primary_visible_plane macro Jocelyn Falempe
2024-01-08 10:24   ` Jocelyn Falempe
2024-01-08 10:30     ` Joe Perches
2024-01-04 16:00 ` [PATCH v7 4/9] drm/panic: Add drm_panic_is_format_supported() Jocelyn Falempe
2024-01-05  5:45   ` kernel test robot
2024-01-05  5:45     ` kernel test robot
2024-01-04 16:00 ` [PATCH v7 5/9] drm/fb_dma: Add generic get_scanout_buffer() for drm_panic Jocelyn Falempe
2024-01-06  2:54   ` kernel test robot
2024-01-06  2:54     ` kernel test robot
2024-01-08 10:20   ` Maxime Ripard
2024-01-12 12:29     ` Jocelyn Falempe
2024-01-12 13:41   ` Daniel Vetter
2024-01-12 13:56     ` Maxime Ripard
2024-01-18 13:38       ` Daniel Vetter
2024-01-26 12:39         ` Maxime Ripard
2024-01-17 14:28     ` Jocelyn Falempe
2024-01-18 13:51       ` Daniel Vetter [this message]
2024-01-04 16:00 ` [PATCH v7 6/9] drm/simpledrm: Add drm_panic support Jocelyn Falempe
2024-01-12 13:44   ` Daniel Vetter
2024-01-12 13:58     ` Maxime Ripard
2024-01-17 15:22       ` Thomas Zimmermann
2024-01-18 10:33         ` Maxime Ripard
2024-01-17 15:17   ` Thomas Zimmermann
2024-01-04 16:00 ` [PATCH v7 7/9] drm/mgag200: " Jocelyn Falempe
2024-01-17 15:15   ` Thomas Zimmermann
2024-01-04 16:00 ` [PATCH v7 8/9] drm/ast: " Jocelyn Falempe
2024-01-17 15:16   ` Thomas Zimmermann
2024-01-04 16:00 ` [PATCH v7 9/9] drm/imx: " Jocelyn Falempe
2024-01-12 14:00 ` [RFC][PATCH v7 0/9] drm/panic: Add a drm panic handler Daniel Vetter
2024-01-12 14:36   ` Jocelyn Falempe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zaks9AYf05XIARdd@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@redhat.com \
    --cc=bluescreen_avenger@verizon.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gpiccoli@igalia.com \
    --cc=javierm@redhat.com \
    --cc=jfalempe@redhat.com \
    --cc=mripard@kernel.org \
    --cc=noralf@tronnes.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.