* [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
@ 2019-04-04 10:24 Janusz Krzysztofik
  2019-04-04 10:28   ` Chris Wilson
  2019-04-04 16:57 ` ✗ Fi.CI.BAT: failure for " Patchwork
  0 siblings, 2 replies; 12+ messages in thread
From: Janusz Krzysztofik @ 2019-04-04 10:24 UTC (permalink / raw)
  To: Joonas Lahtinen, Jani Nikula, Rodrigo Vivi
  Cc: David Airlie, Daniel Vetter, michal.wajdeczko, intel-gfx,
	dri-devel, linux-kernel, Janusz Krzysztofik

From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>

If the driver gets unbound while a device is open, a kernel panic may
be forced because the list of allocated context IDs is not empty.

While a device is open, the list can be non-empty because a context ID,
once allocated by the context ID allocator to a context associated with
that open file descriptor, is not released until the device is closed.

On the other hand, all allocated context IDs must be released and the
context ID allocator destroyed on driver unbind, even if a device is
open, in order to free the memory consumed and prevent memory leaks.
The purpose of the forced kernel panic was to protect the context ID
allocator from being silently destroyed before all allocated IDs had
been released.

Before forcing the kernel panic on a non-empty list of allocated
context IDs, first force one in the unlikely case of a non-empty list
of contexts that should have been freed by the preceding drain of the
work queue (there must be another bug if that list is not empty).  If
it is empty, we may assume that the remaining contexts are idle (not
pinned) and that their IDs can be safely released.

Once done, release the context ID of each remaining context, unless a
context unexpectedly turns out to be pinned.  Force a kernel panic in
that case, as there must be yet another bug in the driver code.

Now the kernel panic protecting the allocator should no longer trigger,
as the list it checks should be empty.  If the list is unexpectedly
still not empty, there must be yet another bug.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 280813a4bf82..18d004d94e43 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -611,6 +611,8 @@ void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
 
 void i915_gem_contexts_fini(struct drm_i915_private *i915)
 {
+	struct i915_gem_context *ctx, *cn;
+
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
 	if (i915->preempt_context)
@@ -618,6 +620,14 @@ void i915_gem_contexts_fini(struct drm_i915_private *i915)
 	destroy_kernel_context(&i915->kernel_context);
 
 	/* Must free all deferred contexts (via flush_workqueue) first */
+	GEM_BUG_ON(!llist_empty(&i915->contexts.free_list));
+
+	/* Release all remaining HW IDs before ID allocator is destroyed */
+	list_for_each_entry_safe(ctx, cn, &i915->contexts.hw_id_list,
+				 hw_id_link) {
+		GEM_BUG_ON(atomic_read(&ctx->hw_id_pin_count));
+		release_hw_id(ctx);
+	}
 	GEM_BUG_ON(!list_empty(&i915->contexts.hw_id_list));
 	ida_destroy(&i915->contexts.hw_ida);
 }
-- 
2.20.1
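
The teardown order described in the commit message can be sketched in
user space as follows.  This is only an illustrative model, not driver
code: struct model_ctx, struct model_ids and every model_*() function
are hypothetical names standing in for the context, the hw_id_list/IDA
pair and release_hw_id(), and the place where the patch would hit
GEM_BUG_ON() is modelled as a false return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a context that owns a hardware ID. */
struct model_ctx {
	int hw_id;              /* allocated ID, or -1 if none */
	int pin_count;          /* non-zero while the ID is pinned */
	struct model_ctx *next; /* link in the list of contexts holding IDs */
};

/* Hypothetical stand-in for the ID list plus the ID allocator. */
struct model_ids {
	struct model_ctx *hw_id_list;
	unsigned long allocated; /* bit n set means ID n is in use */
};

/* Give ctx the lowest free ID and add it to the list; returns ID or -1. */
static int model_assign_id(struct model_ids *ids, struct model_ctx *ctx)
{
	for (int id = 0; id < (int)(8 * sizeof(ids->allocated)); id++) {
		if (!(ids->allocated & (1UL << id))) {
			ids->allocated |= 1UL << id;
			ctx->hw_id = id;
			ctx->next = ids->hw_id_list;
			ids->hw_id_list = ctx;
			return id;
		}
	}
	return -1;
}

/*
 * Model of the loop added to i915_gem_contexts_fini(): release the ID of
 * every remaining context.  Returns false, where the patch would hit
 * GEM_BUG_ON(), if a context is still pinned.
 */
static bool model_release_all(struct model_ids *ids)
{
	struct model_ctx *ctx = ids->hw_id_list;

	while (ctx) {
		/* Save the successor first, as the "_safe" iterator does. */
		struct model_ctx *next = ctx->next;

		if (ctx->pin_count)
			return false;

		assert(ctx->hw_id >= 0);
		ids->allocated &= ~(1UL << ctx->hw_id);
		ctx->hw_id = -1;
		ctx->next = NULL;
		ids->hw_id_list = next;
		ctx = next;
	}
	return true;
}

/* The allocator may be destroyed only when nothing is left allocated. */
static bool model_can_destroy(const struct model_ids *ids)
{
	return ids->hw_id_list == NULL && ids->allocated == 0;
}
```

In this model, calling model_release_all() before model_can_destroy()
makes the final check pass whenever no context is pinned, which is the
behaviour the patch aims for.  It says nothing about whether releasing
the IDs at that point is safe for files that are still open.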


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:24 [PATCH] drm/i915: Fix context IDs not released on driver hot unbind Janusz Krzysztofik
@ 2019-04-04 10:28   ` Chris Wilson
  2019-04-04 16:57 ` ✗ Fi.CI.BAT: failure for " Patchwork
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2019-04-04 10:28 UTC (permalink / raw)
  To: Jani Nikula, Janusz Krzysztofik, Joonas Lahtinen, Rodrigo Vivi
  Cc: Janusz Krzysztofik, David Airlie, intel-gfx, linux-kernel, dri-devel

Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
> 
> In case the driver gets unbound while a device is open, kernel panic
> may be forced if a list of allocated context IDs is not empty.
> 
> When a device is open, the list may happen to be not empty because a
> context ID, once allocated by a context ID allocator to a context
> associated with that open file descriptor, is released as late as
> on device close.
> 
> On the other hand, there is a need to release all allocated context IDs
> and destroy the context ID allocator on driver unbind, even if a device
> is open, in order to free memory resources consumed and prevent from
> memory leaks.  The purpose of the forced kernel panic was to protect
> the context ID allocator from being silently destroyed if not all
> allocated IDs had been released.

Those open fds are still pointing into kernel memory where the driver
used to be. The panic is entirely correct; we should not be unloading
the module before those dangling pointers have been made safe.

This is papering over the symptom. How is the module being unloaded with
open fds? If all the fds have been closed, how have we failed to flush
and retire all requests (thereby unpinning the contexts and all other
pointers)?
-Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:28   ` Chris Wilson
@ 2019-04-04 10:40     ` Janusz Krzysztofik
  -1 siblings, 0 replies; 12+ messages in thread
From: Janusz Krzysztofik @ 2019-04-04 10:40 UTC (permalink / raw)
  To: Chris Wilson
  Cc: David Airlie, intel-gfx, dri-devel, linux-kernel, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi

On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
> > 
> > In case the driver gets unbound while a device is open, kernel
> > panic
> > may be forced if a list of allocated context IDs is not empty.
> > 
> > When a device is open, the list may happen to be not empty because
> > a
> > context ID, once allocated by a context ID allocator to a context
> > associated with that open file descriptor, is released as late as
> > on device close.
> > 
> > On the other hand, there is a need to release all allocated context
> > IDs
> > and destroy the context ID allocator on driver unbind, even if a
> > device
> > is open, in order to free memory resources consumed and prevent
> > from
> > memory leaks.  The purpose of the forced kernel panic was to
> > protect
> > the context ID allocator from being silently destroyed if not all
> > allocated IDs had been released.
> 
> Those open fd are still pointing into kernel memory where the driver
> used to be. The panic is entirely correct, we should not be unloading
> the module before those dangling pointers have been made safe.
> 
> This is papering over the symptom. How is the module being unloaded
> with
> open fd? 

A user can play with the driver unbind or device remove sysfs
interface.

Thanks,
Janusz

> If all the fd have been closed, how have we failed to flush and
> retire all requests (thereby unpinning the contexts and all other
> pointers).
> -Chris
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
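
For reference, the sysfs unbind interface mentioned above is driven by
writing a device's PCI address to the driver's unbind file.  Below is a
minimal sketch in C, assuming the driver is registered on the PCI bus
under the name "i915"; the helper names unbind_path() and
unbind_device() are hypothetical, and "0000:00:02.0" is only an assumed
example address that has to be replaced with the real one.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/sys/bus/pci/drivers/<driver>/unbind"; 0 or -ENAMETOOLONG. */
static int unbind_path(char *buf, size_t len, const char *driver)
{
	int n;

	assert(buf != NULL && driver != NULL);
	n = snprintf(buf, len, "/sys/bus/pci/drivers/%s/unbind", driver);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Write a PCI address such as "0000:00:02.0" (an assumed example) to the
 * driver's unbind file.  Needs root.  Returns 0 or a negative errno.
 */
static int unbind_device(const char *driver, const char *pci_addr)
{
	char path[128];
	ssize_t written;
	int fd, err;

	err = unbind_path(path, sizeof(path), driver);
	if (err)
		return err;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, pci_addr, strlen(pci_addr));
	err = written < 0 ? -errno : 0;
	close(fd);
	return err;
}
```

Calling unbind_device("i915", "0000:00:02.0") as root while another
process still holds the DRM device open would be one way to attempt the
scenario discussed in this thread; the thread itself does not spell out
the exact steps used.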


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:40     ` Janusz Krzysztofik
@ 2019-04-04 10:43       ` Chris Wilson
  -1 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2019-04-04 10:43 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: David Airlie, intel-gfx, dri-devel, linux-kernel, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi

Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
> > > 
> > > In case the driver gets unbound while a device is open, kernel
> > > panic
> > > may be forced if a list of allocated context IDs is not empty.
> > > 
> > > When a device is open, the list may happen to be not empty because
> > > a
> > > context ID, once allocated by a context ID allocator to a context
> > > associated with that open file descriptor, is released as late as
> > > on device close.
> > > 
> > > On the other hand, there is a need to release all allocated context
> > > IDs
> > > and destroy the context ID allocator on driver unbind, even if a
> > > device
> > > is open, in order to free memory resources consumed and prevent
> > > from
> > > memory leaks.  The purpose of the forced kernel panic was to
> > > protect
> > > the context ID allocator from being silently destroyed if not all
> > > allocated IDs had been released.
> > 
> > Those open fd are still pointing into kernel memory where the driver
> > used to be. The panic is entirely correct, we should not be unloading
> > the module before those dangling pointers have been made safe.
> > 
> > This is papering over the symptom. How is the module being unloaded
> > with
> > open fd? 
> 
> A user can play with the driver unbind or device remove sysfs
> interface.

Sure, but we must still follow all the steps before _unloading_ the
module or else the user is left pointing into reused kernel memory.
-Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:43       ` Chris Wilson
  (?)
@ 2019-04-04 10:50       ` Janusz Krzysztofik
  2019-04-04 10:53           ` Chris Wilson
  -1 siblings, 1 reply; 12+ messages in thread
From: Janusz Krzysztofik @ 2019-04-04 10:50 UTC (permalink / raw)
  To: Chris Wilson
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi

On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > > From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
> > > > 
> > > > In case the driver gets unbound while a device is open, kernel
> > > > panic
> > > > may be forced if a list of allocated context IDs is not empty.
> > > > 
> > > > When a device is open, the list may happen to be not empty
> > > > because
> > > > a
> > > > context ID, once allocated by a context ID allocator to a
> > > > context
> > > > associated with that open file descriptor, is released as late
> > > > as
> > > > on device close.
> > > > 
> > > > On the other hand, there is a need to release all allocated
> > > > context
> > > > IDs
> > > > and destroy the context ID allocator on driver unbind, even if
> > > > a
> > > > device
> > > > is open, in order to free memory resources consumed and prevent
> > > > from
> > > > memory leaks.  The purpose of the forced kernel panic was to
> > > > protect
> > > > the context ID allocator from being silently destroyed if not
> > > > all
> > > > allocated IDs had been released.
> > > 
> > > Those open fd are still pointing into kernel memory where the
> > > driver
> > > used to be. The panic is entirely correct, we should not be
> > > unloading
> > > the module before those dangling pointers have been made safe.
> > > 
> > > This is papering over the symptom. How is the module being
> > > unloaded
> > > with
> > > open fd? 
> > 
> > A user can play with the driver unbind or device remove sysfs
> > interface.
> 
> Sure, but we must still follow all the steps before _unloading_ the
> module or else the user is left pointing into reused kernel memory.

I'm not talking about unloading the module, that is prevented by open
fds.  The driver still exists after being unbound from a device and may
just respond with -ENODEV.

Janusz

> -Chris
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
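
The behaviour described in this reply (open fds keep the module loaded,
while an unbound device answers with -ENODEV) can be modelled roughly
as below.  This is a hypothetical user-space sketch, not i915 or DRM
core code: struct model_device and the model_*() helpers are invented
for illustration, and the sketch takes no position on what
i915_gem_contexts_fini() may safely free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state shared by the driver and its open files. */
struct model_device {
	bool bound;   /* cleared when the driver is unbound from the device */
	int open_fds; /* open file descriptors still referring to the device */
};

/* Opening succeeds only while the driver is bound to the device. */
static int model_open(struct model_device *dev)
{
	assert(dev != NULL);
	if (!dev->bound)
		return -ENODEV;
	dev->open_fds++;
	return 0;
}

/* Any request on an already open fd fails once the device is unbound. */
static int model_ioctl(const struct model_device *dev)
{
	return dev->bound ? 0 : -ENODEV;
}

/* Unbind only detaches the device; open fds stay valid handles. */
static void model_unbind(struct model_device *dev)
{
	dev->bound = false;
}

static void model_close(struct model_device *dev)
{
	assert(dev->open_fds > 0);
	dev->open_fds--;
}

/* Module unload is refused for as long as any fd remains open. */
static bool model_can_unload(const struct model_device *dev)
{
	return dev->open_fds == 0;
}
```

In this model an fd opened before model_unbind() keeps working as a
handle but every request through it returns -ENODEV, and
model_can_unload() stays false until that fd is closed.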


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:50       ` Janusz Krzysztofik
@ 2019-04-04 10:53           ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2019-04-04 10:53 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi

Quoting Janusz Krzysztofik (2019-04-04 11:50:14)
> On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
> > Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
> > > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
> > > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
> > > > > From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
> > > > > 
> > > > > In case the driver gets unbound while a device is open, kernel
> > > > > panic
> > > > > may be forced if a list of allocated context IDs is not empty.
> > > > > 
> > > > > When a device is open, the list may happen to be not empty
> > > > > because
> > > > > a
> > > > > context ID, once allocated by a context ID allocator to a
> > > > > context
> > > > > associated with that open file descriptor, is released as late
> > > > > as
> > > > > on device close.
> > > > > 
> > > > > On the other hand, there is a need to release all allocated
> > > > > context
> > > > > IDs
> > > > > and destroy the context ID allocator on driver unbind, even if
> > > > > a
> > > > > device
> > > > > is open, in order to free memory resources consumed and prevent
> > > > > from
> > > > > memory leaks.  The purpose of the forced kernel panic was to
> > > > > protect
> > > > > the context ID allocator from being silently destroyed if not
> > > > > all
> > > > > allocated IDs had been released.
> > > > 
> > > > Those open fd are still pointing into kernel memory where the
> > > > driver
> > > > used to be. The panic is entirely correct, we should not be
> > > > unloading
> > > > the module before those dangling pointers have been made safe.
> > > > 
> > > > This is papering over the symptom. How is the module being
> > > > unloaded
> > > > with
> > > > open fd? 
> > > 
> > > A user can play with the driver unbind or device remove sysfs
> > > interface.
> > 
> > Sure, but we must still follow all the steps before _unloading_ the
> > module or else the user is left pointing into reused kernel memory.
> 
> I'm not talking about unloading the module, that is prevented by open
> fds.  The driver still exists after being unbound from a device and may
> just respond with -ENODEV.

i915_gem_contexts_fini() *is* module unload.
-Chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:53           ` Chris Wilson
  (?)
@ 2019-04-04 13:47           ` Jani Nikula
  -1 siblings, 0 replies; 12+ messages in thread
From: Jani Nikula @ 2019-04-04 13:47 UTC (permalink / raw)
  To: Chris Wilson, Janusz Krzysztofik
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi

On Thu, 04 Apr 2019, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Janusz Krzysztofik (2019-04-04 11:50:14)
>> On Thu, 2019-04-04 at 11:43 +0100, Chris Wilson wrote:
>> > Quoting Janusz Krzysztofik (2019-04-04 11:40:24)
>> > > On Thu, 2019-04-04 at 11:28 +0100, Chris Wilson wrote:
>> > > > Quoting Janusz Krzysztofik (2019-04-04 11:24:45)
>> > > > > From: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
>> > > > > 
>> > > > > In case the driver gets unbound while a device is open, kernel
>> > > > > panic
>> > > > > may be forced if a list of allocated context IDs is not empty.
>> > > > > 
>> > > > > When a device is open, the list may happen to be not empty
>> > > > > because
>> > > > > a
>> > > > > context ID, once allocated by a context ID allocator to a
>> > > > > context
>> > > > > associated with that open file descriptor, is released as late
>> > > > > as
>> > > > > on device close.
>> > > > > 
>> > > > > On the other hand, there is a need to release all allocated
>> > > > > context
>> > > > > IDs
>> > > > > and destroy the context ID allocator on driver unbind, even if
>> > > > > a
>> > > > > device
>> > > > > is open, in order to free memory resources consumed and prevent
>> > > > > from
>> > > > > memory leaks.  The purpose of the forced kernel panic was to
>> > > > > protect
>> > > > > the context ID allocator from being silently destroyed if not
>> > > > > all
>> > > > > allocated IDs had been released.
>> > > > 
>> > > > Those open fd are still pointing into kernel memory where the
>> > > > driver
>> > > > used to be. The panic is entirely correct, we should not be
>> > > > unloading
>> > > > the module before those dangling pointers have been made safe.
>> > > > 
>> > > > This is papering over the symptom. How is the module being
>> > > > unloaded
>> > > > with
>> > > > open fd? 
>> > > 
>> > > A user can play with the driver unbind or device remove sysfs
>> > > interface.
>> > 
>> > Sure, but we must still follow all the steps before _unloading_ the
>> > module or else the user is left pointing into reused kernel memory.
>> 
>> I'm not talking about unloading the module, that is prevented by open
>> fds.  The driver still exists after being unbound from a device and may
>> just respond with -ENODEV.
>
> i915_gem_contexts_fini() *is* module unload.

Janusz, please describe what you're doing exactly.

BR,
Jani.



-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Fix context IDs not released on driver hot unbind
  2019-04-04 10:24 [PATCH] drm/i915: Fix context IDs not released on driver hot unbind Janusz Krzysztofik
  2019-04-04 10:28   ` Chris Wilson
@ 2019-04-04 16:57 ` Patchwork
  1 sibling, 0 replies; 12+ messages in thread
From: Patchwork @ 2019-04-04 16:57 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Fix context IDs not released on driver hot unbind
URL   : https://patchwork.freedesktop.org/series/58996/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5871 -> Patchwork_12683
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12683 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12683, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/58996/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12683:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_close_race@basic-threads:
    - fi-icl-y:           NOTRUN -> INCOMPLETE

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_ctx_exec@basic:
    - {fi-icl-u3}:        PASS -> INCOMPLETE

  
Known issues
------------

  Here are the changes found in Patchwork_12683 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@cs-compute:
    - fi-kbl-8809g:       NOTRUN -> FAIL [fdo#108094]

  * igt@amdgpu/amd_basic@query-info:
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] +55

  * igt@amdgpu/amd_cs_nop@fork-compute0:
    - fi-blb-e6850:       NOTRUN -> SKIP [fdo#109271] +18

  * igt@gem_ctx_create@basic-files:
    - fi-icl-u2:          PASS -> INCOMPLETE [fdo#109100]

  * igt@gem_exec_basic@readonly-bsd1:
    - fi-snb-2520m:       NOTRUN -> SKIP [fdo#109271] +57

  * igt@gem_exec_basic@readonly-bsd2:
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] +76

  * igt@gem_exec_store@basic-bsd2:
    - fi-hsw-4770:        NOTRUN -> SKIP [fdo#109271] +41

  * igt@i915_selftest@live_evict:
    - fi-bsw-kefka:       NOTRUN -> DMESG-WARN [fdo#107709]

  * igt@i915_selftest@live_uncore:
    - fi-ivb-3770:        PASS -> DMESG-FAIL [fdo#110210]

  * igt@kms_addfb_basic@addfb25-y-tiled-small:
    - fi-byt-n2820:       NOTRUN -> SKIP [fdo#109271] +56

  * igt@kms_busy@basic-flip-a:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_busy@basic-flip-c:
    - fi-byt-j1900:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-snb-2520m:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-byt-n2820:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] +62
    - fi-byt-j1900:       NOTRUN -> SKIP [fdo#109271] +52

  * igt@runner@aborted:
    - fi-bsw-kefka:       NOTRUN -> FAIL [fdo#107709]

  
#### Possible fixes ####

  * igt@amdgpu/amd_basic@userptr:
    - fi-kbl-8809g:       DMESG-WARN [fdo#108965] -> PASS

  * igt@i915_module_load@reload:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@i915_selftest@live_hangcheck:
    - fi-skl-iommu:       INCOMPLETE [fdo#108602] / [fdo#108744] -> PASS

  * igt@kms_frontbuffer_tracking@basic:
    - fi-byt-clapper:     FAIL [fdo#103167] -> PASS

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS +1

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107709]: https://bugs.freedesktop.org/show_bug.cgi?id=107709
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108094]: https://bugs.freedesktop.org/show_bug.cgi?id=108094
  [fdo#108602]: https://bugs.freedesktop.org/show_bug.cgi?id=108602
  [fdo#108744]: https://bugs.freedesktop.org/show_bug.cgi?id=108744
  [fdo#108965]: https://bugs.freedesktop.org/show_bug.cgi?id=108965
  [fdo#109100]: https://bugs.freedesktop.org/show_bug.cgi?id=109100
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#110210]: https://bugs.freedesktop.org/show_bug.cgi?id=110210


Participating hosts (43 -> 43)
------------------------------

  Additional (8): fi-bsw-n3050 fi-byt-j1900 fi-snb-2520m fi-hsw-4770 fi-pnv-d510 fi-icl-y fi-byt-n2820 fi-bsw-kefka 
  Missing    (8): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-bwr-2160 fi-ctg-p8600 fi-gdg-551 


Build changes
-------------

    * Linux: CI_DRM_5871 -> Patchwork_12683

  CI_DRM_5871: b5981dd230e67825858b740b4a11360c11a4926c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4928: 014a6fa238322b497116b359cb92df1ce7fa8847 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12683: ab58355383ab54c8ee7e59fc310648e4cd7b9610 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ab58355383ab drm/i915: Fix context IDs not released on driver hot unbind

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12683/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-04 16:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-04 10:24 [PATCH] drm/i915: Fix context IDs not released on driver hot unbind Janusz Krzysztofik
2019-04-04 10:28 ` [Intel-gfx] " Chris Wilson
2019-04-04 10:28   ` Chris Wilson
2019-04-04 10:40   ` [Intel-gfx] " Janusz Krzysztofik
2019-04-04 10:40     ` Janusz Krzysztofik
2019-04-04 10:43     ` Chris Wilson
2019-04-04 10:43       ` Chris Wilson
2019-04-04 10:50       ` Janusz Krzysztofik
2019-04-04 10:53         ` Chris Wilson
2019-04-04 10:53           ` Chris Wilson
2019-04-04 13:47           ` Jani Nikula
2019-04-04 16:57 ` ✗ Fi.CI.BAT: failure for " Patchwork
