* [PATCH 1/2] drm/i915: Grab dev->struct_mutex in i915_gem_pageflip_info
@ 2014-06-17 20:34 Daniel Vetter
  2014-06-17 20:34 ` [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset Daniel Vetter
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2014-06-17 20:34 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

We could walk off a bad list otherwise when someone concurrently
unbinds stuff for fun.

I've suspected this as the root-cause behind seemingly inconsistent
state, but alas it's not.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0e0e6b6bffd1..5c029bd53a26 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -513,6 +513,11 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 	struct drm_device *dev = node->minor->dev;
 	unsigned long flags;
 	struct intel_crtc *crtc;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
 
 	for_each_intel_crtc(dev, crtc) {
 		const char pipe = pipe_name(crtc->pipe);
@@ -554,6 +559,8 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 		spin_unlock_irqrestore(&dev->event_lock, flags);
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	return 0;
 }
 
-- 
1.8.1.4
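
For readers following the thread, the locking pattern this patch adds can
be sketched in userspace C. This is only an illustration under stated
assumptions: pthread_mutex_lock() stands in for the kernel's
mutex_lock_interruptible(), and struct device, struct node, and dump_info()
are hypothetical names, not i915 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singly linked list standing in for the driver's lists. */
struct node {
	int value;
	struct node *next;
};

/* Hypothetical device: one mutex guards every walk of the list. */
struct device {
	pthread_mutex_t struct_mutex;
	struct node *objects;
};

/*
 * Walk the list only while holding the mutex, so a concurrent writer
 * that also takes the mutex cannot unlink nodes under us.
 * Returns 0 on success, or the error code from taking the lock.
 */
static int dump_info(struct device *dev, int *sum_out)
{
	int sum = 0;
	int ret;

	ret = pthread_mutex_lock(&dev->struct_mutex);
	if (ret)
		return ret;	/* bail out before touching the list */

	for (struct node *n = dev->objects; n; n = n->next)
		sum += n->value;

	pthread_mutex_unlock(&dev->struct_mutex);

	*sum_out = sum;
	return 0;
}
```

As in the patch, the early return happens before any shared state is
read, and the unlock sits on the single exit path after the loop.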


* [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset
  2014-06-17 20:34 [PATCH 1/2] drm/i915: Grab dev->struct_mutex in i915_gem_pageflip_info Daniel Vetter
@ 2014-06-17 20:34 ` Daniel Vetter
  2014-06-17 22:04   ` Daniel Vetter
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2014-06-17 20:34 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

A WARN_ON is perfectly fine.

The BUG in here seems to be the cause behind hard-hangs when I cat the
i915_gem_pageflip debugfs file (which calls this from an irq
spinlock). But only while running a full igt run after a while. I
still need to root cause the underlying issue.

I'll also start rejecting patches which add new BUG_ON but don't come
with a really good justification for it. The general rule really
should be to just WARN and hope the driver survives for long enough.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 35b5027ba5a4..841e9cd38dc1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5101,7 +5101,7 @@ unsigned long i915_gem_obj_offset(struct drm_i915_gem_object *o,
 	    vm == &dev_priv->mm.aliasing_ppgtt->base)
 		vm = &dev_priv->gtt.base;
 
-	BUG_ON(list_empty(&o->vma_list));
+	WARN_ON(list_empty(&o->vma_list));
 	list_for_each_entry(vma, &o->vma_list, vma_link) {
 		if (vma->vm == vm)
 			return vma->node.start;
-- 
1.8.1.4
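
The difference between the two assertion styles can be sketched in
userspace C. Everything here is an assumption made for illustration:
MY_BUG_ON and MY_WARN_ON are hypothetical stand-ins for the kernel macros,
struct object and struct vma are simplified, and the (unsigned long)-1
fallback is an invented sentinel, not necessarily what the driver returns.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical analogue of BUG_ON: report, then take the process down. */
#define MY_BUG_ON(cond)						\
	do {							\
		if (cond) {					\
			fprintf(stderr, "BUG: %s\n", #cond);	\
			abort();				\
		}						\
	} while (0)

/* Hypothetical analogue of WARN_ON: report, then keep running. */
static int my_warn_on(int cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define MY_WARN_ON(cond) my_warn_on(!!(cond), #cond)

/* Simplified stand-ins for the object and its per-address-space bindings. */
struct vma {
	const void *vm;
	unsigned long start;
	struct vma *next;
};

struct object {
	struct vma *vma_list;
};

/* Look up the offset of an object in one address space. */
static unsigned long obj_offset(const struct object *o, const void *vm)
{
	/* An unbound object is a caller bug, but one we can survive. */
	MY_WARN_ON(o->vma_list == NULL);

	for (const struct vma *v = o->vma_list; v; v = v->next)
		if (v->vm == vm)
			return v->start;

	return (unsigned long)-1;	/* assumed "not found" sentinel */
}
```

With MY_BUG_ON in place of MY_WARN_ON, the unbound case would abort
before the message could be acted on; with the warning, the caller gets a
sentinel back and the report still reaches stderr.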


* Re: [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset
  2014-06-17 20:34 ` [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset Daniel Vetter
@ 2014-06-17 22:04   ` Daniel Vetter
  2014-06-25  1:30     ` Ben Widawsky
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2014-06-17 22:04 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On Tue, Jun 17, 2014 at 10:34:38PM +0200, Daniel Vetter wrote:
> A WARN_ON is perfectly fine.
> 
> The BUG in here seems to be the cause behind hard-hangs when I cat the
> i915_gem_pageflip debugfs file (which calls this from an irq
> spinlock). But only while running a full igt run after a while. I
> still need to root cause the underlying issue.
> 
> I'll also start rejecting patches which add new BUG_ON but don't come
> with a really good justification for it. The general rule really
> should be to just WARN and hope the driver survives for long enough.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Both patches merged, this one improved per Chris' suggestions on irc.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 35b5027ba5a4..841e9cd38dc1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5101,7 +5101,7 @@ unsigned long i915_gem_obj_offset(struct drm_i915_gem_object *o,
>  	    vm == &dev_priv->mm.aliasing_ppgtt->base)
>  		vm = &dev_priv->gtt.base;
>  
> -	BUG_ON(list_empty(&o->vma_list));
> +	WARN_ON(list_empty(&o->vma_list));
>  	list_for_each_entry(vma, &o->vma_list, vma_link) {
>  		if (vma->vm == vm)
>  			return vma->node.start;
> -- 
> 1.8.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset
  2014-06-17 22:04   ` Daniel Vetter
@ 2014-06-25  1:30     ` Ben Widawsky
  2014-06-25  2:00       ` Dave Airlie
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Widawsky @ 2014-06-25  1:30 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Wed, Jun 18, 2014 at 12:04:46AM +0200, Daniel Vetter wrote:
> On Tue, Jun 17, 2014 at 10:34:38PM +0200, Daniel Vetter wrote:
> > A WARN_ON is perfectly fine.
> > 
> > The BUG in here seems to be the cause behind hard-hangs when I cat the
> > i915_gem_pageflip debugfs file (which calls this from an irq
> > spinlock). But only while running a full igt run after a while. I
> > still need to root cause the underlying issue.
> > 
> > I'll also start rejecting patches which add new BUG_ON but don't come
> > with a really good justification for it. The general rule really
> > should be to just WARN and hope the driver survives for long enough.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Both patches merged, this one improved per Chris' suggestions on irc.
> -Daniel
> 

Hey, here's an idea. How about we root cause bugs instead of making
blanket statements about the validity of real assertions? If the callers
of ggtt_offset are calling it on unbound objects, it's a violation of
the design. And in the other cases, it's a real bug.

I'd NAK this patch if it wasn't already merged, and my NAK meant
something.

-- 
Ben Widawsky, Intel Open Source Technology Center


* Re: [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset
  2014-06-25  1:30     ` Ben Widawsky
@ 2014-06-25  2:00       ` Dave Airlie
  2014-06-25  4:00         ` Ben Widawsky
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Airlie @ 2014-06-25  2:00 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, Intel Graphics Development

On 25 June 2014 11:30, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Wed, Jun 18, 2014 at 12:04:46AM +0200, Daniel Vetter wrote:
>> On Tue, Jun 17, 2014 at 10:34:38PM +0200, Daniel Vetter wrote:
>> > A WARN_ON is perfectly fine.
>> >
>> > The BUG in here seems to be the cause behind hard-hangs when I cat the
>> > i915_gem_pageflip debugfs file (which calls this from an irq
>> > spinlock). But only while running a full igt run after a while. I
>> > still need to root cause the underlying issue.
>> >
>> > I'll also start rejecting patches which add new BUG_ON but don't come
>> > with a really good justification for it. The general rule really
>> > should be to just WARN and hope the driver survives for long enough.
>> >
>> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Both patches merged, this one improved per Chris' suggestions on irc.
>> -Daniel
>>
>
> Hey, here's an idea. How about we root cause bugs instead of making
> blanket statements about the validity of real assertions? If the callers
> of ggtt_offset are calling it on unbound objects, it's a violation of
> the design. And in the other cases, it's a real bug.
>
> I'd NAK this patch if it wasn't already merged, and my NAK meant
> something.
>

It's kinda hard to debug an assert if it takes the whole box down, and you
never see the assert printed anywhere. And why should the whole
kernel die because the GPU driver stuffed up?

Maybe you confused this with userspace.

Dave.


* Re: [PATCH 2/2] drm/i915: Don't BUG_ON in i915_gem_obj_offset
  2014-06-25  2:00       ` Dave Airlie
@ 2014-06-25  4:00         ` Ben Widawsky
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Widawsky @ 2014-06-25  4:00 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Daniel Vetter, Intel Graphics Development

On Wed, Jun 25, 2014 at 12:00:35PM +1000, Dave Airlie wrote:
> On 25 June 2014 11:30, Ben Widawsky <ben@bwidawsk.net> wrote:
> > On Wed, Jun 18, 2014 at 12:04:46AM +0200, Daniel Vetter wrote:
> >> On Tue, Jun 17, 2014 at 10:34:38PM +0200, Daniel Vetter wrote:
> >> > A WARN_ON is perfectly fine.
> >> >
> >> > The BUG in here seems to be the cause behind hard-hangs when I cat the
> >> > i915_gem_pageflip debugfs file (which calls this from an irq
> >> > spinlock). But only while running a full igt run after a while. I
> >> > still need to root cause the underlying issue.
> >> >
> >> > I'll also start rejecting patches which add new BUG_ON but don't come
> >> > with a really good justification for it. The general rule really
> >> > should be to just WARN and hope the driver survives for long enough.
> >> >
> >> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>
> >> Both patches merged, this one improved per Chris' suggestions on irc.
> >> -Daniel
> >>
> >
> > Hey, here's an idea. How about we root cause bugs instead of making
> > blanket statements about the validity of real assertions? If the callers
> > of ggtt_offset are calling it on unbound objects, it's a violation of
> > the design. And in the other cases, it's a real bug.
> >
> > I'd NAK this patch if it wasn't already merged, and my NAK meant
> > something.
> >
> 
> It's kinda hard to debug an assert if it takes the whole box down, and you
> never see the assert printed anywhere. And why should the whole
> kernel die because the GPU driver stuffed up?
> 
> Maybe you confused this with userspace.
> 
> Dave.

I shouldn't have said this. I was really pissed off that our PPGTT code
(which was relatively stable with some missing corner cases on merge) is
bit-rotting. It's essentially unusable on BDW, and has cost me more than
a week straight. Many of the original checks I had in place are now gone
for what usually appears to be laziness.

I have a very long-going fundamental disagreement with Daniel about
invariants (and apparently you as well). In either event, the snide
sarcasm was inappropriate of me.

So let's please nip this thread in the bud here.

Thanks.

-- 
Ben Widawsky, Intel Open Source Technology Center
