linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
@ 2013-09-12 15:06 Peter Zijlstra
  2013-09-12 15:11 ` Maarten Lankhorst
  2013-09-12 15:36 ` Daniel Vetter
  0 siblings, 2 replies; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-12 15:06 UTC (permalink / raw)
  To: airlied
  Cc: Daniel Vetter, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, linux-kernel, Ingo Molnar, Thomas Gleixner

Hi Dave,

So I'm poking around the preemption code and stumbled upon:

drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();

All these sites basically do:

  while (!trylock())
  	yield();

which is a horrible and broken locking pattern. 

Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
task that preempted the lock holder at FIFOn.

Secondly the implementation is worse than usual by abusing
VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
doesn't retry, but you're using it as a get out of fault path. And
you're using set_need_resched() which is not something a driver should
_ever_ touch.

Now I'm going to take away set_need_resched() -- and while you can
'reimplement' it using set_thread_flag() you're not going to do that
because it will be broken due to changes to the preempt code.

So please as to fix ASAP and don't allow anybody to trick you into
merging silly things like that again ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:06 [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Peter Zijlstra
@ 2013-09-12 15:11 ` Maarten Lankhorst
  2013-09-12 15:14   ` Peter Zijlstra
  2013-09-12 15:36 ` Daniel Vetter
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-12 15:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: airlied, Daniel Vetter, Thomas Hellstrom, intel-gfx, dri-devel,
	linux-kernel, Ingo Molnar, Thomas Gleixner

Op 12-09-13 17:06, Peter Zijlstra schreef:
> Hi Dave,
>
> So I'm poking around the preemption code and stumbled upon:
>
> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>
> All these sites basically do:
>
>   while (!trylock())
>   	yield();
>
> which is a horrible and broken locking pattern. 
>
> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
> task that preempted the lock holder at FIFOn.
>
> Secondly the implementation is worse than usual by abusing
> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
> doesn't retry, but you're using it as a get out of fault path. And
> you're using set_need_resched() which is not something a driver should
> _ever_ touch.
>
> Now I'm going to take away set_need_resched() -- and while you can
> 'reimplement' it using set_thread_flag() you're not going to do that
> because it will be broken due to changes to the preempt code.
>
> So please as to fix ASAP and don't allow anybody to trick you into
> merging silly things like that again ;-)
>
Agreed, but this is a case of locking inversion. How do you propose to get around that?

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:11 ` Maarten Lankhorst
@ 2013-09-12 15:14   ` Peter Zijlstra
  0 siblings, 0 replies; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-12 15:14 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: airlied, Daniel Vetter, Thomas Hellstrom, intel-gfx, dri-devel,
	linux-kernel, Ingo Molnar, Thomas Gleixner

On Thu, Sep 12, 2013 at 05:11:25PM +0200, Maarten Lankhorst wrote:
> Op 12-09-13 17:06, Peter Zijlstra schreef:
> > Hi Dave,
> >
> > So I'm poking around the preemption code and stumbled upon:
> >
> > drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
> > drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> > drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> > drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
> >
> > All these sites basically do:
> >
> >   while (!trylock())
> >   	yield();
> >
> > which is a horrible and broken locking pattern. 
> >
> > Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
> > task that preempted the lock holder at FIFOn.
> >
> > Secondly the implementation is worse than usual by abusing
> > VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
> > doesn't retry, but you're using it as a get out of fault path. And
> > you're using set_need_resched() which is not something a driver should
> > _ever_ touch.
> >
> > Now I'm going to take away set_need_resched() -- and while you can
> > 'reimplement' it using set_thread_flag() you're not going to do that
> > because it will be broken due to changes to the preempt code.
> >
> > So please as to fix ASAP and don't allow anybody to trick you into
> > merging silly things like that again ;-)
> >
> Agreed, but this is a case of locking inversion. How do you propose to get around that?

me? No idea, I've never looked at the actual locking in DRM. Someone
who's familiar with that code would have to tackle that.

I just spotted the fail because I was going to remove set_need_resched()
and had a WTF moment when I tripped over this stuff.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:06 [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Peter Zijlstra
  2013-09-12 15:11 ` Maarten Lankhorst
@ 2013-09-12 15:36 ` Daniel Vetter
  2013-09-12 15:43   ` Peter Zijlstra
  2013-09-12 15:45   ` Maarten Lankhorst
  1 sibling, 2 replies; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Airlie, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> So I'm poking around the preemption code and stumbled upon:
>
> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>
> All these sites basically do:
>
>   while (!trylock())
>         yield();
>
> which is a horrible and broken locking pattern.
>
> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
> task that preempted the lock holder at FIFOn.
>
> Secondly the implementation is worse than usual by abusing
> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
> doesn't retry, but you're using it as a get out of fault path. And
> you're using set_need_resched() which is not something a driver should
> _ever_ touch.
>
> Now I'm going to take away set_need_resched() -- and while you can
> 'reimplement' it using set_thread_flag() you're not going to do that
> because it will be broken due to changes to the preempt code.
>
> So please as to fix ASAP and don't allow anybody to trick you into
> merging silly things like that again ;-)

The set_need_resched in i915_gem.c:i915_gem_fault can actually be
removed. It was there to give the error handler a chance to sneak in
and reset the hw/sw tracking when the gpu is dead. That hack goes back
to the days when the locking around our error handler was somewhere
between nonexistent and totally broken, nowadays we keep things from
live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
whip up a patch to rip this out. I'll also check that our testsuite
properly exercises this path (needs a bit of work on a quick look for
better coverage).

The one in ttm is just bonghits to shut up lockdep: ttm can recurse
into it's own pagefault handler and then deadlock, the trylock just
keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
fun userspace did and now have testcases for them. The right solution
to fix this is to use copy_to|from_user_atomic in ttm everywhere it
holds locks and have slowpaths which drops locks, copies stuff into a
temp allocation and then continues. At least that's how we've fixed
all those inversions in i915-gem. I'm not volunteering to fix this ;-)

The one in udl just looks like copypasta from i915, without any
justification (at least I don't see any) for why it's there. Probably
can die, too, since there isn't any gpu to reset on usb display-link
devices ...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:36 ` Daniel Vetter
@ 2013-09-12 15:43   ` Peter Zijlstra
  2013-09-12 15:58     ` Daniel Vetter
  2013-09-12 15:45   ` Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-12 15:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Dave Airlie, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 05:36:43PM +0200, Daniel Vetter wrote:
> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
> removed. It was there to give the error handler a chance to sneak in
> and reset the hw/sw tracking when the gpu is dead. That hack goes back
> to the days when the locking around our error handler was somewhere
> between nonexistent and totally broken, nowadays we keep things from
> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
> whip up a patch to rip this out. I'll also check that our testsuite
> properly exercises this path (needs a bit of work on a quick look for
> better coverage).
> 
> The one in udl just looks like copypasta from i915, without any
> justification (at least I don't see any) for why it's there. Probably
> can die, too, since there isn't any gpu to reset on usb display-link
> devices ...

OK, awesome that. 2 down.

> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> into it's own pagefault handler and then deadlock, the trylock just
> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> fun userspace did and now have testcases for them. The right solution
> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> holds locks and have slowpaths which drops locks, copies stuff into a
> temp allocation and then continues. At least that's how we've fixed
> all those inversions in i915-gem. I'm not volunteering to fix this ;-)

Yikes.. so how common is it? If I simply rip the set_need_resched() out
it will 'spin' on the fault a little longer until a 'natural' preemption
point -- if such a thing is every going to happen.

It would make that path take longer, but not be more or less broken.

So if its a rare path, I'll just rip the set_need_resched() out and you
DRM people can then fix up at your own pace.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:36 ` Daniel Vetter
  2013-09-12 15:43   ` Peter Zijlstra
@ 2013-09-12 15:45   ` Maarten Lankhorst
  2013-09-12 16:44     ` Thomas Hellstrom
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-12 15:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Dave Airlie, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

Op 12-09-13 17:36, Daniel Vetter schreef:
> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> So I'm poking around the preemption code and stumbled upon:
>>
>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>
>> All these sites basically do:
>>
>>   while (!trylock())
>>         yield();
>>
>> which is a horrible and broken locking pattern.
>>
>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>> task that preempted the lock holder at FIFOn.
>>
>> Secondly the implementation is worse than usual by abusing
>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>> doesn't retry, but you're using it as a get out of fault path. And
>> you're using set_need_resched() which is not something a driver should
>> _ever_ touch.
>>
>> Now I'm going to take away set_need_resched() -- and while you can
>> 'reimplement' it using set_thread_flag() you're not going to do that
>> because it will be broken due to changes to the preempt code.
>>
>> So please as to fix ASAP and don't allow anybody to trick you into
>> merging silly things like that again ;-)
> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
> removed. It was there to give the error handler a chance to sneak in
> and reset the hw/sw tracking when the gpu is dead. That hack goes back
> to the days when the locking around our error handler was somewhere
> between nonexistent and totally broken, nowadays we keep things from
> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
> whip up a patch to rip this out. I'll also check that our testsuite
> properly exercises this path (needs a bit of work on a quick look for
> better coverage).
>
> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> into it's own pagefault handler and then deadlock, the trylock just
> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> fun userspace did and now have testcases for them. The right solution
> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> holds locks and have slowpaths which drops locks, copies stuff into a
> temp allocation and then continues. At least that's how we've fixed
> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
Ah the case where a mmap'd address is passed to the execbuf ioctl? :P

Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
> The one in udl just looks like copypasta from i915, without any
> justification (at least I don't see any) for why it's there. Probably
> can die, too, since there isn't any gpu to reset on usb display-link
> devices ...
>
> Cheers, Daniel


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:43   ` Peter Zijlstra
@ 2013-09-12 15:58     ` Daniel Vetter
  2013-09-12 16:22       ` Peter Zijlstra
  2013-09-12 16:33       ` Thomas Hellstrom
  0 siblings, 2 replies; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Airlie, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>> into it's own pagefault handler and then deadlock, the trylock just
>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>> fun userspace did and now have testcases for them. The right solution
>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>> holds locks and have slowpaths which drops locks, copies stuff into a
>> temp allocation and then continues. At least that's how we've fixed
>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>
> Yikes.. so how common is it? If I simply rip the set_need_resched() out
> it will 'spin' on the fault a little longer until a 'natural' preemption
> point -- if such a thing is every going to happen.

It's a case of "our userspace doesn't do this", so as long as you're
not evil and frob the drm device nodes of ttm drivers directly the
deadlock will never happen. No idea how much contention actually
happens on e.g. shared buffer objects - in i915 we have just one lock
and so suffer quite a bit more from contention. So no idea how much
removing the yield would hurt.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:58     ` Daniel Vetter
@ 2013-09-12 16:22       ` Peter Zijlstra
  2013-09-12 16:35         ` Chris Wilson
  2013-09-12 19:52         ` Daniel Vetter
  2013-09-12 16:33       ` Thomas Hellstrom
  1 sibling, 2 replies; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-12 16:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Dave Airlie, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 05:58:49PM +0200, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> >> into it's own pagefault handler and then deadlock, the trylock just
> >> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> >> fun userspace did and now have testcases for them. The right solution
> >> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> >> holds locks and have slowpaths which drops locks, copies stuff into a
> >> temp allocation and then continues. At least that's how we've fixed
> >> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
> >
> > Yikes.. so how common is it? If I simply rip the set_need_resched() out
> > it will 'spin' on the fault a little longer until a 'natural' preemption
> > point -- if such a thing is every going to happen.
> 
> It's a case of "our userspace doesn't do this", so as long as you're
> not evil and frob the drm device nodes of ttm drivers directly the
> deadlock will never happen. No idea how much contention actually
> happens on e.g. shared buffer objects - in i915 we have just one lock
> and so suffer quite a bit more from contention. So no idea how much
> removing the yield would hurt.

If 'sane' userspace is never supposed to do this, then only insane
userspace is going to hurt from this and that's a GOOD (tm) thing,
right? ;-)

And it won't actually deadlock if you don't use FIFO, for the regular
scheduler class it'll just spin a little longer before getting preempted
so no real worries there.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:58     ` Daniel Vetter
  2013-09-12 16:22       ` Peter Zijlstra
@ 2013-09-12 16:33       ` Thomas Hellstrom
  1 sibling, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-12 16:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On 09/12/2013 05:58 PM, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>> into it's own pagefault handler and then deadlock, the trylock just
>>> keeps lockdep quiet.

Could you describe how it could recurse into it's own pagefault handler?
IIRC the VM flags of the TTM VMAs makes get_user_pages() refrain from 
touching these VMAs,
hence I don't think this code can deadlock, but admittedly it's far from 
the optimal solution.

Never mind, more on the set_need_resched() below.


>>>   We've had that bug arise in drm/i915 due to some
>>> fun userspace did and now have testcases for them. The right solution
>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>> temp allocation and then continues. At least that's how we've fixed
>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>> Yikes.. so how common is it? If I simply rip the set_need_resched() out
>> it will 'spin' on the fault a little longer until a 'natural' preemption
>> point -- if such a thing is every going to happen.

A typical case is if a process is throwing out a buffer from the GPU or 
system memory while another
process pagefaults while writing to it. It's not a common situation, and 
it's by no means a fastpath situation.
For correctness purposes, I think set_need_resched() can be safely removed.

> It's a case of "our userspace doesn't do this", so as long as you're
> not evil and frob the drm device nodes of ttm drivers directly the
> deadlock will never happen. No idea how much contention actually
> happens on e.g. shared buffer objects - in i915 we have just one lock
> and so suffer quite a bit more from contention. So no idea how much
> removing the yield would hurt.
> -Daniel

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 16:22       ` Peter Zijlstra
@ 2013-09-12 16:35         ` Chris Wilson
  2013-09-12 20:30           ` Peter Zijlstra
  2013-09-12 19:52         ` Daniel Vetter
  1 sibling, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2013-09-12 16:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Vetter, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 06:22:10PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 12, 2013 at 05:58:49PM +0200, Daniel Vetter wrote:
> > On Thu, Sep 12, 2013 at 5:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
> > >> into it's own pagefault handler and then deadlock, the trylock just
> > >> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
> > >> fun userspace did and now have testcases for them. The right solution
> > >> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
> > >> holds locks and have slowpaths which drops locks, copies stuff into a
> > >> temp allocation and then continues. At least that's how we've fixed
> > >> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
> > >
> > > Yikes.. so how common is it? If I simply rip the set_need_resched() out
> > > it will 'spin' on the fault a little longer until a 'natural' preemption
> > > point -- if such a thing is every going to happen.
> > 
> > It's a case of "our userspace doesn't do this", so as long as you're
> > not evil and frob the drm device nodes of ttm drivers directly the
> > deadlock will never happen. No idea how much contention actually
> > happens on e.g. shared buffer objects - in i915 we have just one lock
> > and so suffer quite a bit more from contention. So no idea how much
> > removing the yield would hurt.
> 
> If 'sane' userspace is never supposed to do this, then only insane
> userspace is going to hurt from this and that's a GOOD (tm) thing,
> right? ;-)

Not quite, as it would be possible for the evil userspace to trigger a
GPU hang that would cause the sane userspace to spin indefinitely 
waiting for the error recovery to kick in.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 15:45   ` Maarten Lankhorst
@ 2013-09-12 16:44     ` Thomas Hellstrom
  2013-09-12 19:48       ` Daniel Vetter
  2013-09-12 21:50       ` Maarten Lankhorst
  0 siblings, 2 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-12 16:44 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
> Op 12-09-13 17:36, Daniel Vetter schreef:
>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> So I'm poking around the preemption code and stumbled upon:
>>>
>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>
>>> All these sites basically do:
>>>
>>>    while (!trylock())
>>>          yield();
>>>
>>> which is a horrible and broken locking pattern.
>>>
>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>> task that preempted the lock holder at FIFOn.
>>>
>>> Secondly the implementation is worse than usual by abusing
>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>> doesn't retry, but you're using it as a get out of fault path. And
>>> you're using set_need_resched() which is not something a driver should
>>> _ever_ touch.
>>>
>>> Now I'm going to take away set_need_resched() -- and while you can
>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>> because it will be broken due to changes to the preempt code.
>>>
>>> So please as to fix ASAP and don't allow anybody to trick you into
>>> merging silly things like that again ;-)
>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>> removed. It was there to give the error handler a chance to sneak in
>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>> to the days when the locking around our error handler was somewhere
>> between nonexistent and totally broken, nowadays we keep things from
>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>> whip up a patch to rip this out. I'll also check that our testsuite
>> properly exercises this path (needs a bit of work on a quick look for
>> better coverage).
>>
>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>> into it's own pagefault handler and then deadlock, the trylock just
>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>> fun userspace did and now have testcases for them. The right solution
>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>> holds locks and have slowpaths which drops locks, copies stuff into a
>> temp allocation and then continues. At least that's how we've fixed
>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>
> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..

I think a possible fix would be if fault() were allowed to return an 
error and drop the mmap_sem() before returning.

Otherwise we need to track down all copy_to_user / copy_from_user which 
happen with bo::reserve held.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 16:44     ` Thomas Hellstrom
@ 2013-09-12 19:48       ` Daniel Vetter
  2013-09-12 21:50       ` Maarten Lankhorst
  1 sibling, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 19:48 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Maarten Lankhorst, Peter Zijlstra, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 6:44 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>
> I think a possible fix would be if fault() were allowed to return an error
> and drop the mmap_sem() before returning.
>
> Otherwise we need to track down all copy_to_user / copy_from_user which
> happen with bo::reserve held.

For maximal evilness submit the relocation list (or whatever data
execbuf slurps in with copy_from_user while holding bo::reserve) of a
bo in the execbuf list. At least that's the testcase we have for
drm/i915. Then make sure that the execbuf wants the bo somewhere it
can't be mmaped from userspace, so needs to be moved both in the fault
handler and then back for the execbuf to continue ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 16:22       ` Peter Zijlstra
  2013-09-12 16:35         ` Chris Wilson
@ 2013-09-12 19:52         ` Daniel Vetter
  2013-09-12 19:58           ` Thomas Gleixner
  1 sibling, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 19:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Airlie, Maarten Lankhorst, Thomas Hellstrom, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Thu, Sep 12, 2013 at 6:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> If 'sane' userspace is never supposed to do this, then only insane
> userspace is going to hurt from this and that's a GOOD (tm) thing,
> right? ;-)

Afaik sane userspace doesn't hit the _deadlock_ (or lifelock if we
have the set_need_resched in there). drm/i915 is a bit different since
we have just one lock, and so the same design would actually deadlock
even for sane userspace. But hitting contention there and yielding is
somewhat expected. Obviously shouldn't happen too often since it'll
hurt performance, with either blocking or the yield spinning loop.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 19:52         ` Daniel Vetter
@ 2013-09-12 19:58           ` Thomas Gleixner
  2013-09-12 20:04             ` Daniel Vetter
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2013-09-12 19:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On Thu, 12 Sep 2013, Daniel Vetter wrote:

> On Thu, Sep 12, 2013 at 6:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > If 'sane' userspace is never supposed to do this, then only insane
> > userspace is going to hurt from this and that's a GOOD (tm) thing,
> > right? ;-)
> 
> Afaik sane userspace doesn't hit the _deadlock_ (or lifelock if we
> have the set_need_resched in there). drm/i915 is a bit different since
> we have just one lock, and so the same design would actually deadlock
> even for sane userspace. But hitting contention there and yielding is
> somewhat expected. Obviously shouldn't happen too often since it'll
> hurt performance, with either blocking or the yield spinning loop.

So this is actually a non priviledged DoS interface, right?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 19:58           ` Thomas Gleixner
@ 2013-09-12 20:04             ` Daniel Vetter
  2013-09-12 20:20               ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 20:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On Thu, Sep 12, 2013 at 9:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> On Thu, Sep 12, 2013 at 6:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > If 'sane' userspace is never supposed to do this, then only insane
>> > userspace is going to hurt from this and that's a GOOD (tm) thing,
>> > right? ;-)
>>
>> Afaik sane userspace doesn't hit the _deadlock_ (or lifelock if we
>> have the set_need_resched in there). drm/i915 is a bit different since
>> we have just one lock, and so the same design would actually deadlock
>> even for sane userspace. But hitting contention there and yielding is
>> somewhat expected. Obviously shouldn't happen too often since it'll
>> hurt performance, with either blocking or the yield spinning loop.
>
> So this is actually a non priviledged DoS interface, right?

I think for ttm drivers it's just execbuf being exploitable. But on
drm/i915 we've
had the same issue with the pwrite/pread ioctls, so a simple
glBufferData(glMap) kind of recursion from gl clients blew the kernel
to pieces ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 20:04             ` Daniel Vetter
@ 2013-09-12 20:20               ` Thomas Gleixner
  2013-09-12 20:23                 ` Daniel Vetter
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2013-09-12 20:20 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On Thu, 12 Sep 2013, Daniel Vetter wrote:

> On Thu, Sep 12, 2013 at 9:58 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> >> On Thu, Sep 12, 2013 at 6:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > If 'sane' userspace is never supposed to do this, then only insane
> >> > userspace is going to hurt from this and that's a GOOD (tm) thing,
> >> > right? ;-)
> >>
> >> Afaik sane userspace doesn't hit the _deadlock_ (or lifelock if we
> >> have the set_need_resched in there). drm/i915 is a bit different since
> >> we have just one lock, and so the same design would actually deadlock
> >> even for sane userspace. But hitting contention there and yielding is
> >> somewhat expected. Obviously shouldn't happen too often since it'll
> >> hurt performance, with either blocking or the yield spinning loop.
> >
> > So this is actually a non priviledged DoS interface, right?
> 
> I think for ttm drivers it's just execbuf being exploitable. But on
> drm/i915 we've
> had the same issue with the pwrite/pread ioctls, so a simple
> glBufferData(glMap) kind of recursion from gl clients blew the kernel
> to pieces ...

And the only answer you folks came up with is set_need_resched() and
yield()? Oh well....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 20:20               ` Thomas Gleixner
@ 2013-09-12 20:23                 ` Daniel Vetter
  2013-09-12 20:39                   ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-12 20:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On Thu, Sep 12, 2013 at 10:20 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> I think for ttm drivers it's just execbuf being exploitable. But on
>> drm/i915 we've
>> had the same issue with the pwrite/pread ioctls, so a simple
>> glBufferData(glMap) kind of recursion from gl clients blew the kernel
>> to pieces ...
>
> And the only answer you folks came up with is set_need_resched() and
> yield()? Oh well....

The yield was for a different lifelock, and that one is also fixed by
now. The fault handler deadlock was fixed in the usual "drop locks and
jump into slowpath" fasion, at least in drm/i915.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 16:35         ` Chris Wilson
@ 2013-09-12 20:30           ` Peter Zijlstra
  2013-09-12 20:37             ` Thomas Gleixner
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-12 20:30 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Dave Airlie, Maarten Lankhorst,
	Thomas Hellstrom, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On Thu, Sep 12, 2013 at 05:35:43PM +0100, Chris Wilson wrote:
> Not quite, as it would be possible for the evil userspace to trigger a
> GPU hang that would cause the sane userspace to spin indefinitely 
> waiting for the error recovery to kick in.

So with FIFOn+1 preempting FIFOn its a live-lock because the faulting
thread will forever keep yielding to itself since its the highest
priority task around, therefore the set_need_resched() is an absolute
NOP in that case.

For OTHER it might run another task with set_need_resched(), without
set_need_resched() it'll simply spin on the fault until it runs out of
time and gets force preempted and another task gets to run.

So for either case, the set_need_resched() doesn't make an appreciable
difference.

Removing it will not make evil userspace much worse -- at worst it will
cause slightly more wasted cycles.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 20:30           ` Peter Zijlstra
@ 2013-09-12 20:37             ` Thomas Gleixner
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2013-09-12 20:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chris Wilson, Daniel Vetter, Dave Airlie, Maarten Lankhorst,
	Thomas Hellstrom, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar

On Thu, 12 Sep 2013, Peter Zijlstra wrote:

> On Thu, Sep 12, 2013 at 05:35:43PM +0100, Chris Wilson wrote:
> > Not quite, as it would be possible for the evil userspace to trigger a
> > GPU hang that would cause the sane userspace to spin indefinitely 
> > waiting for the error recovery to kick in.
> 
> So with FIFOn+1 preempting FIFOn its a live-lock because the faulting
> thread will forever keep yielding to itself since its the highest
> priority task around, therefore the set_need_resched() is an absolute
> NOP in that case.
> 
> For OTHER it might run another task with set_need_resched(), without
> set_need_resched() it'll simply spin on the fault until it runs out of
> time and gets force preempted and another task gets to run.
> 
> So for either case, the set_need_resched() doesn't make an appreciable
> difference.
> 
> Removing it will not make evil userspace much worse -- at worst it will
> cause slightly more wasted cycles.

Well, yield() is a completely doomed concept by definition no matter
whether you add set_need_resched() or not.

We really should put a

	schedule_timeout_uninterruptible(1hour);

into the yield() implementation to get finally rid of it.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 20:23                 ` Daniel Vetter
@ 2013-09-12 20:39                   ` Thomas Gleixner
  2013-09-12 20:48                     ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Gleixner @ 2013-09-12 20:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Dave Airlie, Maarten Lankhorst, Thomas Hellstrom,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On Thu, 12 Sep 2013, Daniel Vetter wrote:

> On Thu, Sep 12, 2013 at 10:20 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> I think for ttm drivers it's just execbuf being exploitable. But on
> >> drm/i915 we've
> >> had the same issue with the pwrite/pread ioctls, so a simple
> >> glBufferData(glMap) kind of recursion from gl clients blew the kernel
> >> to pieces ...
> >
> > And the only answer you folks came up with is set_need_resched() and
> > yield()? Oh well....
> 
> The yield was for a different lifelock, and that one is also fixed by
> now. The fault handler deadlock was fixed in the usual "drop locks and
> jump into slowpath" fasion, at least in drm/i915.

So we can remove that whole yield/set_need_resched() mess completely ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 20:39                   ` Thomas Gleixner
@ 2013-09-12 20:48                     ` Thomas Hellstrom
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-12 20:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, Maarten Lankhorst,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar

On 09/12/2013 10:39 PM, Thomas Gleixner wrote:
> On Thu, 12 Sep 2013, Daniel Vetter wrote:
>
>> On Thu, Sep 12, 2013 at 10:20 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> I think for ttm drivers it's just execbuf being exploitable. But on
>>>> drm/i915 we've
>>>> had the same issue with the pwrite/pread ioctls, so a simple
>>>> glBufferData(glMap) kind of recursion from gl clients blew the kernel
>>>> to pieces ...
>>> And the only answer you folks came up with is set_need_resched() and
>>> yield()? Oh well....
>> The yield was for a different lifelock, and that one is also fixed by
>> now. The fault handler deadlock was fixed in the usual "drop locks and
>> jump into slowpath" fasion, at least in drm/i915.
> So we can remove that whole yield/set_need_resched() mess completely ?
>
> Thanks,
>
> 	tglx
No.

The while(trylock) is there to address a potential locking inversion 
deadlock. If the trylock fails, the code returns to user-space which 
retries the fault. This code needs to stay until we can come up with 
either a way to drop the mmap_sem and sleep before returning to 
user-space, or a bunch of code is fixed with a different locking order.

The set_need_resched() can (and should according to Peter) go.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 16:44     ` Thomas Hellstrom
  2013-09-12 19:48       ` Daniel Vetter
@ 2013-09-12 21:50       ` Maarten Lankhorst
  2013-09-13  5:33         ` Thomas Hellstrom
  2013-09-13  6:44         ` [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Thomas Hellstrom
  1 sibling, 2 replies; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-12 21:50 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

Op 12-09-13 18:44, Thomas Hellstrom schreef:
> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> So I'm poking around the preemption code and stumbled upon:
>>>>
>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>
>>>> All these sites basically do:
>>>>
>>>>    while (!trylock())
>>>>          yield();
>>>>
>>>> which is a horrible and broken locking pattern.
>>>>
>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>> task that preempted the lock holder at FIFOn.
>>>>
>>>> Secondly the implementation is worse than usual by abusing
>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>> you're using set_need_resched() which is not something a driver should
>>>> _ever_ touch.
>>>>
>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>> because it will be broken due to changes to the preempt code.
>>>>
>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>> merging silly things like that again ;-)
>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>> removed. It was there to give the error handler a chance to sneak in
>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>> to the days when the locking around our error handler was somewhere
>>> between nonexistent and totally broken, nowadays we keep things from
>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>> whip up a patch to rip this out. I'll also check that our testsuite
>>> properly exercises this path (needs a bit of work on a quick look for
>>> better coverage).
>>>
>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>> into it's own pagefault handler and then deadlock, the trylock just
>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>> fun userspace did and now have testcases for them. The right solution
>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>> temp allocation and then continues. At least that's how we've fixed
>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>
>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>
> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>
> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
CONFIG_PROVE_LOCKING=y

and hard grab that reserve lock within the fault handler, done.. lockdep will spit it out for you :p

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 21:50       ` Maarten Lankhorst
@ 2013-09-13  5:33         ` Thomas Hellstrom
  2013-09-13  8:26           ` Daniel Vetter
  2013-10-08 14:14           ` [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations Maarten Lankhorst
  2013-09-13  6:44         ` [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Thomas Hellstrom
  1 sibling, 2 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  5:33 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>
>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>
>>>>> All these sites basically do:
>>>>>
>>>>>     while (!trylock())
>>>>>           yield();
>>>>>
>>>>> which is a horrible and broken locking pattern.
>>>>>
>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>> task that preempted the lock holder at FIFOn.
>>>>>
>>>>> Secondly the implementation is worse than usual by abusing
>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>> you're using set_need_resched() which is not something a driver should
>>>>> _ever_ touch.
>>>>>
>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>> because it will be broken due to changes to the preempt code.
>>>>>
>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>> merging silly things like that again ;-)
>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>> removed. It was there to give the error handler a chance to sneak in
>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>> to the days when the locking around our error handler was somewhere
>>>> between nonexistent and totally broken, nowadays we keep things from
>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>> properly exercises this path (needs a bit of work on a quick look for
>>>> better coverage).
>>>>
>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>> fun userspace did and now have testcases for them. The right solution
>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>> temp allocation and then continues. At least that's how we've fixed
>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>
>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>
>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
> CONFIG_PROVE_LOCKING=y
>
> and hard grab that reserve lock within the fault handler, done.. lockdep will spit it out for you :p
>
> ~Maarten

Given that all copy_to_user / copy_from_user paths are actually hit 
during testing, right?

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-12 21:50       ` Maarten Lankhorst
  2013-09-13  5:33         ` Thomas Hellstrom
@ 2013-09-13  6:44         ` Thomas Hellstrom
  2013-09-13  7:16           ` Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  6:44 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>
>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>
>>>>> All these sites basically do:
>>>>>
>>>>>     while (!trylock())
>>>>>           yield();
>>>>>
>>>>> which is a horrible and broken locking pattern.
>>>>>
>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>> task that preempted the lock holder at FIFOn.
>>>>>
>>>>> Secondly the implementation is worse than usual by abusing
>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>> you're using set_need_resched() which is not something a driver should
>>>>> _ever_ touch.
>>>>>
>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>> because it will be broken due to changes to the preempt code.
>>>>>
>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>> merging silly things like that again ;-)
>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>> removed. It was there to give the error handler a chance to sneak in
>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>> to the days when the locking around our error handler was somewhere
>>>> between nonexistent and totally broken, nowadays we keep things from
>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>> properly exercises this path (needs a bit of work on a quick look for
>>>> better coverage).
>>>>
>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>> fun userspace did and now have testcases for them. The right solution
>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>> temp allocation and then continues. At least that's how we've fixed
>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>
>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>
>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.

Actually, from looking at the mm code, it seems OK to do the following:

if (!bo_tryreserve()) {
     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
     bo_reserve();               // Wait for the BO to become available 
(interruptible)
     bo_unreserve();           // Where is bo_wait_unreserved() when we 
need it, Maarten :P
     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after 
regrabbing
}

Somebody conveniently added a VM_FAULT_RETRY, but for a different purpose.

If possible, I suggest to take this route for now to avoid the mess of 
changing locking order in all TTM drivers, with
all give-up-locking slowpaths that comes with it. IIRC it took some time 
for i915 to get that right, and completely get rid of all lockdep warnings.

This will keep the official locking order
bo::reserve
mmap_sem

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  6:44         ` [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Thomas Hellstrom
@ 2013-09-13  7:16           ` Maarten Lankhorst
  2013-09-13  7:46             ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-13  7:16 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

Op 13-09-13 08:44, Thomas Hellstrom schreef:
> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>
>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>
>>>>>> All these sites basically do:
>>>>>>
>>>>>>     while (!trylock())
>>>>>>           yield();
>>>>>>
>>>>>> which is a horrible and broken locking pattern.
>>>>>>
>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>
>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>> _ever_ touch.
>>>>>>
>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>
>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>> merging silly things like that again ;-)
>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>> to the days when the locking around our error handler was somewhere
>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>> better coverage).
>>>>>
>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>> fun userspace did and now have testcases for them. The right solution
>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>
>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>
>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>
> Actually, from looking at the mm code, it seems OK to do the following:
>
> if (!bo_tryreserve()) {
>     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>     bo_reserve();               // Wait for the BO to become available (interruptible)
>     bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
> }
Is this meant as a jab at me? You're doing locking wrong here! Again!

> Somebody conveniently added a VM_FAULT_RETRY, but for a different purpose.
>
> If possible, I suggest to take this route for now to avoid the mess of changing locking order in all TTM drivers, with
> all give-up-locking slowpaths that comes with it. IIRC it took some time for i915 to get that right, and completely get rid of all lockdep warnings.
Sorry, but it's still the right thing to do. I can convert nouveau and take a look at radeon. Locking
slowpaths are easy to test too with CONFIG_DEBUG_WW_MUTEX_SLOWPATH.
Just because it's harder, doesn't mean we have to avoid doing it.

The might_fault function will verify the usage of mmap_sem with lockdep automatically when PROVE_LOCKING=y.
This means that any copy_from_user / copy_to_user will always check mmap_sem.

> This will keep the official locking order
> bo::reserve
> mmap_sem
Disagree, fix the order and the trylock and 'wait for unreserved' half assed locking will disappear.

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  7:16           ` Maarten Lankhorst
@ 2013-09-13  7:46             ` Thomas Hellstrom
  2013-09-13  7:51               ` Maarten Lankhorst
  2013-09-13  8:29               ` Peter Zijlstra
  0 siblings, 2 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  7:46 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
> Op 13-09-13 08:44, Thomas Hellstrom schreef:
>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>>
>>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>>
>>>>>>> All these sites basically do:
>>>>>>>
>>>>>>>      while (!trylock())
>>>>>>>            yield();
>>>>>>>
>>>>>>> which is a horrible and broken locking pattern.
>>>>>>>
>>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>>
>>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>>> _ever_ touch.
>>>>>>>
>>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>>
>>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>>> merging silly things like that again ;-)
>>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>>> to the days when the locking around our error handler was somewhere
>>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>>> better coverage).
>>>>>>
>>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>>> fun userspace did and now have testcases for them. The right solution
>>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>>
>>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>>
>>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>> Actually, from looking at the mm code, it seems OK to do the following:
>>
>> if (!bo_tryreserve()) {
>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>      bo_reserve();               // Wait for the BO to become available (interruptible)
>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>> }
> Is this meant as a jab at me? You're doing locking wrong here! Again!

It's not meant as a jab at you.  I'm sorry if it came out that way. It 
was meant as a joke. I wasn't aware the topic was sensitive.

Anyway, could you describe what is wrong, with the above solution, 
because it seems perfectly legal to me.
There is no substantial overhead, and there is no risc of deadlocks. Or 
do you mean it's bad because it confuses lockdep?

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  7:46             ` Thomas Hellstrom
@ 2013-09-13  7:51               ` Maarten Lankhorst
  2013-09-13  8:23                 ` Thomas Hellstrom
  2013-09-13  8:29               ` Peter Zijlstra
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-13  7:51 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

Op 13-09-13 09:46, Thomas Hellstrom schreef:
> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
>> Op 13-09-13 08:44, Thomas Hellstrom schreef:
>>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>>>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>>>
>>>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>>>
>>>>>>>> All these sites basically do:
>>>>>>>>
>>>>>>>>      while (!trylock())
>>>>>>>>            yield();
>>>>>>>>
>>>>>>>> which is a horrible and broken locking pattern.
>>>>>>>>
>>>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>>>
>>>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>>>> _ever_ touch.
>>>>>>>>
>>>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>>>
>>>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>>>> merging silly things like that again ;-)
>>>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>>>> to the days when the locking around our error handler was somewhere
>>>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>>>> better coverage).
>>>>>>>
>>>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>>>> fun userspace did and now have testcases for them. The right solution
>>>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>>>
>>>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>>>
>>>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>>> Actually, from looking at the mm code, it seems OK to do the following:
>>>
>>> if (!bo_tryreserve()) {
>>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>      bo_reserve();               // Wait for the BO to become available (interruptible)
>>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>> }
>> Is this meant as a jab at me? You're doing locking wrong here! Again!
>
> It's not meant as a jab at you.  I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive.
>
> Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me.
> There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep?
Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still..
This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it.

You can make a complicated user program to test this, or simply use this function for debugging:
void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); }

Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :)

~Maarten


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  7:51               ` Maarten Lankhorst
@ 2013-09-13  8:23                 ` Thomas Hellstrom
  2013-09-13  8:32                   ` Daniel Vetter
  2013-09-13  8:58                   ` Maarten Lankhorst
  0 siblings, 2 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  8:23 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:
> Op 13-09-13 09:46, Thomas Hellstrom schreef:
>> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
>>> Op 13-09-13 08:44, Thomas Hellstrom schreef:
>>>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>>>>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>>>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>>>>
>>>>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>>>>
>>>>>>>>> All these sites basically do:
>>>>>>>>>
>>>>>>>>>       while (!trylock())
>>>>>>>>>             yield();
>>>>>>>>>
>>>>>>>>> which is a horrible and broken locking pattern.
>>>>>>>>>
>>>>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>>>>
>>>>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>>>>> _ever_ touch.
>>>>>>>>>
>>>>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>>>>
>>>>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>>>>> merging silly things like that again ;-)
>>>>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>>>>> to the days when the locking around our error handler was somewhere
>>>>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>>>>> better coverage).
>>>>>>>>
>>>>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>>>>> fun userspace did and now have testcases for them. The right solution
>>>>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>>>>
>>>>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>>>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>>>>
>>>>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>>>> Actually, from looking at the mm code, it seems OK to do the following:
>>>>
>>>> if (!bo_tryreserve()) {
>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>> }
>>> Is this meant as a jab at me? You're doing locking wrong here! Again!
>> It's not meant as a jab at you.  I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive.
>>
>> Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me.
>> There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep?
> Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still..
> This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it.

As previously mentioned, copy_from_user should return -EFAULT, since the 
VMAs are marked with VM_IO. It should not recurse into fault(), so evil 
user-space looses.

>
> You can make a complicated user program to test this, or simply use this function for debugging:
> void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); }
>
> Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :)

I'm still not convinced that there are any problems with this solution. 
Did you take what's said above into account?


Now, could we try to approach this based on pros and cons? Let's say we 
would be able to choose locking order without doing anything ugly. I'd 
put it like this:

mmap_sem->bo_reserve:
Good: Native locking order of VM subsystem. Good if we in the future 
will need to reserve in mmap().
Bad: pwrite, pread, copy_to user, copy_from_user usage needs a slowpath 
that releases all locking, which has to be done in multiple places in 
multiple drivers. Grabbing the mmap_sem and then waiting for multiple 
possibly sleeping bo_reserves in slow paths will stall VMA write 
operations for this MM.

bo_reserve->mmap_sem:
Good: Natural locking order for all driver ioctls. Slowpath needs to be 
done in a single place, in common code.
Bad: Bad if we ever need to perform bo_reserve in mmap.

In my view we have a clear winner. Given the problems i915 had when 
converting their driver, and the bashing they had to withstand, we have 
an even clearer winner.

And then we need to take into account that, (given that I understand 
things correctly) lockdep will complain because it thinks there is a 
recursion that will never happen.
That will make the bo_reserve->mmap_sem solution look bad, but is this 
really enough to justify giving it up?

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  5:33         ` Thomas Hellstrom
@ 2013-09-13  8:26           ` Daniel Vetter
  2013-10-08 14:14           ` [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations Maarten Lankhorst
  1 sibling, 0 replies; 58+ messages in thread
From: Daniel Vetter @ 2013-09-13  8:26 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Maarten Lankhorst, Peter Zijlstra, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Fri, Sep 13, 2013 at 7:33 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> Given that all copy_to_user / copy_from_user paths are actually hit during
> testing, right?

Ime it requires a bit of ingenuity to properly test this from
userspace. We're using a few tricks in drm/i915 kernel testing:
- When we hand a gtt mmap pointer to execbuf or other ioctls we upload
the data in there through pwrite (or if you don't have that use the
gpu to blt it there). This way you can careful control when the
pagefault will happen. Also since we supply correct data we can make
sure that the kernel actually does the right thing and not just
whether it'll blow up.
- We have a module parameter which can be changed at runtime to
disable all the prefaulting we're doing.
- We have a debugfs interface to drop caches/evict lrus. If you have a
parallel thread that regularly forces the inactive list to be evicted
we can force a refault even after the first fault already happend.
That's useful to test the slowpath after a slowpath already happened,
e.g. when trying to copy reloc offset out to userspace after execbuf
completed.

With these tricks we have imo great test coverage for i915.ko and more
important good assurance that any regressions in this tricky code will
get caught.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  7:46             ` Thomas Hellstrom
  2013-09-13  7:51               ` Maarten Lankhorst
@ 2013-09-13  8:29               ` Peter Zijlstra
  2013-09-13  8:41                 ` Daniel Vetter
  1 sibling, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-13  8:29 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Maarten Lankhorst, Daniel Vetter, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
> >>if (!bo_tryreserve()) {
> >>     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
> >>     bo_reserve();               // Wait for the BO to become available (interruptible)
> >>     bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
> >>     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
> >>}
> 
> Anyway, could you describe what is wrong, with the above solution, because
> it seems perfectly legal to me.

Luckily the rule of law doesn't have anything to do with this stuff --
at least I sincerely hope so.

The thing that's wrong with that pattern is that its still not
deterministic - although its a lot better than the pure trylock. Because
you have to release and re-acquire with the trylock another user might
have gotten in again. Its utterly prone to starvation.

The acquire+release does remove the dead/life-lock scenario from the
FIFO case, since blocking on the acquire will allow the other task to
run (or even get boosted on -rt).

Aside from that there's nothing particularly wrong with it and lockdep
should be happy afaict (but I haven't had my morning juice yet).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:23                 ` Thomas Hellstrom
@ 2013-09-13  8:32                   ` Daniel Vetter
  2013-09-13  8:39                     ` Thomas Hellstrom
  2013-09-13  8:58                   ` Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-13  8:32 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Maarten Lankhorst, Peter Zijlstra, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom
<thellstrom@vmware.com> wrote:
> As previously mentioned, copy_from_user should return -EFAULT, since the
> VMAs are marked with VM_IO. It should not recurse into fault(), so evil
> user-space looses.

I haven't put a printk in the code to prove this, but gem mmap also
sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault
handler and deadlock ....

On a _very_ quick reading (and definitely not enough coffee yet for
reading mm/* stuff) it looks like it's get_user_pages that will return
an -EFAULT when hitting upon a VM_IO mapping (which makes sense since
there's really no page backing it). Actually using get_user_pages was
the original slowpath we've had in a few places until we've noticed
that for pwrite that breaks legit userspace (the glBufferData(glMap))
use-case), so we've switched to lock dropping and proper slowpaths
using copy_*_user everywhere instead of trying to pin the userspace
storage with get_user_pages.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:32                   ` Daniel Vetter
@ 2013-09-13  8:39                     ` Thomas Hellstrom
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  8:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Maarten Lankhorst, Peter Zijlstra, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On 09/13/2013 10:32 AM, Daniel Vetter wrote:
> On Fri, Sep 13, 2013 at 10:23 AM, Thomas Hellstrom
> <thellstrom@vmware.com> wrote:
>> As previously mentioned, copy_from_user should return -EFAULT, since the
>> VMAs are marked with VM_IO. It should not recurse into fault(), so evil
>> user-space looses.
> I haven't put a printk in the code to prove this, but gem mmap also
> sets VM_IO in drm_gem_mmap_obj. And we can very much hit our own fault
> handler and deadlock ....

If this is indeed true, I guess I need to accept the fact that my 
solution is bad.
(and worse I can't blame not having my morning coffee).

I'll take a deeper look.
/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:29               ` Peter Zijlstra
@ 2013-09-13  8:41                 ` Daniel Vetter
  2013-09-13  9:00                   ` Peter Zijlstra
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-13  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Hellstrom, Maarten Lankhorst, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>> >>if (!bo_tryreserve()) {
>> >>     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>> >>     bo_reserve();               // Wait for the BO to become available (interruptible)
>> >>     bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>> >>     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>> >>}
>>
>> Anyway, could you describe what is wrong, with the above solution, because
>> it seems perfectly legal to me.
>
> Luckily the rule of law doesn't have anything to do with this stuff --
> at least I sincerely hope so.
>
> The thing that's wrong with that pattern is that its still not
> deterministic - although its a lot better than the pure trylock. Because
> you have to release and re-acquire with the trylock another user might
> have gotten in again. Its utterly prone to starvation.
>
> The acquire+release does remove the dead/life-lock scenario from the
> FIFO case, since blocking on the acquire will allow the other task to
> run (or even get boosted on -rt).
>
> Aside from that there's nothing particularly wrong with it and lockdep
> should be happy afaict (but I haven't had my morning juice yet).

bo_reserve internally maps to a ww-mutex and task can already hold
ww-mutex (potentially even the same for especially nasty userspace).
So lockdep will complain and I think the only way to properly solve
this is to have lock-dropping slowpaths around all copy_*_user
callsites that already hold a bo_reserve ww_mutex. At least that's
been my conclusion after much head-banging against this issue for
drm/i915, and we've tried a lot approaches ;-)
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:23                 ` Thomas Hellstrom
  2013-09-13  8:32                   ` Daniel Vetter
@ 2013-09-13  8:58                   ` Maarten Lankhorst
  2013-09-13  9:21                     ` Thomas Hellstrom
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-13  8:58 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

Op 13-09-13 10:23, Thomas Hellstrom schreef:
> On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:
>> Op 13-09-13 09:46, Thomas Hellstrom schreef:
>>> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
>>>> Op 13-09-13 08:44, Thomas Hellstrom schreef:
>>>>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>>>>>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>>>>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>>>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>>>>>
>>>>>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>>>>>
>>>>>>>>>> All these sites basically do:
>>>>>>>>>>
>>>>>>>>>>       while (!trylock())
>>>>>>>>>>             yield();
>>>>>>>>>>
>>>>>>>>>> which is a horrible and broken locking pattern.
>>>>>>>>>>
>>>>>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>>>>>
>>>>>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>>>>>> _ever_ touch.
>>>>>>>>>>
>>>>>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>>>>>
>>>>>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>>>>>> merging silly things like that again ;-)
>>>>>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>>>>>> to the days when the locking around our error handler was somewhere
>>>>>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>>>>>> better coverage).
>>>>>>>>>
>>>>>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>>>>>> fun userspace did and now have testcases for them. The right solution
>>>>>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>>>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>>>>>
>>>>>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>>>>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>>>>>
>>>>>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>>>>> Actually, from looking at the mm code, it seems OK to do the following:
>>>>>
>>>>> if (!bo_tryreserve()) {
>>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>> }
>>>> Is this meant as a jab at me? You're doing locking wrong here! Again!
>>> It's not meant as a jab at you.  I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive.
>>>
>>> Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me.
>>> There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep?
>> Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still..
>> This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it.
>
> As previously mentioned, copy_from_user should return -EFAULT, since the VMAs are marked with VM_IO. It should not recurse into fault(), so evil user-space looses.
>
>>
>> You can make a complicated user program to test this, or simply use this function for debugging:
>> void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); }
>>
>> Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :)
>
> I'm still not convinced that there are any problems with this solution. Did you take what's said above into account?
>
>
> Now, could we try to approach this based on pros and cons? Let's say we would be able to choose locking order without doing anything ugly. I'd put it like this:
>
> mmap_sem->bo_reserve:
> Good: Native locking order of VM subsystem. Good if we in the future will need to reserve in mmap().
> Bad: pwrite, pread, copy_to user, copy_from_user usage needs a slowpath that releases all locking, which has to be done in multiple places in multiple drivers. Grabbing the mmap_sem and then waiting for multiple possibly sleeping bo_reserves in slow paths will stall VMA write operations for this MM.
I think the good offsets the bad a million times here. Just because it's harder.

> bo_reserve->mmap_sem:
> Good: Natural locking order for all driver ioctls. Slowpath needs to be done in a single place, in common code.
> Bad: Bad if we ever need to perform bo_reserve in mmap.
Considering you're open coding a mutex_lock with the reserve/unreserve+trylock, I think this is a horrible approach. The possibility of a deadlock still exists too. :(
> In my view we have a clear winner. Given the problems i915 had when converting their driver, and the bashing they had to withstand, we have an even clearer winner.
>
> And then we need to take into account that, (given that I understand things correctly) lockdep will complain because it thinks there is a recursion that will never happen.
> That will make the bo_reserve->mmap_sem solution look bad, but is this really enough to justify giving it up?
>
> /Thomas
>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:41                 ` Daniel Vetter
@ 2013-09-13  9:00                   ` Peter Zijlstra
  2013-09-23 15:33                     ` [RFC PATCH] drm/nouveau: fix nested locking in mmap handler Maarten Lankhorst
  0 siblings, 1 reply; 58+ messages in thread
From: Peter Zijlstra @ 2013-09-13  9:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Thomas Hellstrom, Maarten Lankhorst, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
> >> >>if (!bo_tryreserve()) {
> >> >>     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
> >> >>     bo_reserve();               // Wait for the BO to become available (interruptible)
> >> >>     bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
> >> >>     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
> >> >>}
> >>
> >> Anyway, could you describe what is wrong, with the above solution, because
> >> it seems perfectly legal to me.
> >
> > Luckily the rule of law doesn't have anything to do with this stuff --
> > at least I sincerely hope so.
> >
> > The thing that's wrong with that pattern is that its still not
> > deterministic - although its a lot better than the pure trylock. Because
> > you have to release and re-acquire with the trylock another user might
> > have gotten in again. Its utterly prone to starvation.
> >
> > The acquire+release does remove the dead/life-lock scenario from the
> > FIFO case, since blocking on the acquire will allow the other task to
> > run (or even get boosted on -rt).
> >
> > Aside from that there's nothing particularly wrong with it and lockdep
> > should be happy afaict (but I haven't had my morning juice yet).
> 
> bo_reserve internally maps to a ww-mutex and task can already hold
> ww-mutex (potentially even the same for especially nasty userspace).

OK, yes I wasn't aware of that. Yes in that case you're quite right.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE
  2013-09-13  8:58                   ` Maarten Lankhorst
@ 2013-09-13  9:21                     ` Thomas Hellstrom
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-13  9:21 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

On 09/13/2013 10:58 AM, Maarten Lankhorst wrote:
> Op 13-09-13 10:23, Thomas Hellstrom schreef:
>> On 09/13/2013 09:51 AM, Maarten Lankhorst wrote:
>>> Op 13-09-13 09:46, Thomas Hellstrom schreef:
>>>> On 09/13/2013 09:16 AM, Maarten Lankhorst wrote:
>>>>> Op 13-09-13 08:44, Thomas Hellstrom schreef:
>>>>>> On 09/12/2013 11:50 PM, Maarten Lankhorst wrote:
>>>>>>> Op 12-09-13 18:44, Thomas Hellstrom schreef:
>>>>>>>> On 09/12/2013 05:45 PM, Maarten Lankhorst wrote:
>>>>>>>>> Op 12-09-13 17:36, Daniel Vetter schreef:
>>>>>>>>>> On Thu, Sep 12, 2013 at 5:06 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>>>> So I'm poking around the preemption code and stumbled upon:
>>>>>>>>>>>
>>>>>>>>>>> drivers/gpu/drm/i915/i915_gem.c:                set_need_resched();
>>>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c:                        set_need_resched();
>>>>>>>>>>> drivers/gpu/drm/udl/udl_gem.c:          set_need_resched();
>>>>>>>>>>>
>>>>>>>>>>> All these sites basically do:
>>>>>>>>>>>
>>>>>>>>>>>        while (!trylock())
>>>>>>>>>>>              yield();
>>>>>>>>>>>
>>>>>>>>>>> which is a horrible and broken locking pattern.
>>>>>>>>>>>
>>>>>>>>>>> Firstly its deadlock prone, suppose the faulting process is a FIFOn+1
>>>>>>>>>>> task that preempted the lock holder at FIFOn.
>>>>>>>>>>>
>>>>>>>>>>> Secondly the implementation is worse than usual by abusing
>>>>>>>>>>> VM_FAULT_NOPAGE, which is supposed to install a PTE so that the fault
>>>>>>>>>>> doesn't retry, but you're using it as a get out of fault path. And
>>>>>>>>>>> you're using set_need_resched() which is not something a driver should
>>>>>>>>>>> _ever_ touch.
>>>>>>>>>>>
>>>>>>>>>>> Now I'm going to take away set_need_resched() -- and while you can
>>>>>>>>>>> 'reimplement' it using set_thread_flag() you're not going to do that
>>>>>>>>>>> because it will be broken due to changes to the preempt code.
>>>>>>>>>>>
>>>>>>>>>>> So please as to fix ASAP and don't allow anybody to trick you into
>>>>>>>>>>> merging silly things like that again ;-)
>>>>>>>>>> The set_need_resched in i915_gem.c:i915_gem_fault can actually be
>>>>>>>>>> removed. It was there to give the error handler a chance to sneak in
>>>>>>>>>> and reset the hw/sw tracking when the gpu is dead. That hack goes back
>>>>>>>>>> to the days when the locking around our error handler was somewhere
>>>>>>>>>> between nonexistent and totally broken, nowadays we keep things from
>>>>>>>>>> live-locking by a bit of magic in i915_mutex_lock_interruptible. I'll
>>>>>>>>>> whip up a patch to rip this out. I'll also check that our testsuite
>>>>>>>>>> properly exercises this path (needs a bit of work on a quick look for
>>>>>>>>>> better coverage).
>>>>>>>>>>
>>>>>>>>>> The one in ttm is just bonghits to shut up lockdep: ttm can recurse
>>>>>>>>>> into it's own pagefault handler and then deadlock, the trylock just
>>>>>>>>>> keeps lockdep quiet. We've had that bug arise in drm/i915 due to some
>>>>>>>>>> fun userspace did and now have testcases for them. The right solution
>>>>>>>>>> to fix this is to use copy_to|from_user_atomic in ttm everywhere it
>>>>>>>>>> holds locks and have slowpaths which drops locks, copies stuff into a
>>>>>>>>>> temp allocation and then continues. At least that's how we've fixed
>>>>>>>>>> all those inversions in i915-gem. I'm not volunteering to fix this ;-)
>>>>>>>>> Ah the case where a mmap'd address is passed to the execbuf ioctl? :P
>>>>>>>>>
>>>>>>>>> Fine I'll look into it a bit, hopefully before tuesday. Else it might take a bit longer since I'll be on my way to plumbers..
>>>>>>>> I think a possible fix would be if fault() were allowed to return an error and drop the mmap_sem() before returning.
>>>>>>>>
>>>>>>>> Otherwise we need to track down all copy_to_user / copy_from_user which happen with bo::reserve held.
>>>>>> Actually, from looking at the mm code, it seems OK to do the following:
>>>>>>
>>>>>> if (!bo_tryreserve()) {
>>>>>>        up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>        bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>        bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>        return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>> }
>>>>> Is this meant as a jab at me? You're doing locking wrong here! Again!
>>>> It's not meant as a jab at you.  I'm sorry if it came out that way. It was meant as a joke. I wasn't aware the topic was sensitive.
>>>>
>>>> Anyway, could you describe what is wrong, with the above solution, because it seems perfectly legal to me.
>>>> There is no substantial overhead, and there is no risc of deadlocks. Or do you mean it's bad because it confuses lockdep?
>>> Evil userspace can pass a bo as pointer to use for relocation lists, lockdep will warn when that locks up, but still..
>>> This is already a problem now, and your fixing will only cause lockdep to explicitly warn on it.
>> As previously mentioned, copy_from_user should return -EFAULT, since the VMAs are marked with VM_IO. It should not recurse into fault(), so evil user-space looses.
>>
>>> You can make a complicated user program to test this, or simply use this function for debugging:
>>> void ttm_might_fault(void) { struct reservation_object obj; reservation_object_init(&obj); ww_mutex_lock(&obj.lock, NULL); ww_mutex_unlock(&obj.lock); reservation_object_fini(&obj); }
>>>
>>> Put it near every instance of copy_to_user/copy_from_user and you'll find the bugs. :)
>> I'm still not convinced that there are any problems with this solution. Did you take what's said above into account?


8<---------------------------------------------------------------------------------------

>>
>> Now, could we try to approach this based on pros and cons? Let's say we would be able to choose locking order without doing anything ugly. I'd put it like this:
>>
>> mmap_sem->bo_reserve:
>> Good: Native locking order of VM subsystem. Good if we in the future will need to reserve in mmap().
>> Bad: pwrite, pread, copy_to user, copy_from_user usage needs a slowpath that releases all locking, which has to be done in multiple places in multiple drivers. Grabbing the mmap_sem and then waiting for multiple possibly sleeping bo_reserves in slow paths will stall VMA write operations for this MM.
> I think the good offsets the bad a million times here. Just because it's harder.
>
>> bo_reserve->mmap_sem:
>> Good: Natural locking order for all driver ioctls. Slowpath needs to be done in a single place, in common code.
>> Bad: Bad if we ever need to perform bo_reserve in mmap.
> Considering you're open coding a mutex_lock with the reserve/unreserve+trylock, I think this is a horrible approach. The possibility of a deadlock still exists too. :(

8<---------------------------------------------------------------------------

Maarten, the above section was intended to discuss the benefits of 
different locking orders, assuming, as stated, that there
was a legitimatate solution to either choice.
Wording like "it's better bacause it's harder", "your implementation is 
horrible", "half-assed" doesn't really add much.

I still think, from a code maintenance point of view that the 
bo:reserve->mmap_sem locking order is superior, since we don't have the 
same tools that the Intel guys have. But implementation doesn't seem 
feasible ATM, both because as Peter pointed out, the pattern is 
non-deterministic, and as Daniel correctly pointed out, the copy_xx_user 
paths use regular page-faults, which makes recursion possible.

So I guess we need to take the mmap_sem->bo:reserve route after all.

/Thomas





>> In my view we have a clear winner. Given the problems i915 had when converting their driver, and the bashing they had to withstand, we have an even clearer winner.
>>
>> And then we need to take into account that, (given that I understand things correctly) lockdep will complain because it thinks there is a recursion that will never happen.
>> That will make the bo_reserve->mmap_sem solution look bad, but is this really enough to justify giving it up?
>>
>> /Thomas
>>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-13  9:00                   ` Peter Zijlstra
@ 2013-09-23 15:33                     ` Maarten Lankhorst
  2013-09-24  7:22                       ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-23 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Vetter, Thomas Hellstrom, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, Ben Skeggs, Alex Deucher

Hey,

Op 13-09-13 11:00, Peter Zijlstra schreef:
> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>> if (!bo_tryreserve()) {
>>>>>>     up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>     bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>     bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>     return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>> }
>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>> it seems perfectly legal to me.
>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>> at least I sincerely hope so.
>>>
>>> The thing that's wrong with that pattern is that its still not
>>> deterministic - although its a lot better than the pure trylock. Because
>>> you have to release and re-acquire with the trylock another user might
>>> have gotten in again. Its utterly prone to starvation.
>>>
>>> The acquire+release does remove the dead/life-lock scenario from the
>>> FIFO case, since blocking on the acquire will allow the other task to
>>> run (or even get boosted on -rt).
>>>
>>> Aside from that there's nothing particularly wrong with it and lockdep
>>> should be happy afaict (but I haven't had my morning juice yet).
>> bo_reserve internally maps to a ww-mutex and task can already hold
>> ww-mutex (potentially even the same for especially nasty userspace).
> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>
I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.

This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
Nouveau was a bit of a headache, but afaict it should work.

In almost all cases relocs are not updated, so I kept intact the fastpath
of not copying relocs from userspace. The slowpath tries to copy it atomically,
and if that fails it will unreserve all bo's and copy everything.

One thing to note is that the command submission ioctl may fail now with -EFAULT
if presumed cannot be updated, while the commands are submitted succesfully.

I'm not sure what the right behavior was here, and this can only happen if you
touch the memory during the ioctl or use a read-only page. Both of them are not done
in the common case.

Reviews welcome. :P

8<---

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index e4d60e7..2964bb7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -445,8 +445,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 	      uint64_t user_pbbo_ptr)
 {
 	struct nouveau_drm *drm = chan->drm;
-	struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
-				(void __force __user *)(uintptr_t)user_pbbo_ptr;
 	struct nouveau_bo *nvbo;
 	int ret, relocs = 0;
 
@@ -475,7 +473,7 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 			return ret;
 		}
 
-		if (nv_device(drm->device)->card_type < NV_50) {
+		if (nv_device(drm->device)->card_type < NV_50 && !relocs) {
 			if (nvbo->bo.offset == b->presumed.offset &&
 			    ((nvbo->bo.mem.mem_type == TTM_PL_VRAM &&
 			      b->presumed.domain & NOUVEAU_GEM_DOMAIN_VRAM) ||
@@ -483,53 +481,86 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
 			      b->presumed.domain & NOUVEAU_GEM_DOMAIN_GART)))
 				continue;
 
-			if (nvbo->bo.mem.mem_type == TTM_PL_TT)
-				b->presumed.domain = NOUVEAU_GEM_DOMAIN_GART;
-			else
-				b->presumed.domain = NOUVEAU_GEM_DOMAIN_VRAM;
-			b->presumed.offset = nvbo->bo.offset;
-			b->presumed.valid = 0;
-			relocs++;
-
-			if (DRM_COPY_TO_USER(&upbbo[nvbo->pbbo_index].presumed,
-					     &b->presumed, sizeof(b->presumed)))
-				return -EFAULT;
+			relocs = 1;
 		}
 	}
 
 	return relocs;
 }
 
+static inline void
+u_free(void *addr)
+{
+	if (!is_vmalloc_addr(addr))
+		kfree(addr);
+	else
+		vfree(addr);
+}
+
+static inline void *
+u_memcpya(uint64_t user, unsigned nmemb, unsigned size, unsigned inatomic)
+{
+	void *mem;
+	void __user *userptr = (void __force __user *)(uintptr_t)user;
+
+	size *= nmemb;
+
+	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
+	if (!mem)
+		mem = vmalloc(size);
+	if (!mem)
+		return ERR_PTR(-ENOMEM);
+
+	if (inatomic && (!access_ok(VERIFY_READ, userptr, size) ||
+	    __copy_from_user_inatomic(mem, userptr, size))) {
+		u_free(mem);
+		return ERR_PTR(-EFAULT);
+	} else if (!inatomic && copy_from_user(mem, userptr, size)) {
+		u_free(mem);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return mem;
+}
+
+static int
+nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
+				struct drm_nouveau_gem_pushbuf *req,
+				struct drm_nouveau_gem_pushbuf_bo *bo,
+				struct drm_nouveau_gem_pushbuf_reloc *reloc);
+
 static int
 nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 			     struct drm_file *file_priv,
 			     struct drm_nouveau_gem_pushbuf_bo *pbbo,
+			     struct drm_nouveau_gem_pushbuf *req,
 			     uint64_t user_buffers, int nr_buffers,
-			     struct validate_op *op, int *apply_relocs)
+			     struct validate_op *op, int *do_reloc)
 {
 	struct nouveau_cli *cli = nouveau_cli(file_priv);
 	int ret, relocs = 0;
+	struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL;
+
+	if (nr_buffers == 0)
+		return 0;
 
+restart:
 	INIT_LIST_HEAD(&op->vram_list);
 	INIT_LIST_HEAD(&op->gart_list);
 	INIT_LIST_HEAD(&op->both_list);
 
-	if (nr_buffers == 0)
-		return 0;
-
 	ret = validate_init(chan, file_priv, pbbo, nr_buffers, op);
 	if (unlikely(ret)) {
 		if (ret != -ERESTARTSYS)
 			NV_ERROR(cli, "validate_init\n");
-		return ret;
+		goto err;
 	}
 
 	ret = validate_list(chan, cli, &op->vram_list, pbbo, user_buffers);
 	if (unlikely(ret < 0)) {
 		if (ret != -ERESTARTSYS)
 			NV_ERROR(cli, "validate vram_list\n");
-		validate_fini(op, NULL);
-		return ret;
+		goto err_fini;
 	}
 	relocs += ret;
 
@@ -537,8 +568,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	if (unlikely(ret < 0)) {
 		if (ret != -ERESTARTSYS)
 			NV_ERROR(cli, "validate gart_list\n");
-		validate_fini(op, NULL);
-		return ret;
+		goto err_fini;
 	}
 	relocs += ret;
 
@@ -546,58 +576,93 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 	if (unlikely(ret < 0)) {
 		if (ret != -ERESTARTSYS)
 			NV_ERROR(cli, "validate both_list\n");
-		validate_fini(op, NULL);
-		return ret;
+		goto err_fini;
 	}
 	relocs += ret;
+	if (relocs) {
+		if (!reloc) {
+			//reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc), 1);
+			reloc = ERR_PTR(-EFAULT); NV_ERROR(cli, "slowpath!\n");
+		}
+		if (IS_ERR(reloc)) {
+			validate_fini(op, NULL);
+
+			if (PTR_ERR(reloc) == -EFAULT)
+				reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc), 0);
+
+			if (IS_ERR(reloc))
+				return PTR_ERR(reloc);
+			goto restart;
+		}
+
+		ret = nouveau_gem_pushbuf_reloc_apply(cli, req, pbbo, reloc);
+		if (ret) {
+			NV_ERROR(cli, "reloc apply: %d\n", ret);
+			/* No validate_fini, already called. */
+			return ret;
+		}
+		u_free(reloc);
+		*do_reloc = 1;
+	}
 
-	*apply_relocs = relocs;
 	return 0;
-}
 
-static inline void
-u_free(void *addr)
-{
-	if (!is_vmalloc_addr(addr))
-		kfree(addr);
-	else
-		vfree(addr);
+err_fini:
+	validate_fini(op, NULL);
+err:
+	if (reloc)
+		u_free(reloc);
+	return ret;
 }
 
-static inline void *
-u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
+static int
+nouveau_gem_pushbuf_reloc_copy_to_user(struct drm_nouveau_gem_pushbuf *req,
+				       struct drm_nouveau_gem_pushbuf_bo *bo)
 {
-	void *mem;
-	void __user *userptr = (void __force __user *)(uintptr_t)user;
+	struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
+				 (void __force __user *)(uintptr_t)req->buffers;
+	unsigned i;
 
-	size *= nmemb;
+	for (i = 0; i < req->nr_buffers; ++i) {
+		struct drm_nouveau_gem_pushbuf_bo *b = &bo[i];
 
-	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
-	if (!mem)
-		mem = vmalloc(size);
-	if (!mem)
-		return ERR_PTR(-ENOMEM);
-
-	if (DRM_COPY_FROM_USER(mem, userptr, size)) {
-		u_free(mem);
-		return ERR_PTR(-EFAULT);
+		if (!b->presumed.valid &&
+		    copy_to_user(&upbbo[i].presumed, &b->presumed, sizeof(b->presumed)))
+			return -EFAULT;
 	}
-
-	return mem;
+	return 0;
 }
 
 static int
 nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 				struct drm_nouveau_gem_pushbuf *req,
-				struct drm_nouveau_gem_pushbuf_bo *bo)
+				struct drm_nouveau_gem_pushbuf_bo *bo,
+				struct drm_nouveau_gem_pushbuf_reloc *reloc)
 {
-	struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL;
 	int ret = 0;
 	unsigned i;
 
-	reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
-	if (IS_ERR(reloc))
-		return PTR_ERR(reloc);
+	for (i = 0; i < req->nr_buffers; ++i) {
+		struct drm_nouveau_gem_pushbuf_bo *b = &bo[i];
+		struct nouveau_bo *nvbo = (void *)(unsigned long)
+			bo[i].user_priv;
+
+		if (nvbo->bo.offset == b->presumed.offset &&
+		    ((nvbo->bo.mem.mem_type == TTM_PL_VRAM &&
+		      b->presumed.domain & NOUVEAU_GEM_DOMAIN_VRAM) ||
+		     (nvbo->bo.mem.mem_type == TTM_PL_TT &&
+		      b->presumed.domain & NOUVEAU_GEM_DOMAIN_GART))) {
+			b->presumed.valid = 1;
+			continue;
+		}
+
+		if (nvbo->bo.mem.mem_type == TTM_PL_TT)
+			b->presumed.domain = NOUVEAU_GEM_DOMAIN_GART;
+		else
+			b->presumed.domain = NOUVEAU_GEM_DOMAIN_VRAM;
+		b->presumed.offset = nvbo->bo.offset;
+		b->presumed.valid = 0;
+	}
 
 	for (i = 0; i < req->nr_relocs; i++) {
 		struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i];
@@ -664,8 +729,6 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
 
 		nouveau_bo_wr32(nvbo, r->reloc_bo_offset >> 2, data);
 	}
-
-	u_free(reloc);
 	return ret;
 }
 
@@ -721,11 +784,11 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 		return nouveau_abi16_put(abi16, -EINVAL);
 	}
 
-	push = u_memcpya(req->push, req->nr_push, sizeof(*push));
+	push = u_memcpya(req->push, req->nr_push, sizeof(*push), 0);
 	if (IS_ERR(push))
 		return nouveau_abi16_put(abi16, PTR_ERR(push));
 
-	bo = u_memcpya(req->buffers, req->nr_buffers, sizeof(*bo));
+	bo = u_memcpya(req->buffers, req->nr_buffers, sizeof(*bo), 0);
 	if (IS_ERR(bo)) {
 		u_free(push);
 		return nouveau_abi16_put(abi16, PTR_ERR(bo));
@@ -741,7 +804,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 	}
 
 	/* Validate buffer list */
-	ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers,
+	ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req, req->buffers,
 					   req->nr_buffers, &op, &do_reloc);
 	if (ret) {
 		if (ret != -ERESTARTSYS)
@@ -749,15 +812,6 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
 		goto out_prevalid;
 	}
 
-	/* Apply any relocations that are required */
-	if (do_reloc) {
-		ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo);
-		if (ret) {
-			NV_ERROR(cli, "reloc apply: %d\n", ret);
-			goto out;
-		}
-	}
-
 	if (chan->dma.ib_max) {
 		ret = nouveau_dma_wait(chan, req->nr_push + 1, 16);
 		if (ret) {
@@ -837,6 +891,17 @@ out:
 	validate_fini(&op, fence);
 	nouveau_fence_unref(&fence);
 
+	if (do_reloc && !ret) {
+		ret = nouveau_gem_pushbuf_reloc_copy_to_user(req, bo);
+		if (ret) {
+			NV_ERROR(cli, "error copying presumed back to userspace: %d\n", ret);
+			/*
+			 * XXX: We return -EFAULT, but command submission
+			 * has already been completed.
+			 */
+		}
+	}
+
 out_prevalid:
 	u_free(bo);
 	u_free(push);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 1006c15..829e911 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -64,12 +64,9 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * for reserve, and if it fails, retry the fault after scheduling.
 	 */
 
-	ret = ttm_bo_reserve(bo, true, true, false, 0);
-	if (unlikely(ret != 0)) {
-		if (ret == -EBUSY)
-			set_need_resched();
+	ret = ttm_bo_reserve(bo, true, false, false, 0);
+	if (unlikely(ret != 0))
 		return VM_FAULT_NOPAGE;
-	}
 
 	if (bdev->driver->fault_reserve_notify) {
 		ret = bdev->driver->fault_reserve_notify(bo);
@@ -77,6 +74,7 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		case 0:
 			break;
 		case -EBUSY:
+			WARN_ON(1);
 			set_need_resched();
 		case -ERESTARTSYS:
 			retval = VM_FAULT_NOPAGE;


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-23 15:33                     ` [RFC PATCH] drm/nouveau: fix nested locking in mmap handler Maarten Lankhorst
@ 2013-09-24  7:22                       ` Thomas Hellstrom
  2013-09-24  7:34                         ` Maarten Lankhorst
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-24  7:22 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Peter Zijlstra, Daniel Vetter, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
> Hey,
>
> Op 13-09-13 11:00, Peter Zijlstra schreef:
>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>> if (!bo_tryreserve()) {
>>>>>>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>      bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>> }
>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>> it seems perfectly legal to me.
>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>> at least I sincerely hope so.
>>>>
>>>> The thing that's wrong with that pattern is that its still not
>>>> deterministic - although its a lot better than the pure trylock. Because
>>>> you have to release and re-acquire with the trylock another user might
>>>> have gotten in again. Its utterly prone to starvation.
>>>>
>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>> run (or even get boosted on -rt).
>>>>
>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>> should be happy afaict (but I haven't had my morning juice yet).
>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>> ww-mutex (potentially even the same for especially nasty userspace).
>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>
> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>
> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
> Nouveau was a bit of a headache, but afaict it should work.
>
> In almost all cases relocs are not updated, so I kept intact the fastpath
> of not copying relocs from userspace. The slowpath tries to copy it atomically,
> and if that fails it will unreserve all bo's and copy everything.
>
> One thing to note is that the command submission ioctl may fail now with -EFAULT
> if presumed cannot be updated, while the commands are submitted succesfully.

I think the Nouveau guys need to comment further on this, but returning 
-EFAULT might break existing user-space, and that's not allowed, but 
IIRC the return value of "presumed" is only a hint, and if it's 
incorrect will only trigger future command stream patching.

Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the 
vmwgfx driver doesn't need any fixups.
>
> I'm not sure what the right behavior was here, and this can only happen if you
> touch the memory during the ioctl or use a read-only page. Both of them are not done
> in the common case.
>
> Reviews welcome. :P
>
> 8<---
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index e4d60e7..2964bb7 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -445,8 +445,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>   	      uint64_t user_pbbo_ptr)
>   {
>   	struct nouveau_drm *drm = chan->drm;
> -	struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
> -				(void __force __user *)(uintptr_t)user_pbbo_ptr;
>   	struct nouveau_bo *nvbo;
>   	int ret, relocs = 0;
>   
> @@ -475,7 +473,7 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>   			return ret;
>   		}
>   
> -		if (nv_device(drm->device)->card_type < NV_50) {
> +		if (nv_device(drm->device)->card_type < NV_50 && !relocs) {
>   			if (nvbo->bo.offset == b->presumed.offset &&
>   			    ((nvbo->bo.mem.mem_type == TTM_PL_VRAM &&
>   			      b->presumed.domain & NOUVEAU_GEM_DOMAIN_VRAM) ||
> @@ -483,53 +481,86 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>   			      b->presumed.domain & NOUVEAU_GEM_DOMAIN_GART)))
>   				continue;
>   
> -			if (nvbo->bo.mem.mem_type == TTM_PL_TT)
> -				b->presumed.domain = NOUVEAU_GEM_DOMAIN_GART;
> -			else
> -				b->presumed.domain = NOUVEAU_GEM_DOMAIN_VRAM;
> -			b->presumed.offset = nvbo->bo.offset;
> -			b->presumed.valid = 0;
> -			relocs++;
> -
> -			if (DRM_COPY_TO_USER(&upbbo[nvbo->pbbo_index].presumed,
> -					     &b->presumed, sizeof(b->presumed)))
> -				return -EFAULT;
> +			relocs = 1;
>   		}
>   	}
>   
>   	return relocs;
>   }
>   
> +static inline void
> +u_free(void *addr)
> +{
> +	if (!is_vmalloc_addr(addr))
> +		kfree(addr);
> +	else
> +		vfree(addr);
> +}

Isn't there a DRM utilty for this?

> +
> +static inline void *
> +u_memcpya(uint64_t user, unsigned nmemb, unsigned size, unsigned inatomic)
> +{
> +	void *mem;
> +	void __user *userptr = (void __force __user *)(uintptr_t)user;
> +
> +	size *= nmemb;
> +
> +	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> +	if (!mem)
> +		mem = vmalloc(size);
And for the above as well?

> +	if (!mem)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (inatomic && (!access_ok(VERIFY_READ, userptr, size) ||
> +	    __copy_from_user_inatomic(mem, userptr, size))) {
> +		u_free(mem);
> +		return ERR_PTR(-EFAULT);
> +	} else if (!inatomic && copy_from_user(mem, userptr, size)) {
> +		u_free(mem);
> +		return ERR_PTR(-EFAULT);
> +	}
> +
> +	return mem;
> +}
> +
> +static int
> +nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
> +				struct drm_nouveau_gem_pushbuf *req,
> +				struct drm_nouveau_gem_pushbuf_bo *bo,
> +				struct drm_nouveau_gem_pushbuf_reloc *reloc);
> +
>   static int
>   nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
>   			     struct drm_file *file_priv,
>   			     struct drm_nouveau_gem_pushbuf_bo *pbbo,
> +			     struct drm_nouveau_gem_pushbuf *req,
>   			     uint64_t user_buffers, int nr_buffers,
> -			     struct validate_op *op, int *apply_relocs)
> +			     struct validate_op *op, int *do_reloc)
>   {
>   	struct nouveau_cli *cli = nouveau_cli(file_priv);
>   	int ret, relocs = 0;
> +	struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL;
> +
> +	if (nr_buffers == 0)
> +		return 0;
>   
> +restart:
>   	INIT_LIST_HEAD(&op->vram_list);
>   	INIT_LIST_HEAD(&op->gart_list);
>   	INIT_LIST_HEAD(&op->both_list);
>   
> -	if (nr_buffers == 0)
> -		return 0;
> -
>   	ret = validate_init(chan, file_priv, pbbo, nr_buffers, op);
>   	if (unlikely(ret)) {
>   		if (ret != -ERESTARTSYS)
>   			NV_ERROR(cli, "validate_init\n");
> -		return ret;
> +		goto err;
>   	}
>   
>   	ret = validate_list(chan, cli, &op->vram_list, pbbo, user_buffers);
>   	if (unlikely(ret < 0)) {
>   		if (ret != -ERESTARTSYS)
>   			NV_ERROR(cli, "validate vram_list\n");
> -		validate_fini(op, NULL);
> -		return ret;
> +		goto err_fini;
>   	}
>   	relocs += ret;
>   
> @@ -537,8 +568,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
>   	if (unlikely(ret < 0)) {
>   		if (ret != -ERESTARTSYS)
>   			NV_ERROR(cli, "validate gart_list\n");
> -		validate_fini(op, NULL);
> -		return ret;
> +		goto err_fini;
>   	}
>   	relocs += ret;
>   
> @@ -546,58 +576,93 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
>   	if (unlikely(ret < 0)) {
>   		if (ret != -ERESTARTSYS)
>   			NV_ERROR(cli, "validate both_list\n");
> -		validate_fini(op, NULL);
> -		return ret;
> +		goto err_fini;
>   	}
>   	relocs += ret;
> +	if (relocs) {
> +		if (!reloc) {
> +			//reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc), 1);
> +			reloc = ERR_PTR(-EFAULT); NV_ERROR(cli, "slowpath!\n");
> +		}
> +		if (IS_ERR(reloc)) {
> +			validate_fini(op, NULL);
> +
> +			if (PTR_ERR(reloc) == -EFAULT)
> +				reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc), 0);
> +
> +			if (IS_ERR(reloc))
> +				return PTR_ERR(reloc);
> +			goto restart;
> +		}
> +
> +		ret = nouveau_gem_pushbuf_reloc_apply(cli, req, pbbo, reloc);
> +		if (ret) {
> +			NV_ERROR(cli, "reloc apply: %d\n", ret);
> +			/* No validate_fini, already called. */
> +			return ret;
> +		}
> +		u_free(reloc);
> +		*do_reloc = 1;
> +	}
>   
> -	*apply_relocs = relocs;
>   	return 0;
> -}
>   
> -static inline void
> -u_free(void *addr)
> -{
> -	if (!is_vmalloc_addr(addr))
> -		kfree(addr);
> -	else
> -		vfree(addr);
> +err_fini:
> +	validate_fini(op, NULL);
> +err:
> +	if (reloc)
> +		u_free(reloc);
> +	return ret;
>   }
>   
> -static inline void *
> -u_memcpya(uint64_t user, unsigned nmemb, unsigned size)
> +static int
> +nouveau_gem_pushbuf_reloc_copy_to_user(struct drm_nouveau_gem_pushbuf *req,
> +				       struct drm_nouveau_gem_pushbuf_bo *bo)
>   {
> -	void *mem;
> -	void __user *userptr = (void __force __user *)(uintptr_t)user;
> +	struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
> +				 (void __force __user *)(uintptr_t)req->buffers;
> +	unsigned i;
>   
> -	size *= nmemb;
> +	for (i = 0; i < req->nr_buffers; ++i) {
> +		struct drm_nouveau_gem_pushbuf_bo *b = &bo[i];
>   
> -	mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
> -	if (!mem)
> -		mem = vmalloc(size);
> -	if (!mem)
> -		return ERR_PTR(-ENOMEM);
> -
> -	if (DRM_COPY_FROM_USER(mem, userptr, size)) {
> -		u_free(mem);
> -		return ERR_PTR(-EFAULT);
> +		if (!b->presumed.valid &&
> +		    copy_to_user(&upbbo[i].presumed, &b->presumed, sizeof(b->presumed)))
> +			return -EFAULT;
>   	}
> -
> -	return mem;
> +	return 0;
>   }
>   
>   static int
>   nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>   				struct drm_nouveau_gem_pushbuf *req,
> -				struct drm_nouveau_gem_pushbuf_bo *bo)
> +				struct drm_nouveau_gem_pushbuf_bo *bo,
> +				struct drm_nouveau_gem_pushbuf_reloc *reloc)
>   {
> -	struct drm_nouveau_gem_pushbuf_reloc *reloc = NULL;
>   	int ret = 0;
>   	unsigned i;
>   
> -	reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc));
> -	if (IS_ERR(reloc))
> -		return PTR_ERR(reloc);
> +	for (i = 0; i < req->nr_buffers; ++i) {
> +		struct drm_nouveau_gem_pushbuf_bo *b = &bo[i];
> +		struct nouveau_bo *nvbo = (void *)(unsigned long)
> +			bo[i].user_priv;
> +
> +		if (nvbo->bo.offset == b->presumed.offset &&
> +		    ((nvbo->bo.mem.mem_type == TTM_PL_VRAM &&
> +		      b->presumed.domain & NOUVEAU_GEM_DOMAIN_VRAM) ||
> +		     (nvbo->bo.mem.mem_type == TTM_PL_TT &&
> +		      b->presumed.domain & NOUVEAU_GEM_DOMAIN_GART))) {
> +			b->presumed.valid = 1;
> +			continue;
> +		}
> +
> +		if (nvbo->bo.mem.mem_type == TTM_PL_TT)
> +			b->presumed.domain = NOUVEAU_GEM_DOMAIN_GART;
> +		else
> +			b->presumed.domain = NOUVEAU_GEM_DOMAIN_VRAM;
> +		b->presumed.offset = nvbo->bo.offset;
> +		b->presumed.valid = 0;
> +	}
>   
>   	for (i = 0; i < req->nr_relocs; i++) {
>   		struct drm_nouveau_gem_pushbuf_reloc *r = &reloc[i];
> @@ -664,8 +729,6 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
>   
>   		nouveau_bo_wr32(nvbo, r->reloc_bo_offset >> 2, data);
>   	}
> -
> -	u_free(reloc);
>   	return ret;
>   }
>   
> @@ -721,11 +784,11 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
>   		return nouveau_abi16_put(abi16, -EINVAL);
>   	}
>   
> -	push = u_memcpya(req->push, req->nr_push, sizeof(*push));
> +	push = u_memcpya(req->push, req->nr_push, sizeof(*push), 0);
>   	if (IS_ERR(push))
>   		return nouveau_abi16_put(abi16, PTR_ERR(push));
>   
> -	bo = u_memcpya(req->buffers, req->nr_buffers, sizeof(*bo));
> +	bo = u_memcpya(req->buffers, req->nr_buffers, sizeof(*bo), 0);
>   	if (IS_ERR(bo)) {
>   		u_free(push);
>   		return nouveau_abi16_put(abi16, PTR_ERR(bo));
> @@ -741,7 +804,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
>   	}
>   
>   	/* Validate buffer list */
> -	ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req->buffers,
> +	ret = nouveau_gem_pushbuf_validate(chan, file_priv, bo, req, req->buffers,
>   					   req->nr_buffers, &op, &do_reloc);
>   	if (ret) {
>   		if (ret != -ERESTARTSYS)
> @@ -749,15 +812,6 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
>   		goto out_prevalid;
>   	}
>   
> -	/* Apply any relocations that are required */
> -	if (do_reloc) {
> -		ret = nouveau_gem_pushbuf_reloc_apply(cli, req, bo);
> -		if (ret) {
> -			NV_ERROR(cli, "reloc apply: %d\n", ret);
> -			goto out;
> -		}
> -	}
> -
>   	if (chan->dma.ib_max) {
>   		ret = nouveau_dma_wait(chan, req->nr_push + 1, 16);
>   		if (ret) {
> @@ -837,6 +891,17 @@ out:
>   	validate_fini(&op, fence);
>   	nouveau_fence_unref(&fence);
>   
> +	if (do_reloc && !ret) {
> +		ret = nouveau_gem_pushbuf_reloc_copy_to_user(req, bo);
> +		if (ret) {
> +			NV_ERROR(cli, "error copying presumed back to userspace: %d\n", ret);
> +			/*
> +			 * XXX: We return -EFAULT, but command submission
> +			 * has already been completed.
> +			 */
> +		}
> +	}
> +
>   out_prevalid:
>   	u_free(bo);
>   	u_free(push);
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 1006c15..829e911 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -64,12 +64,9 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	 * for reserve, and if it fails, retry the fault after scheduling.
>   	 */
>   
> -	ret = ttm_bo_reserve(bo, true, true, false, 0);
> -	if (unlikely(ret != 0)) {
> -		if (ret == -EBUSY)
> -			set_need_resched();
> +	ret = ttm_bo_reserve(bo, true, false, false, 0);
> +	if (unlikely(ret != 0))
>   		return VM_FAULT_NOPAGE;
> -	}
>   

Actually, it's not the locking order bo:reserve -> mmap_sem that 
triggers the lockdep warning, right? but the fact that copy_from_user 
could recurse into the fault handler? Grabbing bo:reseves recursively? 
which should leave us free to choose locking order at this point.

>   	if (bdev->driver->fault_reserve_notify) {
>   		ret = bdev->driver->fault_reserve_notify(bo);
> @@ -77,6 +74,7 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>   		case 0:
>   			break;
>   		case -EBUSY:
> +			WARN_ON(1);
>   			set_need_resched();

I don't think this is used anymore, so set_need_resched might go.

>   		case -ERESTARTSYS:
>   			retval = VM_FAULT_NOPAGE;
/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  7:22                       ` Thomas Hellstrom
@ 2013-09-24  7:34                         ` Maarten Lankhorst
  2013-09-24  8:20                           ` Ingo Molnar
  2013-09-24  9:03                           ` Thomas Hellstrom
  0 siblings, 2 replies; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-24  7:34 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Peter Zijlstra, Daniel Vetter, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

Op 24-09-13 09:22, Thomas Hellstrom schreef:
> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>      bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>> }
>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>> it seems perfectly legal to me.
>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>> at least I sincerely hope so.
>>>>>
>>>>> The thing that's wrong with that pattern is that its still not
>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>> you have to release and re-acquire with the trylock another user might
>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>
>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>> run (or even get boosted on -rt).
>>>>>
>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>
>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>
>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>> Nouveau was a bit of a headache, but afaict it should work.
>>
>> In almost all cases relocs are not updated, so I kept intact the fastpath
>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>> and if that fails it will unreserve all bo's and copy everything.
>>
>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>> if presumed cannot be updated, while the commands are submitted succesfully.
>
> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>
> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
which would probably result in libdrm crashing shortly afterwards anyway.

So I don't know whether to swallow that -EFAULT or not, which is what I asked.

And for vmwgfx just run it through PROVE_LOCKING and make sure all the copy_to/from_user call sites are called at least once, and then you can be certain it doesn't need fixups. ;)
Lockdep ftw..

>>
>> I'm not sure what the right behavior was here, and this can only happen if you
>> touch the memory during the ioctl or use a read-only page. Both of them are not done
>> in the common case.
>>
>> Reviews welcome. :P
>>
>> 8<---
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> index e4d60e7..2964bb7 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
>> @@ -445,8 +445,6 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>>             uint64_t user_pbbo_ptr)
>>   {
>>       struct nouveau_drm *drm = chan->drm;
>> -    struct drm_nouveau_gem_pushbuf_bo __user *upbbo =
>> -                (void __force __user *)(uintptr_t)user_pbbo_ptr;
>>       struct nouveau_bo *nvbo;
>>       int ret, relocs = 0;
>>   @@ -475,7 +473,7 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>>               return ret;
>>           }
>>   -        if (nv_device(drm->device)->card_type < NV_50) {
>> +        if (nv_device(drm->device)->card_type < NV_50 && !relocs) {
>>               if (nvbo->bo.offset == b->presumed.offset &&
>>                   ((nvbo->bo.mem.mem_type == TTM_PL_VRAM &&
>>                     b->presumed.domain & NOUVEAU_GEM_DOMAIN_VRAM) ||
>> @@ -483,53 +481,86 @@ validate_list(struct nouveau_channel *chan, struct nouveau_cli *cli,
>>                     b->presumed.domain & NOUVEAU_GEM_DOMAIN_GART)))
>>                   continue;
>>   -            if (nvbo->bo.mem.mem_type == TTM_PL_TT)
>> -                b->presumed.domain = NOUVEAU_GEM_DOMAIN_GART;
>> -            else
>> -                b->presumed.domain = NOUVEAU_GEM_DOMAIN_VRAM;
>> -            b->presumed.offset = nvbo->bo.offset;
>> -            b->presumed.valid = 0;
>> -            relocs++;
>> -
>> -            if (DRM_COPY_TO_USER(&upbbo[nvbo->pbbo_index].presumed,
>> -                         &b->presumed, sizeof(b->presumed)))
>> -                return -EFAULT;
>> +            relocs = 1;
>>           }
>>       }
>>         return relocs;
>>   }
>>   +static inline void
>> +u_free(void *addr)
>> +{
>> +    if (!is_vmalloc_addr(addr))
>> +        kfree(addr);
>> +    else
>> +        vfree(addr);
>> +}
>
> Isn't there a DRM utilty for this?
Oops, seems you're right..

Those functions can be replaced with drm_malloc_ab and drm_free_large.
>> +
>> +static inline void *
>> +u_memcpya(uint64_t user, unsigned nmemb, unsigned size, unsigned inatomic)
>> +{
>> +    void *mem;
>> +    void __user *userptr = (void __force __user *)(uintptr_t)user;
>> +
>> +    size *= nmemb;
>> +
>> +    mem = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
>> +    if (!mem)
>> +        mem = vmalloc(size);
> And for the above as well?
Indeed..
>
>> ...
>>   out_prevalid:
>>       u_free(bo);
>>       u_free(push);
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 1006c15..829e911 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -64,12 +64,9 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>>        * for reserve, and if it fails, retry the fault after scheduling.
>>        */
>>   -    ret = ttm_bo_reserve(bo, true, true, false, 0);
>> -    if (unlikely(ret != 0)) {
>> -        if (ret == -EBUSY)
>> -            set_need_resched();
>> +    ret = ttm_bo_reserve(bo, true, false, false, 0);
>> +    if (unlikely(ret != 0))
>>           return VM_FAULT_NOPAGE;
>> -    }
>>   
>
> Actually, it's not the locking order bo:reserve -> mmap_sem that triggers the lockdep warning, right? but the fact that copy_from_user could recurse into the fault handler? Grabbing bo:reseves recursively? which should leave us free to choose locking order at this point.
Same thing.

When PROVE_LOCKING is enabled the might_fault calls in copy_to/from_user do a fake locking of mmap_sem, which means all locking errors, provided that the reservation lock is taken at least once with mmap_sem held (eg the ttm fault handler is called at least once, it can be done separately, it doesn't need to be done in the same syscall). So any bugs will be found. The only thing to worry about is that lockdep turns off after finding 1 error, so you have to make sure it doesn't bomb out before completing tests, which is sometimes a problem on early rc kernels. ;)

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  7:34                         ` Maarten Lankhorst
@ 2013-09-24  8:20                           ` Ingo Molnar
  2013-09-24  9:03                           ` Thomas Hellstrom
  1 sibling, 0 replies; 58+ messages in thread
From: Ingo Molnar @ 2013-09-24  8:20 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Thomas Hellstrom, Peter Zijlstra, Daniel Vetter, Dave Airlie,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Thomas Gleixner,
	Ben Skeggs, Alex Deucher


* Maarten Lankhorst <maarten.lankhorst@canonical.com> wrote:

> > I think the Nouveau guys need to comment further on this, but 
> > returning -EFAULT might break existing user-space, and that's not 
> > allowed, but IIRC the return value of "presumed" is only a hint, and 
> > if it's incorrect will only trigger future command stream patching.
> >
> > Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell 
> > the vmwgfx driver doesn't need any fixups.
>
> Well because we read the list of buffer objects the presumed offsets are 
> at least read-mapped. Although I guess in the worst case the mapping 
> might disappear before the syscall copies back the data. So if -EFAULT 
> happens here then userspace messed up in some way, either by forgetting 
> to map the offsets read-write, which cannot happen with libdrm or 
> free'ing the bo list before the syscall returns, which would probably 
> result in libdrm crashing shortly afterwards anyway.
> 
> So I don't know whether to swallow that -EFAULT or not, which is what I 
> asked.

In such a case returning -EFAULT is very strongly preferred.

If there's a user-space bug, such as a context life time race between 
graphics context creation/destruction and user threads making use of the 
graphics context, then getting the -EFAULT would be very helpful to 
user-space debugging as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  7:34                         ` Maarten Lankhorst
  2013-09-24  8:20                           ` Ingo Molnar
@ 2013-09-24  9:03                           ` Thomas Hellstrom
  2013-09-24  9:36                             ` Daniel Vetter
  2013-09-24  9:43                             ` Maarten Lankhorst
  1 sibling, 2 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-24  9:03 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Peter Zijlstra, Daniel Vetter, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>>> Hey,
>>>
>>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>>> }
>>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>>> it seems perfectly legal to me.
>>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>>> at least I sincerely hope so.
>>>>>>
>>>>>> The thing that's wrong with that pattern is that its still not
>>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>>> you have to release and re-acquire with the trylock another user might
>>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>>
>>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>>> run (or even get boosted on -rt).
>>>>>>
>>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>>
>>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>>
>>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>>> Nouveau was a bit of a headache, but afaict it should work.
>>>
>>> In almost all cases relocs are not updated, so I kept intact the fastpath
>>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>>> and if that fails it will unreserve all bo's and copy everything.
>>>
>>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>>> if presumed cannot be updated, while the commands are submitted succesfully.
>> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>>
>> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
> Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
> So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
> which would probably result in libdrm crashing shortly afterwards anyway.

Hmm, is the list of buffer objects (and the "presumed" members) really 
in DRM memory? Because if it resides or may reside in anonymous system 
memory, it may well be paged out between reading and writing, in which 
case the -EFAULT return is incorrect.

In fact failures of pushbuf / execbuf *after* commands are successfully 
submitted are generally very hard to recover from. That's why the kernel 
should do whatever it takes to recover from such failures, and 
user-space should do whatever it takes to recover from copy-to-user 
failures of needed info from the kernel, and it really depends on the 
user-space usage pattern of "presumed". IIRC the original reason for 
copying it back to user-space was, that if a relocation offsets were 
patched up by the kernel, and then the process was sent a signal causing 
it to retry execbuf, then "presumed" had to be updated, otherwise it 
would be inconsistent with what's currently in the command stream, which 
is very bad. If "presumed" is, however, only used by user-space to guess 
an offset, the correct action would be to swallow the -EFAULT.

>
> So I don't know whether to swallow that -EFAULT or not, which is what I asked.
>
> And for vmwgfx just run it through PROVE_LOCKING and make sure all the copy_to/from_user call sites are called at least once, and then you can be certain it doesn't need fixups. ;)
> Lockdep ftw..
Will do that.

>
>> Actually, it's not the locking order bo:reserve -> mmap_sem that triggers the lockdep warning, right? but the fact that copy_from_user could recurse into the fault handler? Grabbing bo:reseves recursively? which should leave us free to choose locking order at this point.
> Same thing.
>
> When PROVE_LOCKING is enabled the might_fault calls in copy_to/from_user do a fake locking of mmap_sem, which means all locking errors, provided that the reservation lock is taken at least once with mmap_sem held (eg the ttm fault handler is called at least once, it can be done separately, it doesn't need to be done in the same syscall). So any bugs will be found. The only thing to worry about is that lockdep turns off after finding 1 error, so you have to make sure it doesn't bomb out before completing tests, which is sometimes a problem on early rc kernels. ;)
>
> ~Maarten

My point was that when we only have copy_[to|from]_user_inatomic inside 
any bo:reservations, the might_fault would never be called inside any 
reservations and we should, in principle, be free to choose locking 
order, provided of course it could be done cleanly in fault()?

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  9:03                           ` Thomas Hellstrom
@ 2013-09-24  9:36                             ` Daniel Vetter
  2013-09-24 10:11                               ` Maarten Lankhorst
  2013-09-24  9:43                             ` Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Daniel Vetter @ 2013-09-24  9:36 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Maarten Lankhorst, Peter Zijlstra, Daniel Vetter, Dave Airlie,
	intel-gfx, dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, Ben Skeggs, Alex Deucher

On Tue, Sep 24, 2013 at 11:03:37AM +0200, Thomas Hellstrom wrote:
> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
> >Op 24-09-13 09:22, Thomas Hellstrom schreef:
> >>On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
> >>>Hey,
> >>>
> >>>Op 13-09-13 11:00, Peter Zijlstra schreef:
> >>>>On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
> >>>>>On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>>>>>On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
> >>>>>>>>>if (!bo_tryreserve()) {
> >>>>>>>>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
> >>>>>>>>>      bo_reserve();               // Wait for the BO to become available (interruptible)
> >>>>>>>>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
> >>>>>>>>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
> >>>>>>>>>}
> >>>>>>>Anyway, could you describe what is wrong, with the above solution, because
> >>>>>>>it seems perfectly legal to me.
> >>>>>>Luckily the rule of law doesn't have anything to do with this stuff --
> >>>>>>at least I sincerely hope so.
> >>>>>>
> >>>>>>The thing that's wrong with that pattern is that its still not
> >>>>>>deterministic - although its a lot better than the pure trylock. Because
> >>>>>>you have to release and re-acquire with the trylock another user might
> >>>>>>have gotten in again. Its utterly prone to starvation.
> >>>>>>
> >>>>>>The acquire+release does remove the dead/life-lock scenario from the
> >>>>>>FIFO case, since blocking on the acquire will allow the other task to
> >>>>>>run (or even get boosted on -rt).
> >>>>>>
> >>>>>>Aside from that there's nothing particularly wrong with it and lockdep
> >>>>>>should be happy afaict (but I haven't had my morning juice yet).
> >>>>>bo_reserve internally maps to a ww-mutex and task can already hold
> >>>>>ww-mutex (potentially even the same for especially nasty userspace).
> >>>>OK, yes I wasn't aware of that. Yes in that case you're quite right.
> >>>>
> >>>I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
> >>>
> >>>This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
> >>>Nouveau was a bit of a headache, but afaict it should work.
> >>>
> >>>In almost all cases relocs are not updated, so I kept intact the fastpath
> >>>of not copying relocs from userspace. The slowpath tries to copy it atomically,
> >>>and if that fails it will unreserve all bo's and copy everything.
> >>>
> >>>One thing to note is that the command submission ioctl may fail now with -EFAULT
> >>>if presumed cannot be updated, while the commands are submitted succesfully.
> >>I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
> >>
> >>Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
> >Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
> >So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
> >which would probably result in libdrm crashing shortly afterwards anyway.
> 
> Hmm, is the list of buffer objects (and the "presumed" members)
> really in DRM memory? Because if it resides or may reside in
> anonymous system memory, it may well be paged out between reading
> and writing, in which case the -EFAULT return is incorrect.
> 
> In fact failures of pushbuf / execbuf *after* commands are
> successfully submitted are generally very hard to recover from.
> That's why the kernel should do whatever it takes to recover from
> such failures, and user-space should do whatever it takes to recover
> from copy-to-user failures of needed info from the kernel, and it
> really depends on the user-space usage pattern of "presumed". IIRC
> the original reason for copying it back to user-space was, that if a
> relocation offsets were patched up by the kernel, and then the
> process was sent a signal causing it to retry execbuf, then
> "presumed" had to be updated, otherwise it would be inconsistent
> with what's currently in the command stream, which is very bad. If
> "presumed" is, however, only used by user-space to guess an offset,
> the correct action would be to swallow the -EFAULT.

In i915 we've had tons of fun with a regression in 3.7 where exactly this
blows up: Some of our userspace (UXA ddx specifically) retains
relocations-trees partially accross an execbuf. Which means if the kernel
updates the relocations it _must_ update the presumed offset for
otherwise things will go haywire on the next execbuf. So we can't return
-EFAULT if the userspace memory needs to be just refaulted but still need
to guarante a "correct" presumed offset.

Since we didn't come up with a way to make sure this will work in all
cases when we get an -EFAULT when writing back presumed offsets we have a
rather tricky piece of fallback logic.
- Any -EFAULT error in the fastpath will drop us into the relocation
  slowpath. The slowpath completly processes relocs anew, so any updates
  done by the fastpath are irrelevant.

- The first thing the slowpath does is set the presumed offset in the
  userspace reloc lists to an invalid address (we pick -1) to make sure
  that any subsequent execbuf with the same partial reloc tree will again
  go through the relocation update code.

- Then we do the usual slowpath, i.e. copy relocs from userspace, re-grab
  locks and then process them. The copy-back of the presumed offset
  happens with an copy_to_user_inatomic, and we ignore any errors.

Of course we try really hard to make sure that we never hit the reloc
slowpath ;-) But nowadays this is all fully tested with some nasty
testcases (and a small kernel option to disable prefaulting).

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  9:03                           ` Thomas Hellstrom
  2013-09-24  9:36                             ` Daniel Vetter
@ 2013-09-24  9:43                             ` Maarten Lankhorst
  2013-09-24 17:53                               ` Thomas Hellstrom
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-24  9:43 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Peter Zijlstra, Daniel Vetter, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

Op 24-09-13 11:03, Thomas Hellstrom schreef:
> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
>> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>>> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>>>> }
>>>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>>>> it seems perfectly legal to me.
>>>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>>>> at least I sincerely hope so.
>>>>>>>
>>>>>>> The thing that's wrong with that pattern is that its still not
>>>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>>>> you have to release and re-acquire with the trylock another user might
>>>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>>>
>>>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>>>> run (or even get boosted on -rt).
>>>>>>>
>>>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>>>
>>>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>>>
>>>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>>>> Nouveau was a bit of a headache, but afaict it should work.
>>>>
>>>> In almost all cases relocs are not updated, so I kept intact the fastpath
>>>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>>>> and if that fails it will unreserve all bo's and copy everything.
>>>>
>>>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>>>> if presumed cannot be updated, while the commands are submitted succesfully.
>>> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>>>
>>> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
>> Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
>> So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
>> which would probably result in libdrm crashing shortly afterwards anyway.
>
> Hmm, is the list of buffer objects (and the "presumed" members) really in DRM memory? Because if it resides or may reside in anonymous system memory, it may well be paged out between reading and writing, in which case the -EFAULT return is incorrect.
It may be swapped out, but I don't use read.*atomic copy calls, faults are allowed to happen at that point and no -EFAULT will be returned from
that. The only -EFAULT that can happen is if the memory ends up being read-only, or ceased to exist.

> In fact failures of pushbuf / execbuf *after* commands are successfully submitted are generally very hard to recover from. That's why the kernel should do whatever it takes to recover from such failures, and user-space should do whatever it takes to recover from copy-to-user failures of needed info from the kernel, and it really depends on the user-space usage pattern of "presumed". IIRC the original reason for copying it back to user-space was, that if a relocation offsets were patched up by the kernel, and then the process was sent a signal causing it to retry execbuf, then "presumed" had to be updated, otherwise it would be inconsistent with what's currently in the command stream, which is very bad. If "presumed" is, however, only used by user-space to guess an offset, the correct action would be to swallow the -EFAULT.
Yeah it's a guess from userspace. But it's probably still a programming error by userspace if it uses a read-only
mapping for presumed, or a mapping that's invalidated. :P But there's nothing I can do at that point,
I can't undo something I just committed. This is why I asked whether I should swallow it or not, because it's
unclear to me. :) There's not much of an impact to userspace if I swallow it, but userspace did deliberately
mess up the call.

>>
>> So I don't know whether to swallow that -EFAULT or not, which is what I asked.
>>
>> And for vmwgfx just run it through PROVE_LOCKING and make sure all the copy_to/from_user call sites are called at least once, and then you can be certain it doesn't need fixups. ;)
>> Lockdep ftw..
> Will do that.
>
>>
>>> Actually, it's not the locking order bo:reserve -> mmap_sem that triggers the lockdep warning, right? but the fact that copy_from_user could recurse into the fault handler? Grabbing bo:reseves recursively? which should leave us free to choose locking order at this point.
>> Same thing.
>>
>> When PROVE_LOCKING is enabled the might_fault calls in copy_to/from_user do a fake locking of mmap_sem, which means all locking errors, provided that the reservation lock is taken at least once with mmap_sem held (eg the ttm fault handler is called at least once, it can be done separately, it doesn't need to be done in the same syscall). So any bugs will be found. The only thing to worry about is that lockdep turns off after finding 1 error, so you have to make sure it doesn't bomb out before completing tests, which is sometimes a problem on early rc kernels. ;)
>
> My point was that when we only have copy_[to|from]_user_inatomic inside any bo:reservations, the might_fault would never be called inside any reservations and we should, in principle, be free to choose locking order, provided of course it could be done cleanly in fault()?
I think it makes sense to keep mmap_sem the outer lock here, and not do scary things in fault. Especially if the mm locking is going to be changed in the future. But you're correct, if holding reservations only inatomic should be used.

~Maarten


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  9:36                             ` Daniel Vetter
@ 2013-09-24 10:11                               ` Maarten Lankhorst
  2013-09-24 10:33                                 ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-24 10:11 UTC (permalink / raw)
  To: Thomas Hellstrom, Peter Zijlstra, Dave Airlie, intel-gfx,
	dri-devel, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner, Ben Skeggs, Alex Deucher

Op 24-09-13 11:36, Daniel Vetter schreef:
> On Tue, Sep 24, 2013 at 11:03:37AM +0200, Thomas Hellstrom wrote:
>> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
>>> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>>>> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>>>>> Hey,
>>>>>
>>>>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>>>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>>>>      up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>>>>      bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>>>>      bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>>>>      return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>>>>> }
>>>>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>>>>> it seems perfectly legal to me.
>>>>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>>>>> at least I sincerely hope so.
>>>>>>>>
>>>>>>>> The thing that's wrong with that pattern is that its still not
>>>>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>>>>> you have to release and re-acquire with the trylock another user might
>>>>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>>>>
>>>>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>>>>> run (or even get boosted on -rt).
>>>>>>>>
>>>>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>>>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>>>>
>>>>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>>>>
>>>>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>>>>> Nouveau was a bit of a headache, but afaict it should work.
>>>>>
>>>>> In almost all cases relocs are not updated, so I kept intact the fastpath
>>>>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>>>>> and if that fails it will unreserve all bo's and copy everything.
>>>>>
>>>>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>>>>> if presumed cannot be updated, while the commands are submitted succesfully.
>>>> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>>>>
>>>> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
>>> Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
>>> So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
>>> which would probably result in libdrm crashing shortly afterwards anyway.
>> Hmm, is the list of buffer objects (and the "presumed" members)
>> really in DRM memory? Because if it resides or may reside in
>> anonymous system memory, it may well be paged out between reading
>> and writing, in which case the -EFAULT return is incorrect.
>>
>> In fact failures of pushbuf / execbuf *after* commands are
>> successfully submitted are generally very hard to recover from.
>> That's why the kernel should do whatever it takes to recover from
>> such failures, and user-space should do whatever it takes to recover
>> from copy-to-user failures of needed info from the kernel, and it
>> really depends on the user-space usage pattern of "presumed". IIRC
>> the original reason for copying it back to user-space was, that if a
>> relocation offsets were patched up by the kernel, and then the
>> process was sent a signal causing it to retry execbuf, then
>> "presumed" had to be updated, otherwise it would be inconsistent
>> with what's currently in the command stream, which is very bad. If
>> "presumed" is, however, only used by user-space to guess an offset,
>> the correct action would be to swallow the -EFAULT.
> In i915 we've had tons of fun with a regression in 3.7 where exactly this
> blows up: Some of our userspace (UXA ddx specifically) retains
> relocations-trees partially accross an execbuf. Which means if the kernel
> updates the relocations it _must_ update the presumed offset for
> otherwise things will go haywire on the next execbuf. So we can't return
> -EFAULT if the userspace memory needs to be just refaulted but still need
> to guarante a "correct" presumed offset.
>
> Since we didn't come up with a way to make sure this will work in all
> cases when we get an -EFAULT when writing back presumed offsets we have a
> rather tricky piece of fallback logic.
> - Any -EFAULT error in the fastpath will drop us into the relocation
>   slowpath. The slowpath completly processes relocs anew, so any updates
>   done by the fastpath are irrelevant.
>
> - The first thing the slowpath does is set the presumed offset in the
>   userspace reloc lists to an invalid address (we pick -1) to make sure
>   that any subsequent execbuf with the same partial reloc tree will again
>   go through the relocation update code.
>
> - Then we do the usual slowpath, i.e. copy relocs from userspace, re-grab
>   locks and then process them. The copy-back of the presumed offset
>   happens with an copy_to_user_inatomic, and we ignore any errors.
>
> Of course we try really hard to make sure that we never hit the reloc
> slowpath ;-) But nowadays this is all fully tested with some nasty
> testcases (and a small kernel option to disable prefaulting).
>
It seems userspace only updates offset and domain in nouveau. If it fails to update
it would result in the same affect as when the buffer gets moved around by TTM.
But hey maybe I'll have some fun, I'll lie to userspace, hardcode userspace offset
to 0x40000000, always force domain to be different and see what breaks.

My guess is absolutely nothing, except it might expose some bugs where we forgot annotation..

~Maarten


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24 10:11                               ` Maarten Lankhorst
@ 2013-09-24 10:33                                 ` Thomas Hellstrom
  2013-09-24 11:32                                   ` Maarten Lankhorst
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-24 10:33 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

On 09/24/2013 12:11 PM, Maarten Lankhorst wrote:
> Op 24-09-13 11:36, Daniel Vetter schreef:
>> On Tue, Sep 24, 2013 at 11:03:37AM +0200, Thomas Hellstrom wrote:
>>> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
>>>> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>>>>> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>>>>>> Hey,
>>>>>>
>>>>>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>>>>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>>>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>>>>>> }
>>>>>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>>>>>> it seems perfectly legal to me.
>>>>>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>>>>>> at least I sincerely hope so.
>>>>>>>>>
>>>>>>>>> The thing that's wrong with that pattern is that its still not
>>>>>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>>>>>> you have to release and re-acquire with the trylock another user might
>>>>>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>>>>>
>>>>>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>>>>>> run (or even get boosted on -rt).
>>>>>>>>>
>>>>>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>>>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>>>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>>>>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>>>>>
>>>>>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>>>>>
>>>>>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>>>>>> Nouveau was a bit of a headache, but afaict it should work.
>>>>>>
>>>>>> In almost all cases relocs are not updated, so I kept intact the fastpath
>>>>>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>>>>>> and if that fails it will unreserve all bo's and copy everything.
>>>>>>
>>>>>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>>>>>> if presumed cannot be updated, while the commands are submitted succesfully.
>>>>> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>>>>>
>>>>> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
>>>> Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
>>>> So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
>>>> which would probably result in libdrm crashing shortly afterwards anyway.
>>> Hmm, is the list of buffer objects (and the "presumed" members)
>>> really in DRM memory? Because if it resides or may reside in
>>> anonymous system memory, it may well be paged out between reading
>>> and writing, in which case the -EFAULT return is incorrect.
>>>
>>> In fact failures of pushbuf / execbuf *after* commands are
>>> successfully submitted are generally very hard to recover from.
>>> That's why the kernel should do whatever it takes to recover from
>>> such failures, and user-space should do whatever it takes to recover
>>> from copy-to-user failures of needed info from the kernel, and it
>>> really depends on the user-space usage pattern of "presumed". IIRC
>>> the original reason for copying it back to user-space was, that if a
>>> relocation offsets were patched up by the kernel, and then the
>>> process was sent a signal causing it to retry execbuf, then
>>> "presumed" had to be updated, otherwise it would be inconsistent
>>> with what's currently in the command stream, which is very bad. If
>>> "presumed" is, however, only used by user-space to guess an offset,
>>> the correct action would be to swallow the -EFAULT.
>> In i915 we've had tons of fun with a regression in 3.7 where exactly this
>> blows up: Some of our userspace (UXA ddx specifically) retains
>> relocations-trees partially accross an execbuf. Which means if the kernel
>> updates the relocations it _must_ update the presumed offset for
>> otherwise things will go haywire on the next execbuf. So we can't return
>> -EFAULT if the userspace memory needs to be just refaulted but still need
>> to guarante a "correct" presumed offset.
>>
>> Since we didn't come up with a way to make sure this will work in all
>> cases when we get an -EFAULT when writing back presumed offsets we have a
>> rather tricky piece of fallback logic.
>> - Any -EFAULT error in the fastpath will drop us into the relocation
>>    slowpath. The slowpath completly processes relocs anew, so any updates
>>    done by the fastpath are irrelevant.
>>
>> - The first thing the slowpath does is set the presumed offset in the
>>    userspace reloc lists to an invalid address (we pick -1) to make sure
>>    that any subsequent execbuf with the same partial reloc tree will again
>>    go through the relocation update code.
>>
>> - Then we do the usual slowpath, i.e. copy relocs from userspace, re-grab
>>    locks and then process them. The copy-back of the presumed offset
>>    happens with an copy_to_user_inatomic, and we ignore any errors.
>>
>> Of course we try really hard to make sure that we never hit the reloc
>> slowpath ;-) But nowadays this is all fully tested with some nasty
>> testcases (and a small kernel option to disable prefaulting).
>>
> It seems userspace only updates offset and domain in nouveau. If it fails to update
> it would result in the same affect as when the buffer gets moved around by TTM.
> But hey maybe I'll have some fun, I'll lie to userspace, hardcode userspace offset
> to 0x40000000, always force domain to be different and see what breaks.
>
> My guess is absolutely nothing, except it might expose some bugs where we forgot annotation..
I think that would certainly break if your return an -ERESTARTSYS after 
applying relocations but
before submitting the command stream to hardware....

/Thomas

>
> ~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24 10:33                                 ` Thomas Hellstrom
@ 2013-09-24 11:32                                   ` Maarten Lankhorst
  2013-09-24 17:04                                     ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-09-24 11:32 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

Op 24-09-13 12:33, Thomas Hellstrom schreef:
> On 09/24/2013 12:11 PM, Maarten Lankhorst wrote:
>> Op 24-09-13 11:36, Daniel Vetter schreef:
>>> On Tue, Sep 24, 2013 at 11:03:37AM +0200, Thomas Hellstrom wrote:
>>>> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
>>>>> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>>>>>> On 09/23/2013 05:33 PM, Maarten Lankhorst wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> Op 13-09-13 11:00, Peter Zijlstra schreef:
>>>>>>>> On Fri, Sep 13, 2013 at 10:41:54AM +0200, Daniel Vetter wrote:
>>>>>>>>> On Fri, Sep 13, 2013 at 10:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>>>>>> On Fri, Sep 13, 2013 at 09:46:03AM +0200, Thomas Hellstrom wrote:
>>>>>>>>>>>>> if (!bo_tryreserve()) {
>>>>>>>>>>>>>       up_read mmap_sem(); // Release the mmap_sem to avoid deadlocks.
>>>>>>>>>>>>>       bo_reserve();               // Wait for the BO to become available (interruptible)
>>>>>>>>>>>>>       bo_unreserve();           // Where is bo_wait_unreserved() when we need it, Maarten :P
>>>>>>>>>>>>>       return VM_FAULT_RETRY; // Go ahead and retry the VMA walk, after regrabbing
>>>>>>>>>>>>> }
>>>>>>>>>>> Anyway, could you describe what is wrong, with the above solution, because
>>>>>>>>>>> it seems perfectly legal to me.
>>>>>>>>>> Luckily the rule of law doesn't have anything to do with this stuff --
>>>>>>>>>> at least I sincerely hope so.
>>>>>>>>>>
>>>>>>>>>> The thing that's wrong with that pattern is that its still not
>>>>>>>>>> deterministic - although its a lot better than the pure trylock. Because
>>>>>>>>>> you have to release and re-acquire with the trylock another user might
>>>>>>>>>> have gotten in again. Its utterly prone to starvation.
>>>>>>>>>>
>>>>>>>>>> The acquire+release does remove the dead/life-lock scenario from the
>>>>>>>>>> FIFO case, since blocking on the acquire will allow the other task to
>>>>>>>>>> run (or even get boosted on -rt).
>>>>>>>>>>
>>>>>>>>>> Aside from that there's nothing particularly wrong with it and lockdep
>>>>>>>>>> should be happy afaict (but I haven't had my morning juice yet).
>>>>>>>>> bo_reserve internally maps to a ww-mutex and task can already hold
>>>>>>>>> ww-mutex (potentially even the same for especially nasty userspace).
>>>>>>>> OK, yes I wasn't aware of that. Yes in that case you're quite right.
>>>>>>>>
>>>>>>> I added a RFC patch below.  I only tested with PROVE_LOCKING, and always forced the slowpath for debugging.
>>>>>>>
>>>>>>> This fixes nouveau and core ttm to always use blocking acquisition in fastpath.
>>>>>>> Nouveau was a bit of a headache, but afaict it should work.
>>>>>>>
>>>>>>> In almost all cases relocs are not updated, so I kept intact the fastpath
>>>>>>> of not copying relocs from userspace. The slowpath tries to copy it atomically,
>>>>>>> and if that fails it will unreserve all bo's and copy everything.
>>>>>>>
>>>>>>> One thing to note is that the command submission ioctl may fail now with -EFAULT
>>>>>>> if presumed cannot be updated, while the commands are submitted succesfully.
>>>>>> I think the Nouveau guys need to comment further on this, but returning -EFAULT might break existing user-space, and that's not allowed, but IIRC the return value of "presumed" is only a hint, and if it's incorrect will only trigger future command stream patching.
>>>>>>
>>>>>> Otherwise reviewing mostly the TTM stuff. FWIW, from wat I can tell the vmwgfx driver doesn't need any fixups.
>>>>> Well because we read the list of buffer objects the presumed offsets are at least read-mapped. Although I guess in the worst case the mapping might disappear before the syscall copies back the data.
>>>>> So if -EFAULT happens here then userspace messed up in some way, either by forgetting to map the offsets read-write, which cannot happen with libdrm or free'ing the bo list before the syscall returns,
>>>>> which would probably result in libdrm crashing shortly afterwards anyway.
>>>> Hmm, is the list of buffer objects (and the "presumed" members)
>>>> really in DRM memory? Because if it resides or may reside in
>>>> anonymous system memory, it may well be paged out between reading
>>>> and writing, in which case the -EFAULT return is incorrect.
>>>>
>>>> In fact failures of pushbuf / execbuf *after* commands are
>>>> successfully submitted are generally very hard to recover from.
>>>> That's why the kernel should do whatever it takes to recover from
>>>> such failures, and user-space should do whatever it takes to recover
>>>> from copy-to-user failures of needed info from the kernel, and it
>>>> really depends on the user-space usage pattern of "presumed". IIRC
>>>> the original reason for copying it back to user-space was, that if a
>>>> relocation offsets were patched up by the kernel, and then the
>>>> process was sent a signal causing it to retry execbuf, then
>>>> "presumed" had to be updated, otherwise it would be inconsistent
>>>> with what's currently in the command stream, which is very bad. If
>>>> "presumed" is, however, only used by user-space to guess an offset,
>>>> the correct action would be to swallow the -EFAULT.
>>> In i915 we've had tons of fun with a regression in 3.7 where exactly this
>>> blows up: Some of our userspace (UXA ddx specifically) retains
>>> relocations-trees partially accross an execbuf. Which means if the kernel
>>> updates the relocations it _must_ update the presumed offset for
>>> otherwise things will go haywire on the next execbuf. So we can't return
>>> -EFAULT if the userspace memory needs to be just refaulted but still need
>>> to guarante a "correct" presumed offset.
>>>
>>> Since we didn't come up with a way to make sure this will work in all
>>> cases when we get an -EFAULT when writing back presumed offsets we have a
>>> rather tricky piece of fallback logic.
>>> - Any -EFAULT error in the fastpath will drop us into the relocation
>>>    slowpath. The slowpath completly processes relocs anew, so any updates
>>>    done by the fastpath are irrelevant.
>>>
>>> - The first thing the slowpath does is set the presumed offset in the
>>>    userspace reloc lists to an invalid address (we pick -1) to make sure
>>>    that any subsequent execbuf with the same partial reloc tree will again
>>>    go through the relocation update code.
>>>
>>> - Then we do the usual slowpath, i.e. copy relocs from userspace, re-grab
>>>    locks and then process them. The copy-back of the presumed offset
>>>    happens with an copy_to_user_inatomic, and we ignore any errors.
>>>
>>> Of course we try really hard to make sure that we never hit the reloc
>>> slowpath ;-) But nowadays this is all fully tested with some nasty
>>> testcases (and a small kernel option to disable prefaulting).
>>>
>> It seems userspace only updates offset and domain in nouveau. If it fails to update
>> it would result in the same affect as when the buffer gets moved around by TTM.
>> But hey maybe I'll have some fun, I'll lie to userspace, hardcode userspace offset
>> to 0x40000000, always force domain to be different and see what breaks.
>>
>> My guess is absolutely nothing, except it might expose some bugs where we forgot annotation..
> I think that would certainly break if your return an -ERESTARTSYS after applying relocations but
> before submitting the command stream to hardware....
The relocations are updated before submitting the command stream, but it's copied back to userspace
after submitting the command stream. I'm not sure what -ERESTARTSYS would change, the syscall
is not in an interruptible state.

Oh and after some testing with relocations it seems they always break nouveau. Originally I've set domain
and offset differently, but that crashed the card. In my second try I kept domain and offset correct, and hacked
the kernel to force apply all relocations. Unsurprisingly nouveau still breaks, even if the relocations themselves
should have been a noop. :/

Swallowing -EFAULT is fine theoretically though, I wouldn't worry about it..

~Maarten


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24 11:32                                   ` Maarten Lankhorst
@ 2013-09-24 17:04                                     ` Thomas Hellstrom
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-24 17:04 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

On 09/24/2013 01:32 PM, Maarten Lankhorst wrote:
> Op 24-09-13 12:33, Thomas Hellstrom schreef:
>> On 09/24/2013 12:11 PM, Maarten Lankhorst wrote:
>>>
>>> It seems userspace only updates offset and domain in nouveau. If it fails to update
>>> it would result in the same affect as when the buffer gets moved around by TTM.
>>> But hey maybe I'll have some fun, I'll lie to userspace, hardcode userspace offset
>>> to 0x40000000, always force domain to be different and see what breaks.
>>>
>>> My guess is absolutely nothing, except it might expose some bugs where we forgot annotation..
>> I think that would certainly break if your return an -ERESTARTSYS after applying relocations but
>> before submitting the command stream to hardware....
> The relocations are updated before submitting the command stream, but it's copied back to userspace
> after submitting the command stream. I'm not sure what -ERESTARTSYS would change, the syscall
> is not in an interruptible state.
Hmm, I was assuming without looking at the code that there was an 
interruptible wait for pipe/ring space after
relocation application.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/nouveau: fix nested locking in mmap handler
  2013-09-24  9:43                             ` Maarten Lankhorst
@ 2013-09-24 17:53                               ` Thomas Hellstrom
  0 siblings, 0 replies; 58+ messages in thread
From: Thomas Hellstrom @ 2013-09-24 17:53 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Peter Zijlstra, Daniel Vetter, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Ben Skeggs, Alex Deucher

On 09/24/2013 11:43 AM, Maarten Lankhorst wrote:
> Op 24-09-13 11:03, Thomas Hellstrom schreef:
>> On 09/24/2013 09:34 AM, Maarten Lankhorst wrote:
>>> Op 24-09-13 09:22, Thomas Hellstrom schreef:
>>>> Actually, it's not the locking order bo:reserve -> mmap_sem that triggers the lockdep warning, right? but the fact that copy_from_user could recurse into the fault handler? Grabbing bo:reseves recursively? which should leave us free to choose locking order at this point.
>>> Same thing.
>>>
>>> When PROVE_LOCKING is enabled the might_fault calls in copy_to/from_user do a fake locking of mmap_sem, which means all locking errors, provided that the reservation lock is taken at least once with mmap_sem held (eg the ttm fault handler is called at least once, it can be done separately, it doesn't need to be done in the same syscall). So any bugs will be found. The only thing to worry about is that lockdep turns off after finding 1 error, so you have to make sure it doesn't bomb out before completing tests, which is sometimes a problem on early rc kernels. ;)
>> My point was that when we only have copy_[to|from]_user_inatomic inside any bo:reservations, the might_fault would never be called inside any reservations and we should, in principle, be free to choose locking order, provided of course it could be done cleanly in fault()?
> I think it makes sense to keep mmap_sem the outer lock here, and not do scary things in fault. Especially if the mm locking is going to be changed in the future. But you're correct, if holding reservations only inatomic should be used.
>
> ~Maarten

Now that we after all were forced to enter the dark realm of copy-user 
slowpaths, I don't really have a strong opinion anymore, other than that 
we should try to avoid building too much code in that depends on either 
locking order, so that we could change it if _really_ needed.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-09-13  5:33         ` Thomas Hellstrom
  2013-09-13  8:26           ` Daniel Vetter
@ 2013-10-08 14:14           ` Maarten Lankhorst
  2013-10-08 14:33             ` Jerome Glisse
  1 sibling, 1 reply; 58+ messages in thread
From: Maarten Lankhorst @ 2013-10-08 14:14 UTC (permalink / raw)
  To: Thomas Hellstrom, Alex Deucher
  Cc: Daniel Vetter, Peter Zijlstra, Dave Airlie, intel-gfx, dri-devel,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner

Allocate and copy all kernel memory before doing reservations. This prevents a locking
inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
in ttm_bo_vm_fault without upsetting lockdep.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 01a3ec8..efa9bca 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2391,18 +2391,13 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
 	parser.ib.length_dw = ib_chunk->length_dw;
 	*l = parser.ib.length_dw;
+	memcpy(ib, ib_chunk->kdata, ib_chunk->length_dw * 4);
 	r = r600_cs_parse(&parser);
 	if (r) {
 		DRM_ERROR("Invalid command stream !\n");
 		r600_cs_parser_fini(&parser, r);
 		return r;
 	}
-	r = radeon_cs_finish_pages(&parser);
-	if (r) {
-		DRM_ERROR("Invalid command stream !\n");
-		r600_cs_parser_fini(&parser, r);
-		return r;
-	}
 	r600_cs_parser_fini(&parser, r);
 	return r;
 }
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e067024..c52bb5e 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -983,12 +983,7 @@ struct radeon_cs_reloc {
 struct radeon_cs_chunk {
 	uint32_t		chunk_id;
 	uint32_t		length_dw;
-	int			kpage_idx[2];
-	uint32_t		*kpage[2];
 	uint32_t		*kdata;
-	void __user		*user_ptr;
-	int			last_copied_page;
-	int			last_page_index;
 };
 
 struct radeon_cs_parser {
@@ -1027,8 +1022,13 @@ struct radeon_cs_parser {
 	struct radeon_sa_bo	*cpu_sema;
 };
 
-extern int radeon_cs_finish_pages(struct radeon_cs_parser *p);
-extern u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
+static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
+{
+	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
+
+	return ibc->kdata[idx];
+}
+
 
 struct radeon_cs_packet {
 	unsigned	idx;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 7d7322e..98d8898 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -217,9 +217,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 			return -EFAULT;
 		}
 		p->chunks[i].length_dw = user_chunk.length_dw;
-		p->chunks[i].kdata = NULL;
 		p->chunks[i].chunk_id = user_chunk.chunk_id;
-		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
 		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
 			p->chunk_relocs_idx = i;
 		}
@@ -242,25 +240,22 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 				return -EINVAL;
 		}
 
-		cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
-		if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
-		    (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
-			size = p->chunks[i].length_dw * sizeof(uint32_t);
-			p->chunks[i].kdata = kmalloc(size, GFP_KERNEL);
-			if (p->chunks[i].kdata == NULL) {
-				return -ENOMEM;
-			}
-			if (DRM_COPY_FROM_USER(p->chunks[i].kdata,
-					       p->chunks[i].user_ptr, size)) {
-				return -EFAULT;
-			}
-			if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
-				p->cs_flags = p->chunks[i].kdata[0];
-				if (p->chunks[i].length_dw > 1)
-					ring = p->chunks[i].kdata[1];
-				if (p->chunks[i].length_dw > 2)
-					priority = (s32)p->chunks[i].kdata[2];
-			}
+		size = p->chunks[i].length_dw;
+		p->chunks[i].kdata = drm_malloc_ab(size, sizeof(uint32_t));
+		size *= sizeof(uint32_t);
+		if (p->chunks[i].kdata == NULL) {
+			return -ENOMEM;
+		}
+		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
+		if (DRM_COPY_FROM_USER(p->chunks[i].kdata, cdata, size)) {
+			return -EFAULT;
+		}
+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
+			p->cs_flags = p->chunks[i].kdata[0];
+			if (p->chunks[i].length_dw > 1)
+				ring = p->chunks[i].kdata[1];
+			if (p->chunks[i].length_dw > 2)
+				priority = (s32)p->chunks[i].kdata[2];
 		}
 	}
 
@@ -283,34 +278,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 		}
 	}
 
-	/* deal with non-vm */
-	if ((p->chunk_ib_idx != -1) &&
-	    ((p->cs_flags & RADEON_CS_USE_VM) == 0) &&
-	    (p->chunks[p->chunk_ib_idx].chunk_id == RADEON_CHUNK_ID_IB)) {
-		if (p->chunks[p->chunk_ib_idx].length_dw > (16 * 1024)) {
-			DRM_ERROR("cs IB too big: %d\n",
-				  p->chunks[p->chunk_ib_idx].length_dw);
-			return -EINVAL;
-		}
-		if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
-			p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL);
-			p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL);
-			if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
-			    p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
-				kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
-				kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
-				p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
-				p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
-				return -ENOMEM;
-			}
-		}
-		p->chunks[p->chunk_ib_idx].kpage_idx[0] = -1;
-		p->chunks[p->chunk_ib_idx].kpage_idx[1] = -1;
-		p->chunks[p->chunk_ib_idx].last_copied_page = -1;
-		p->chunks[p->chunk_ib_idx].last_page_index =
-			((p->chunks[p->chunk_ib_idx].length_dw * 4) - 1) / PAGE_SIZE;
-	}
-
 	return 0;
 }
 
@@ -450,13 +417,8 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 	kfree(parser->track);
 	kfree(parser->relocs);
 	kfree(parser->relocs_ptr);
-	for (i = 0; i < parser->nchunks; i++) {
-		kfree(parser->chunks[i].kdata);
-		if ((parser->rdev->flags & RADEON_IS_AGP)) {
-			kfree(parser->chunks[i].kpage[0]);
-			kfree(parser->chunks[i].kpage[1]);
-		}
-	}
+	for (i = 0; i < parser->nchunks; i++)
+		drm_free_large(parser->chunks[i].kdata);
 	kfree(parser->chunks);
 	kfree(parser->chunks_array);
 	radeon_ib_free(parser->rdev, &parser->ib);
@@ -483,17 +445,15 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 		DRM_ERROR("Failed to get ib !\n");
 		return r;
 	}
+
+	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
 	parser->ib.length_dw = ib_chunk->length_dw;
+
 	r = radeon_cs_parse(rdev, parser->ring, parser);
 	if (r || parser->parser_error) {
 		DRM_ERROR("Invalid command stream !\n");
 		return r;
 	}
-	r = radeon_cs_finish_pages(parser);
-	if (r) {
-		DRM_ERROR("Invalid command stream !\n");
-		return r;
-	}
 
 	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
 		radeon_uvd_note_usage(rdev);
@@ -555,10 +515,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 		parser->const_ib.is_const_ib = true;
 		parser->const_ib.length_dw = ib_chunk->length_dw;
 		/* Copy the packet into the IB */
-		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
-				       ib_chunk->length_dw * 4)) {
-			return -EFAULT;
-		}
+		memcpy(parser->const_ib.ptr, ib_chunk->kdata,
+		       ib_chunk->length_dw * 4);
 		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
 		if (r) {
 			return r;
@@ -578,10 +536,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	}
 	parser->ib.length_dw = ib_chunk->length_dw;
 	/* Copy the packet into the IB */
-	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
-			       ib_chunk->length_dw * 4)) {
-		return -EFAULT;
-	}
+	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
 	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
 	if (r) {
 		return r;
@@ -704,97 +659,6 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	return r;
 }
 
-int radeon_cs_finish_pages(struct radeon_cs_parser *p)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	int i;
-	int size = PAGE_SIZE;
-
-	for (i = ibc->last_copied_page + 1; i <= ibc->last_page_index; i++) {
-		if (i == ibc->last_page_index) {
-			size = (ibc->length_dw * 4) % PAGE_SIZE;
-			if (size == 0)
-				size = PAGE_SIZE;
-		}
-		
-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
-				       ibc->user_ptr + (i * PAGE_SIZE),
-				       size))
-			return -EFAULT;
-	}
-	return 0;
-}
-
-static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
-{
-	int new_page;
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	int i;
-	int size = PAGE_SIZE;
-	bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
-		false : true;
-
-	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
-				       ibc->user_ptr + (i * PAGE_SIZE),
-				       PAGE_SIZE)) {
-			p->parser_error = -EFAULT;
-			return 0;
-		}
-	}
-
-	if (pg_idx == ibc->last_page_index) {
-		size = (ibc->length_dw * 4) % PAGE_SIZE;
-		if (size == 0)
-			size = PAGE_SIZE;
-	}
-
-	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
-	if (copy1)
-		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
-
-	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
-			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
-			       size)) {
-		p->parser_error = -EFAULT;
-		return 0;
-	}
-
-	/* copy to IB for non single case */
-	if (!copy1)
-		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
-
-	ibc->last_copied_page = pg_idx;
-	ibc->kpage_idx[new_page] = pg_idx;
-
-	return new_page;
-}
-
-u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	u32 pg_idx, pg_offset;
-	u32 idx_value = 0;
-	int new_page;
-
-	pg_idx = (idx * 4) / PAGE_SIZE;
-	pg_offset = (idx * 4) % PAGE_SIZE;
-
-	if (ibc->kpage_idx[0] == pg_idx)
-		return ibc->kpage[0][pg_offset/4];
-	if (ibc->kpage_idx[1] == pg_idx)
-		return ibc->kpage[1][pg_offset/4];
-
-	new_page = radeon_cs_update_pages(p, pg_idx);
-	if (new_page < 0) {
-		p->parser_error = new_page;
-		return 0;
-	}
-
-	idx_value = ibc->kpage[new_page][pg_offset/4];
-	return idx_value;
-}
-
 /**
  * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
  * @parser:	parser structure holding parsing context.


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 14:14           ` [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations Maarten Lankhorst
@ 2013-10-08 14:33             ` Jerome Glisse
  2013-10-08 14:45               ` Christian König
  2013-10-08 14:45               ` Maarten Lankhorst
  0 siblings, 2 replies; 58+ messages in thread
From: Jerome Glisse @ 2013-10-08 14:33 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Thomas Hellstrom, Alex Deucher, Peter Zijlstra, Daniel Vetter,
	intel-gfx, Linux Kernel Mailing List, dri-devel, Thomas Gleixner,
	Ingo Molnar

On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
> Allocate and copy all kernel memory before doing reservations. This prevents a locking
> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
> in ttm_bo_vm_fault without upsetting lockdep.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

I would say NAK. Current code only allocate temporary page in AGP case.
So AGP case is userspace -> temp page -> cs checker -> radeon ib.

Non AGP is directly memcpy to radeon IB.

Your patch allocate memory memcpy userspace to it and it will then be
memcpy to IB. Which means you introduce an extra memcpy in the process
not something we want.

Cheers,
Jerome

> ---
> diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
> index 01a3ec8..efa9bca 100644
> --- a/drivers/gpu/drm/radeon/r600_cs.c
> +++ b/drivers/gpu/drm/radeon/r600_cs.c
> @@ -2391,18 +2391,13 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
>  	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
>  	parser.ib.length_dw = ib_chunk->length_dw;
>  	*l = parser.ib.length_dw;
> +	memcpy(ib, ib_chunk->kdata, ib_chunk->length_dw * 4);
>  	r = r600_cs_parse(&parser);
>  	if (r) {
>  		DRM_ERROR("Invalid command stream !\n");
>  		r600_cs_parser_fini(&parser, r);
>  		return r;
>  	}
> -	r = radeon_cs_finish_pages(&parser);
> -	if (r) {
> -		DRM_ERROR("Invalid command stream !\n");
> -		r600_cs_parser_fini(&parser, r);
> -		return r;
> -	}
>  	r600_cs_parser_fini(&parser, r);
>  	return r;
>  }
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index e067024..c52bb5e 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -983,12 +983,7 @@ struct radeon_cs_reloc {
>  struct radeon_cs_chunk {
>  	uint32_t		chunk_id;
>  	uint32_t		length_dw;
> -	int			kpage_idx[2];
> -	uint32_t		*kpage[2];
>  	uint32_t		*kdata;
> -	void __user		*user_ptr;
> -	int			last_copied_page;
> -	int			last_page_index;
>  };
>  
>  struct radeon_cs_parser {
> @@ -1027,8 +1022,13 @@ struct radeon_cs_parser {
>  	struct radeon_sa_bo	*cpu_sema;
>  };
>  
> -extern int radeon_cs_finish_pages(struct radeon_cs_parser *p);
> -extern u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
> +static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> +{
> +	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> +
> +	return ibc->kdata[idx];
> +}
> +
>  
>  struct radeon_cs_packet {
>  	unsigned	idx;
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index 7d7322e..98d8898 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -217,9 +217,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>  			return -EFAULT;
>  		}
>  		p->chunks[i].length_dw = user_chunk.length_dw;
> -		p->chunks[i].kdata = NULL;
>  		p->chunks[i].chunk_id = user_chunk.chunk_id;
> -		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
>  		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
>  			p->chunk_relocs_idx = i;
>  		}
> @@ -242,25 +240,22 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>  				return -EINVAL;
>  		}
>  
> -		cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
> -		if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
> -		    (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
> -			size = p->chunks[i].length_dw * sizeof(uint32_t);
> -			p->chunks[i].kdata = kmalloc(size, GFP_KERNEL);
> -			if (p->chunks[i].kdata == NULL) {
> -				return -ENOMEM;
> -			}
> -			if (DRM_COPY_FROM_USER(p->chunks[i].kdata,
> -					       p->chunks[i].user_ptr, size)) {
> -				return -EFAULT;
> -			}
> -			if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
> -				p->cs_flags = p->chunks[i].kdata[0];
> -				if (p->chunks[i].length_dw > 1)
> -					ring = p->chunks[i].kdata[1];
> -				if (p->chunks[i].length_dw > 2)
> -					priority = (s32)p->chunks[i].kdata[2];
> -			}
> +		size = p->chunks[i].length_dw;
> +		p->chunks[i].kdata = drm_malloc_ab(size, sizeof(uint32_t));
> +		size *= sizeof(uint32_t);
> +		if (p->chunks[i].kdata == NULL) {
> +			return -ENOMEM;
> +		}
> +		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
> +		if (DRM_COPY_FROM_USER(p->chunks[i].kdata, cdata, size)) {
> +			return -EFAULT;
> +		}
> +		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
> +			p->cs_flags = p->chunks[i].kdata[0];
> +			if (p->chunks[i].length_dw > 1)
> +				ring = p->chunks[i].kdata[1];
> +			if (p->chunks[i].length_dw > 2)
> +				priority = (s32)p->chunks[i].kdata[2];
>  		}
>  	}
>  
> @@ -283,34 +278,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>  		}
>  	}
>  
> -	/* deal with non-vm */
> -	if ((p->chunk_ib_idx != -1) &&
> -	    ((p->cs_flags & RADEON_CS_USE_VM) == 0) &&
> -	    (p->chunks[p->chunk_ib_idx].chunk_id == RADEON_CHUNK_ID_IB)) {
> -		if (p->chunks[p->chunk_ib_idx].length_dw > (16 * 1024)) {
> -			DRM_ERROR("cs IB too big: %d\n",
> -				  p->chunks[p->chunk_ib_idx].length_dw);
> -			return -EINVAL;
> -		}
> -		if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
> -			p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -			p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -			if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
> -			    p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
> -				kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
> -				kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
> -				p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
> -				p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
> -				return -ENOMEM;
> -			}
> -		}
> -		p->chunks[p->chunk_ib_idx].kpage_idx[0] = -1;
> -		p->chunks[p->chunk_ib_idx].kpage_idx[1] = -1;
> -		p->chunks[p->chunk_ib_idx].last_copied_page = -1;
> -		p->chunks[p->chunk_ib_idx].last_page_index =
> -			((p->chunks[p->chunk_ib_idx].length_dw * 4) - 1) / PAGE_SIZE;
> -	}
> -
>  	return 0;
>  }
>  
> @@ -450,13 +417,8 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>  	kfree(parser->track);
>  	kfree(parser->relocs);
>  	kfree(parser->relocs_ptr);
> -	for (i = 0; i < parser->nchunks; i++) {
> -		kfree(parser->chunks[i].kdata);
> -		if ((parser->rdev->flags & RADEON_IS_AGP)) {
> -			kfree(parser->chunks[i].kpage[0]);
> -			kfree(parser->chunks[i].kpage[1]);
> -		}
> -	}
> +	for (i = 0; i < parser->nchunks; i++)
> +		drm_free_large(parser->chunks[i].kdata);
>  	kfree(parser->chunks);
>  	kfree(parser->chunks_array);
>  	radeon_ib_free(parser->rdev, &parser->ib);
> @@ -483,17 +445,15 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
>  		DRM_ERROR("Failed to get ib !\n");
>  		return r;
>  	}
> +
> +	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
>  	parser->ib.length_dw = ib_chunk->length_dw;
> +
>  	r = radeon_cs_parse(rdev, parser->ring, parser);
>  	if (r || parser->parser_error) {
>  		DRM_ERROR("Invalid command stream !\n");
>  		return r;
>  	}
> -	r = radeon_cs_finish_pages(parser);
> -	if (r) {
> -		DRM_ERROR("Invalid command stream !\n");
> -		return r;
> -	}
>  
>  	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
>  		radeon_uvd_note_usage(rdev);
> @@ -555,10 +515,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
>  		parser->const_ib.is_const_ib = true;
>  		parser->const_ib.length_dw = ib_chunk->length_dw;
>  		/* Copy the packet into the IB */
> -		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
> -				       ib_chunk->length_dw * 4)) {
> -			return -EFAULT;
> -		}
> +		memcpy(parser->const_ib.ptr, ib_chunk->kdata,
> +		       ib_chunk->length_dw * 4);
>  		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
>  		if (r) {
>  			return r;
> @@ -578,10 +536,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
>  	}
>  	parser->ib.length_dw = ib_chunk->length_dw;
>  	/* Copy the packet into the IB */
> -	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
> -			       ib_chunk->length_dw * 4)) {
> -		return -EFAULT;
> -	}
> +	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
>  	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
>  	if (r) {
>  		return r;
> @@ -704,97 +659,6 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>  	return r;
>  }
>  
> -int radeon_cs_finish_pages(struct radeon_cs_parser *p)
> -{
> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> -	int i;
> -	int size = PAGE_SIZE;
> -
> -	for (i = ibc->last_copied_page + 1; i <= ibc->last_page_index; i++) {
> -		if (i == ibc->last_page_index) {
> -			size = (ibc->length_dw * 4) % PAGE_SIZE;
> -			if (size == 0)
> -				size = PAGE_SIZE;
> -		}
> -		
> -		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
> -				       ibc->user_ptr + (i * PAGE_SIZE),
> -				       size))
> -			return -EFAULT;
> -	}
> -	return 0;
> -}
> -
> -static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
> -{
> -	int new_page;
> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> -	int i;
> -	int size = PAGE_SIZE;
> -	bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
> -		false : true;
> -
> -	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
> -		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
> -				       ibc->user_ptr + (i * PAGE_SIZE),
> -				       PAGE_SIZE)) {
> -			p->parser_error = -EFAULT;
> -			return 0;
> -		}
> -	}
> -
> -	if (pg_idx == ibc->last_page_index) {
> -		size = (ibc->length_dw * 4) % PAGE_SIZE;
> -		if (size == 0)
> -			size = PAGE_SIZE;
> -	}
> -
> -	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
> -	if (copy1)
> -		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
> -
> -	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
> -			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
> -			       size)) {
> -		p->parser_error = -EFAULT;
> -		return 0;
> -	}
> -
> -	/* copy to IB for non single case */
> -	if (!copy1)
> -		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
> -
> -	ibc->last_copied_page = pg_idx;
> -	ibc->kpage_idx[new_page] = pg_idx;
> -
> -	return new_page;
> -}
> -
> -u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> -{
> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> -	u32 pg_idx, pg_offset;
> -	u32 idx_value = 0;
> -	int new_page;
> -
> -	pg_idx = (idx * 4) / PAGE_SIZE;
> -	pg_offset = (idx * 4) % PAGE_SIZE;
> -
> -	if (ibc->kpage_idx[0] == pg_idx)
> -		return ibc->kpage[0][pg_offset/4];
> -	if (ibc->kpage_idx[1] == pg_idx)
> -		return ibc->kpage[1][pg_offset/4];
> -
> -	new_page = radeon_cs_update_pages(p, pg_idx);
> -	if (new_page < 0) {
> -		p->parser_error = new_page;
> -		return 0;
> -	}
> -
> -	idx_value = ibc->kpage[new_page][pg_offset/4];
> -	return idx_value;
> -}
> -
>  /**
>   * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
>   * @parser:	parser structure holding parsing context.
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 14:33             ` Jerome Glisse
@ 2013-10-08 14:45               ` Christian König
  2013-10-08 14:55                 ` Jerome Glisse
  2013-10-08 14:45               ` Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Christian König @ 2013-10-08 14:45 UTC (permalink / raw)
  To: Jerome Glisse, Maarten Lankhorst
  Cc: Thomas Hellstrom, Peter Zijlstra, Daniel Vetter, intel-gfx,
	Linux Kernel Mailing List, dri-devel, Alex Deucher,
	Thomas Gleixner, Ingo Molnar

Am 08.10.2013 16:33, schrieb Jerome Glisse:
> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>> in ttm_bo_vm_fault without upsetting lockdep.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> I would say NAK. Current code only allocate temporary page in AGP case.
> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>
> Non AGP is directly memcpy to radeon IB.
>
> Your patch allocate memory memcpy userspace to it and it will then be
> memcpy to IB. Which means you introduce an extra memcpy in the process
> not something we want.

Totally agree. Additional to that there is no good reason to provide 
anything else than anonymous system memory to the CS ioctl, so the 
dependency between the mmap_sem and reservations are not really clear to me.

Christian.

> Cheers,
> Jerome
>
>> ---
>> diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
>> index 01a3ec8..efa9bca 100644
>> --- a/drivers/gpu/drm/radeon/r600_cs.c
>> +++ b/drivers/gpu/drm/radeon/r600_cs.c
>> @@ -2391,18 +2391,13 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
>>   	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
>>   	parser.ib.length_dw = ib_chunk->length_dw;
>>   	*l = parser.ib.length_dw;
>> +	memcpy(ib, ib_chunk->kdata, ib_chunk->length_dw * 4);
>>   	r = r600_cs_parse(&parser);
>>   	if (r) {
>>   		DRM_ERROR("Invalid command stream !\n");
>>   		r600_cs_parser_fini(&parser, r);
>>   		return r;
>>   	}
>> -	r = radeon_cs_finish_pages(&parser);
>> -	if (r) {
>> -		DRM_ERROR("Invalid command stream !\n");
>> -		r600_cs_parser_fini(&parser, r);
>> -		return r;
>> -	}
>>   	r600_cs_parser_fini(&parser, r);
>>   	return r;
>>   }
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index e067024..c52bb5e 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -983,12 +983,7 @@ struct radeon_cs_reloc {
>>   struct radeon_cs_chunk {
>>   	uint32_t		chunk_id;
>>   	uint32_t		length_dw;
>> -	int			kpage_idx[2];
>> -	uint32_t		*kpage[2];
>>   	uint32_t		*kdata;
>> -	void __user		*user_ptr;
>> -	int			last_copied_page;
>> -	int			last_page_index;
>>   };
>>   
>>   struct radeon_cs_parser {
>> @@ -1027,8 +1022,13 @@ struct radeon_cs_parser {
>>   	struct radeon_sa_bo	*cpu_sema;
>>   };
>>   
>> -extern int radeon_cs_finish_pages(struct radeon_cs_parser *p);
>> -extern u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
>> +static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
>> +{
>> +	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> +
>> +	return ibc->kdata[idx];
>> +}
>> +
>>   
>>   struct radeon_cs_packet {
>>   	unsigned	idx;
>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
>> index 7d7322e..98d8898 100644
>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>> @@ -217,9 +217,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>>   			return -EFAULT;
>>   		}
>>   		p->chunks[i].length_dw = user_chunk.length_dw;
>> -		p->chunks[i].kdata = NULL;
>>   		p->chunks[i].chunk_id = user_chunk.chunk_id;
>> -		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
>>   		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
>>   			p->chunk_relocs_idx = i;
>>   		}
>> @@ -242,25 +240,22 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>>   				return -EINVAL;
>>   		}
>>   
>> -		cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
>> -		if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
>> -		    (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
>> -			size = p->chunks[i].length_dw * sizeof(uint32_t);
>> -			p->chunks[i].kdata = kmalloc(size, GFP_KERNEL);
>> -			if (p->chunks[i].kdata == NULL) {
>> -				return -ENOMEM;
>> -			}
>> -			if (DRM_COPY_FROM_USER(p->chunks[i].kdata,
>> -					       p->chunks[i].user_ptr, size)) {
>> -				return -EFAULT;
>> -			}
>> -			if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
>> -				p->cs_flags = p->chunks[i].kdata[0];
>> -				if (p->chunks[i].length_dw > 1)
>> -					ring = p->chunks[i].kdata[1];
>> -				if (p->chunks[i].length_dw > 2)
>> -					priority = (s32)p->chunks[i].kdata[2];
>> -			}
>> +		size = p->chunks[i].length_dw;
>> +		p->chunks[i].kdata = drm_malloc_ab(size, sizeof(uint32_t));
>> +		size *= sizeof(uint32_t);
>> +		if (p->chunks[i].kdata == NULL) {
>> +			return -ENOMEM;
>> +		}
>> +		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
>> +		if (DRM_COPY_FROM_USER(p->chunks[i].kdata, cdata, size)) {
>> +			return -EFAULT;
>> +		}
>> +		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
>> +			p->cs_flags = p->chunks[i].kdata[0];
>> +			if (p->chunks[i].length_dw > 1)
>> +				ring = p->chunks[i].kdata[1];
>> +			if (p->chunks[i].length_dw > 2)
>> +				priority = (s32)p->chunks[i].kdata[2];
>>   		}
>>   	}
>>   
>> @@ -283,34 +278,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
>>   		}
>>   	}
>>   
>> -	/* deal with non-vm */
>> -	if ((p->chunk_ib_idx != -1) &&
>> -	    ((p->cs_flags & RADEON_CS_USE_VM) == 0) &&
>> -	    (p->chunks[p->chunk_ib_idx].chunk_id == RADEON_CHUNK_ID_IB)) {
>> -		if (p->chunks[p->chunk_ib_idx].length_dw > (16 * 1024)) {
>> -			DRM_ERROR("cs IB too big: %d\n",
>> -				  p->chunks[p->chunk_ib_idx].length_dw);
>> -			return -EINVAL;
>> -		}
>> -		if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
>> -			p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> -			p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> -			if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
>> -			    p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
>> -				kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
>> -				kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
>> -				p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
>> -				p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
>> -				return -ENOMEM;
>> -			}
>> -		}
>> -		p->chunks[p->chunk_ib_idx].kpage_idx[0] = -1;
>> -		p->chunks[p->chunk_ib_idx].kpage_idx[1] = -1;
>> -		p->chunks[p->chunk_ib_idx].last_copied_page = -1;
>> -		p->chunks[p->chunk_ib_idx].last_page_index =
>> -			((p->chunks[p->chunk_ib_idx].length_dw * 4) - 1) / PAGE_SIZE;
>> -	}
>> -
>>   	return 0;
>>   }
>>   
>> @@ -450,13 +417,8 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
>>   	kfree(parser->track);
>>   	kfree(parser->relocs);
>>   	kfree(parser->relocs_ptr);
>> -	for (i = 0; i < parser->nchunks; i++) {
>> -		kfree(parser->chunks[i].kdata);
>> -		if ((parser->rdev->flags & RADEON_IS_AGP)) {
>> -			kfree(parser->chunks[i].kpage[0]);
>> -			kfree(parser->chunks[i].kpage[1]);
>> -		}
>> -	}
>> +	for (i = 0; i < parser->nchunks; i++)
>> +		drm_free_large(parser->chunks[i].kdata);
>>   	kfree(parser->chunks);
>>   	kfree(parser->chunks_array);
>>   	radeon_ib_free(parser->rdev, &parser->ib);
>> @@ -483,17 +445,15 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
>>   		DRM_ERROR("Failed to get ib !\n");
>>   		return r;
>>   	}
>> +
>> +	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
>>   	parser->ib.length_dw = ib_chunk->length_dw;
>> +
>>   	r = radeon_cs_parse(rdev, parser->ring, parser);
>>   	if (r || parser->parser_error) {
>>   		DRM_ERROR("Invalid command stream !\n");
>>   		return r;
>>   	}
>> -	r = radeon_cs_finish_pages(parser);
>> -	if (r) {
>> -		DRM_ERROR("Invalid command stream !\n");
>> -		return r;
>> -	}
>>   
>>   	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
>>   		radeon_uvd_note_usage(rdev);
>> @@ -555,10 +515,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
>>   		parser->const_ib.is_const_ib = true;
>>   		parser->const_ib.length_dw = ib_chunk->length_dw;
>>   		/* Copy the packet into the IB */
>> -		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
>> -				       ib_chunk->length_dw * 4)) {
>> -			return -EFAULT;
>> -		}
>> +		memcpy(parser->const_ib.ptr, ib_chunk->kdata,
>> +		       ib_chunk->length_dw * 4);
>>   		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
>>   		if (r) {
>>   			return r;
>> @@ -578,10 +536,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
>>   	}
>>   	parser->ib.length_dw = ib_chunk->length_dw;
>>   	/* Copy the packet into the IB */
>> -	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
>> -			       ib_chunk->length_dw * 4)) {
>> -		return -EFAULT;
>> -	}
>> +	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
>>   	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
>>   	if (r) {
>>   		return r;
>> @@ -704,97 +659,6 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>>   	return r;
>>   }
>>   
>> -int radeon_cs_finish_pages(struct radeon_cs_parser *p)
>> -{
>> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> -	int i;
>> -	int size = PAGE_SIZE;
>> -
>> -	for (i = ibc->last_copied_page + 1; i <= ibc->last_page_index; i++) {
>> -		if (i == ibc->last_page_index) {
>> -			size = (ibc->length_dw * 4) % PAGE_SIZE;
>> -			if (size == 0)
>> -				size = PAGE_SIZE;
>> -		}
>> -		
>> -		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
>> -				       ibc->user_ptr + (i * PAGE_SIZE),
>> -				       size))
>> -			return -EFAULT;
>> -	}
>> -	return 0;
>> -}
>> -
>> -static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
>> -{
>> -	int new_page;
>> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> -	int i;
>> -	int size = PAGE_SIZE;
>> -	bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
>> -		false : true;
>> -
>> -	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
>> -		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
>> -				       ibc->user_ptr + (i * PAGE_SIZE),
>> -				       PAGE_SIZE)) {
>> -			p->parser_error = -EFAULT;
>> -			return 0;
>> -		}
>> -	}
>> -
>> -	if (pg_idx == ibc->last_page_index) {
>> -		size = (ibc->length_dw * 4) % PAGE_SIZE;
>> -		if (size == 0)
>> -			size = PAGE_SIZE;
>> -	}
>> -
>> -	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
>> -	if (copy1)
>> -		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
>> -
>> -	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
>> -			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
>> -			       size)) {
>> -		p->parser_error = -EFAULT;
>> -		return 0;
>> -	}
>> -
>> -	/* copy to IB for non single case */
>> -	if (!copy1)
>> -		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
>> -
>> -	ibc->last_copied_page = pg_idx;
>> -	ibc->kpage_idx[new_page] = pg_idx;
>> -
>> -	return new_page;
>> -}
>> -
>> -u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
>> -{
>> -	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
>> -	u32 pg_idx, pg_offset;
>> -	u32 idx_value = 0;
>> -	int new_page;
>> -
>> -	pg_idx = (idx * 4) / PAGE_SIZE;
>> -	pg_offset = (idx * 4) % PAGE_SIZE;
>> -
>> -	if (ibc->kpage_idx[0] == pg_idx)
>> -		return ibc->kpage[0][pg_offset/4];
>> -	if (ibc->kpage_idx[1] == pg_idx)
>> -		return ibc->kpage[1][pg_offset/4];
>> -
>> -	new_page = radeon_cs_update_pages(p, pg_idx);
>> -	if (new_page < 0) {
>> -		p->parser_error = new_page;
>> -		return 0;
>> -	}
>> -
>> -	idx_value = ibc->kpage[new_page][pg_offset/4];
>> -	return idx_value;
>> -}
>> -
>>   /**
>>    * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
>>    * @parser:	parser structure holding parsing context.
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 14:33             ` Jerome Glisse
  2013-10-08 14:45               ` Christian König
@ 2013-10-08 14:45               ` Maarten Lankhorst
  1 sibling, 0 replies; 58+ messages in thread
From: Maarten Lankhorst @ 2013-10-08 14:45 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Thomas Hellstrom, Alex Deucher, Peter Zijlstra, Daniel Vetter,
	intel-gfx, Linux Kernel Mailing List, dri-devel, Thomas Gleixner,
	Ingo Molnar

op 08-10-13 16:33, Jerome Glisse schreef:
> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>> in ttm_bo_vm_fault without upsetting lockdep.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> I would say NAK. Current code only allocate temporary page in AGP case.
> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>
> Non AGP is directly memcpy to radeon IB.
>
> Your patch allocate memory memcpy userspace to it and it will then be
> memcpy to IB. Which means you introduce an extra memcpy in the process
> not something we want.
>
Can we move up the call to radeon_ib_get then to be done before calling radeon_cs_parser_relocs?

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 14:45               ` Christian König
@ 2013-10-08 14:55                 ` Jerome Glisse
  2013-10-08 16:29                   ` Thomas Hellstrom
  0 siblings, 1 reply; 58+ messages in thread
From: Jerome Glisse @ 2013-10-08 14:55 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Thomas Hellstrom, Peter Zijlstra,
	Daniel Vetter, intel-gfx, Linux Kernel Mailing List, dri-devel,
	Alex Deucher, Thomas Gleixner, Ingo Molnar

On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
> Am 08.10.2013 16:33, schrieb Jerome Glisse:
> >On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
> >>Allocate and copy all kernel memory before doing reservations. This prevents a locking
> >>inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
> >>in ttm_bo_vm_fault without upsetting lockdep.
> >>
> >>Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> >I would say NAK. Current code only allocate temporary page in AGP case.
> >So AGP case is userspace -> temp page -> cs checker -> radeon ib.
> >
> >Non AGP is directly memcpy to radeon IB.
> >
> >Your patch allocate memory memcpy userspace to it and it will then be
> >memcpy to IB. Which means you introduce an extra memcpy in the process
> >not something we want.
> 
> Totally agree. Additional to that there is no good reason to provide
> anything else than anonymous system memory to the CS ioctl, so the
> dependency between the mmap_sem and reservations are not really
> clear to me.
> 
> Christian.

I think is that in other code path you take mmap_sem first then reserve
bo. But here we reserve bo and then we take mmap_sem because of copy
from user.

Cheers,
Jerome

> 
> >Cheers,
> >Jerome
> >
> >>---
> >>diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
> >>index 01a3ec8..efa9bca 100644
> >>--- a/drivers/gpu/drm/radeon/r600_cs.c
> >>+++ b/drivers/gpu/drm/radeon/r600_cs.c
> >>@@ -2391,18 +2391,13 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
> >>  	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
> >>  	parser.ib.length_dw = ib_chunk->length_dw;
> >>  	*l = parser.ib.length_dw;
> >>+	memcpy(ib, ib_chunk->kdata, ib_chunk->length_dw * 4);
> >>  	r = r600_cs_parse(&parser);
> >>  	if (r) {
> >>  		DRM_ERROR("Invalid command stream !\n");
> >>  		r600_cs_parser_fini(&parser, r);
> >>  		return r;
> >>  	}
> >>-	r = radeon_cs_finish_pages(&parser);
> >>-	if (r) {
> >>-		DRM_ERROR("Invalid command stream !\n");
> >>-		r600_cs_parser_fini(&parser, r);
> >>-		return r;
> >>-	}
> >>  	r600_cs_parser_fini(&parser, r);
> >>  	return r;
> >>  }
> >>diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> >>index e067024..c52bb5e 100644
> >>--- a/drivers/gpu/drm/radeon/radeon.h
> >>+++ b/drivers/gpu/drm/radeon/radeon.h
> >>@@ -983,12 +983,7 @@ struct radeon_cs_reloc {
> >>  struct radeon_cs_chunk {
> >>  	uint32_t		chunk_id;
> >>  	uint32_t		length_dw;
> >>-	int			kpage_idx[2];
> >>-	uint32_t		*kpage[2];
> >>  	uint32_t		*kdata;
> >>-	void __user		*user_ptr;
> >>-	int			last_copied_page;
> >>-	int			last_page_index;
> >>  };
> >>  struct radeon_cs_parser {
> >>@@ -1027,8 +1022,13 @@ struct radeon_cs_parser {
> >>  	struct radeon_sa_bo	*cpu_sema;
> >>  };
> >>-extern int radeon_cs_finish_pages(struct radeon_cs_parser *p);
> >>-extern u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
> >>+static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> >>+{
> >>+	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> >>+
> >>+	return ibc->kdata[idx];
> >>+}
> >>+
> >>  struct radeon_cs_packet {
> >>  	unsigned	idx;
> >>diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> >>index 7d7322e..98d8898 100644
> >>--- a/drivers/gpu/drm/radeon/radeon_cs.c
> >>+++ b/drivers/gpu/drm/radeon/radeon_cs.c
> >>@@ -217,9 +217,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
> >>  			return -EFAULT;
> >>  		}
> >>  		p->chunks[i].length_dw = user_chunk.length_dw;
> >>-		p->chunks[i].kdata = NULL;
> >>  		p->chunks[i].chunk_id = user_chunk.chunk_id;
> >>-		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
> >>  		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
> >>  			p->chunk_relocs_idx = i;
> >>  		}
> >>@@ -242,25 +240,22 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
> >>  				return -EINVAL;
> >>  		}
> >>-		cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
> >>-		if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
> >>-		    (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
> >>-			size = p->chunks[i].length_dw * sizeof(uint32_t);
> >>-			p->chunks[i].kdata = kmalloc(size, GFP_KERNEL);
> >>-			if (p->chunks[i].kdata == NULL) {
> >>-				return -ENOMEM;
> >>-			}
> >>-			if (DRM_COPY_FROM_USER(p->chunks[i].kdata,
> >>-					       p->chunks[i].user_ptr, size)) {
> >>-				return -EFAULT;
> >>-			}
> >>-			if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
> >>-				p->cs_flags = p->chunks[i].kdata[0];
> >>-				if (p->chunks[i].length_dw > 1)
> >>-					ring = p->chunks[i].kdata[1];
> >>-				if (p->chunks[i].length_dw > 2)
> >>-					priority = (s32)p->chunks[i].kdata[2];
> >>-			}
> >>+		size = p->chunks[i].length_dw;
> >>+		p->chunks[i].kdata = drm_malloc_ab(size, sizeof(uint32_t));
> >>+		size *= sizeof(uint32_t);
> >>+		if (p->chunks[i].kdata == NULL) {
> >>+			return -ENOMEM;
> >>+		}
> >>+		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
> >>+		if (DRM_COPY_FROM_USER(p->chunks[i].kdata, cdata, size)) {
> >>+			return -EFAULT;
> >>+		}
> >>+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
> >>+			p->cs_flags = p->chunks[i].kdata[0];
> >>+			if (p->chunks[i].length_dw > 1)
> >>+				ring = p->chunks[i].kdata[1];
> >>+			if (p->chunks[i].length_dw > 2)
> >>+				priority = (s32)p->chunks[i].kdata[2];
> >>  		}
> >>  	}
> >>@@ -283,34 +278,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
> >>  		}
> >>  	}
> >>-	/* deal with non-vm */
> >>-	if ((p->chunk_ib_idx != -1) &&
> >>-	    ((p->cs_flags & RADEON_CS_USE_VM) == 0) &&
> >>-	    (p->chunks[p->chunk_ib_idx].chunk_id == RADEON_CHUNK_ID_IB)) {
> >>-		if (p->chunks[p->chunk_ib_idx].length_dw > (16 * 1024)) {
> >>-			DRM_ERROR("cs IB too big: %d\n",
> >>-				  p->chunks[p->chunk_ib_idx].length_dw);
> >>-			return -EINVAL;
> >>-		}
> >>-		if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
> >>-			p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >>-			p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL);
> >>-			if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
> >>-			    p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
> >>-				kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
> >>-				kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
> >>-				p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
> >>-				p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
> >>-				return -ENOMEM;
> >>-			}
> >>-		}
> >>-		p->chunks[p->chunk_ib_idx].kpage_idx[0] = -1;
> >>-		p->chunks[p->chunk_ib_idx].kpage_idx[1] = -1;
> >>-		p->chunks[p->chunk_ib_idx].last_copied_page = -1;
> >>-		p->chunks[p->chunk_ib_idx].last_page_index =
> >>-			((p->chunks[p->chunk_ib_idx].length_dw * 4) - 1) / PAGE_SIZE;
> >>-	}
> >>-
> >>  	return 0;
> >>  }
> >>@@ -450,13 +417,8 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
> >>  	kfree(parser->track);
> >>  	kfree(parser->relocs);
> >>  	kfree(parser->relocs_ptr);
> >>-	for (i = 0; i < parser->nchunks; i++) {
> >>-		kfree(parser->chunks[i].kdata);
> >>-		if ((parser->rdev->flags & RADEON_IS_AGP)) {
> >>-			kfree(parser->chunks[i].kpage[0]);
> >>-			kfree(parser->chunks[i].kpage[1]);
> >>-		}
> >>-	}
> >>+	for (i = 0; i < parser->nchunks; i++)
> >>+		drm_free_large(parser->chunks[i].kdata);
> >>  	kfree(parser->chunks);
> >>  	kfree(parser->chunks_array);
> >>  	radeon_ib_free(parser->rdev, &parser->ib);
> >>@@ -483,17 +445,15 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
> >>  		DRM_ERROR("Failed to get ib !\n");
> >>  		return r;
> >>  	}
> >>+
> >>+	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
> >>  	parser->ib.length_dw = ib_chunk->length_dw;
> >>+
> >>  	r = radeon_cs_parse(rdev, parser->ring, parser);
> >>  	if (r || parser->parser_error) {
> >>  		DRM_ERROR("Invalid command stream !\n");
> >>  		return r;
> >>  	}
> >>-	r = radeon_cs_finish_pages(parser);
> >>-	if (r) {
> >>-		DRM_ERROR("Invalid command stream !\n");
> >>-		return r;
> >>-	}
> >>  	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
> >>  		radeon_uvd_note_usage(rdev);
> >>@@ -555,10 +515,8 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
> >>  		parser->const_ib.is_const_ib = true;
> >>  		parser->const_ib.length_dw = ib_chunk->length_dw;
> >>  		/* Copy the packet into the IB */
> >>-		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
> >>-				       ib_chunk->length_dw * 4)) {
> >>-			return -EFAULT;
> >>-		}
> >>+		memcpy(parser->const_ib.ptr, ib_chunk->kdata,
> >>+		       ib_chunk->length_dw * 4);
> >>  		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
> >>  		if (r) {
> >>  			return r;
> >>@@ -578,10 +536,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
> >>  	}
> >>  	parser->ib.length_dw = ib_chunk->length_dw;
> >>  	/* Copy the packet into the IB */
> >>-	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
> >>-			       ib_chunk->length_dw * 4)) {
> >>-		return -EFAULT;
> >>-	}
> >>+	memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
> >>  	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
> >>  	if (r) {
> >>  		return r;
> >>@@ -704,97 +659,6 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
> >>  	return r;
> >>  }
> >>-int radeon_cs_finish_pages(struct radeon_cs_parser *p)
> >>-{
> >>-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> >>-	int i;
> >>-	int size = PAGE_SIZE;
> >>-
> >>-	for (i = ibc->last_copied_page + 1; i <= ibc->last_page_index; i++) {
> >>-		if (i == ibc->last_page_index) {
> >>-			size = (ibc->length_dw * 4) % PAGE_SIZE;
> >>-			if (size == 0)
> >>-				size = PAGE_SIZE;
> >>-		}
> >>-		
> >>-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
> >>-				       ibc->user_ptr + (i * PAGE_SIZE),
> >>-				       size))
> >>-			return -EFAULT;
> >>-	}
> >>-	return 0;
> >>-}
> >>-
> >>-static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
> >>-{
> >>-	int new_page;
> >>-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> >>-	int i;
> >>-	int size = PAGE_SIZE;
> >>-	bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
> >>-		false : true;
> >>-
> >>-	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
> >>-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
> >>-				       ibc->user_ptr + (i * PAGE_SIZE),
> >>-				       PAGE_SIZE)) {
> >>-			p->parser_error = -EFAULT;
> >>-			return 0;
> >>-		}
> >>-	}
> >>-
> >>-	if (pg_idx == ibc->last_page_index) {
> >>-		size = (ibc->length_dw * 4) % PAGE_SIZE;
> >>-		if (size == 0)
> >>-			size = PAGE_SIZE;
> >>-	}
> >>-
> >>-	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
> >>-	if (copy1)
> >>-		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
> >>-
> >>-	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
> >>-			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
> >>-			       size)) {
> >>-		p->parser_error = -EFAULT;
> >>-		return 0;
> >>-	}
> >>-
> >>-	/* copy to IB for non single case */
> >>-	if (!copy1)
> >>-		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
> >>-
> >>-	ibc->last_copied_page = pg_idx;
> >>-	ibc->kpage_idx[new_page] = pg_idx;
> >>-
> >>-	return new_page;
> >>-}
> >>-
> >>-u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
> >>-{
> >>-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
> >>-	u32 pg_idx, pg_offset;
> >>-	u32 idx_value = 0;
> >>-	int new_page;
> >>-
> >>-	pg_idx = (idx * 4) / PAGE_SIZE;
> >>-	pg_offset = (idx * 4) % PAGE_SIZE;
> >>-
> >>-	if (ibc->kpage_idx[0] == pg_idx)
> >>-		return ibc->kpage[0][pg_offset/4];
> >>-	if (ibc->kpage_idx[1] == pg_idx)
> >>-		return ibc->kpage[1][pg_offset/4];
> >>-
> >>-	new_page = radeon_cs_update_pages(p, pg_idx);
> >>-	if (new_page < 0) {
> >>-		p->parser_error = new_page;
> >>-		return 0;
> >>-	}
> >>-
> >>-	idx_value = ibc->kpage[new_page][pg_offset/4];
> >>-	return idx_value;
> >>-}
> >>-
> >>  /**
> >>   * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
> >>   * @parser:	parser structure holding parsing context.
> >>
> >>_______________________________________________
> >>dri-devel mailing list
> >>dri-devel@lists.freedesktop.org
> >>http://lists.freedesktop.org/mailman/listinfo/dri-devel
> >_______________________________________________
> >dri-devel mailing list
> >dri-devel@lists.freedesktop.org
> >http://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 14:55                 ` Jerome Glisse
@ 2013-10-08 16:29                   ` Thomas Hellstrom
  2013-10-08 16:47                     ` Jerome Glisse
  0 siblings, 1 reply; 58+ messages in thread
From: Thomas Hellstrom @ 2013-10-08 16:29 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Christian König, Maarten Lankhorst, Peter Zijlstra,
	Daniel Vetter, intel-gfx, Linux Kernel Mailing List, dri-devel,
	Alex Deucher, Thomas Gleixner, Ingo Molnar

On 10/08/2013 04:55 PM, Jerome Glisse wrote:
> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>
>>> Non AGP is directly memcpy to radeon IB.
>>>
>>> Your patch allocate memory memcpy userspace to it and it will then be
>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>> not something we want.
>> Totally agree. Additional to that there is no good reason to provide
>> anything else than anonymous system memory to the CS ioctl, so the
>> dependency between the mmap_sem and reservations are not really
>> clear to me.
>>
>> Christian.
> I think is that in other code path you take mmap_sem first then reserve
> bo. But here we reserve bo and then we take mmap_sem because of copy
> from user.
>
> Cheers,
> Jerome
>
Actually the log message is a little confusing. I think the mmap_sem 
locking inversion problem is orthogonal to what's being fixed here.

This patch fixes the possible recursive bo::reserve caused by malicious 
user-space handing a pointer to ttm memory so that the ttm fault handler 
is called when bos are already reserved. That may cause a (possibly 
interruptible) livelock.

Once that is fixed, we are free to choose the mmap_sem -> bo::reserve 
locking order. Currently it's bo::reserve->mmap_sem(), but the hack 
required in the ttm fault handler is admittedly a bit ugly.  The plan is 
to change the locking order to mmap_sem->bo::reserve

I'm not sure if it applies to this particular case, but it should be 
possible to make sure that copy_from_user_inatomic() will always 
succeed, by making sure the pages are present using get_user_pages(), 
and release the pages after copy_from_user_inatomic() is done. That way 
there's no need for a double memcpy slowpath, but if the copied data is 
very fragmented I guess the resulting code may look ugly. The 
get_user_pages() function will return an error if it hits TTM pages.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 16:29                   ` Thomas Hellstrom
@ 2013-10-08 16:47                     ` Jerome Glisse
  2013-10-08 16:58                       ` Thomas Hellstrom
  2013-10-09 10:58                       ` [RFC PATCH] drm/radeon: fixup locking inversion between " Maarten Lankhorst
  0 siblings, 2 replies; 58+ messages in thread
From: Jerome Glisse @ 2013-10-08 16:47 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Christian König, Maarten Lankhorst, Peter Zijlstra,
	Daniel Vetter, intel-gfx, Linux Kernel Mailing List, dri-devel,
	Alex Deucher, Thomas Gleixner, Ingo Molnar

On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
> >On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
> >>Am 08.10.2013 16:33, schrieb Jerome Glisse:
> >>>On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
> >>>>Allocate and copy all kernel memory before doing reservations. This prevents a locking
> >>>>inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
> >>>>in ttm_bo_vm_fault without upsetting lockdep.
> >>>>
> >>>>Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> >>>I would say NAK. Current code only allocate temporary page in AGP case.
> >>>So AGP case is userspace -> temp page -> cs checker -> radeon ib.
> >>>
> >>>Non AGP is directly memcpy to radeon IB.
> >>>
> >>>Your patch allocate memory memcpy userspace to it and it will then be
> >>>memcpy to IB. Which means you introduce an extra memcpy in the process
> >>>not something we want.
> >>Totally agree. Additional to that there is no good reason to provide
> >>anything else than anonymous system memory to the CS ioctl, so the
> >>dependency between the mmap_sem and reservations are not really
> >>clear to me.
> >>
> >>Christian.
> >I think is that in other code path you take mmap_sem first then reserve
> >bo. But here we reserve bo and then we take mmap_sem because of copy
> >from user.
> >
> >Cheers,
> >Jerome
> >
> Actually the log message is a little confusing. I think the mmap_sem
> locking inversion problem is orthogonal to what's being fixed here.
> 
> This patch fixes the possible recursive bo::reserve caused by
> malicious user-space handing a pointer to ttm memory so that the ttm
> fault handler is called when bos are already reserved. That may
> cause a (possibly interruptible) livelock.
> 
> Once that is fixed, we are free to choose the mmap_sem ->
> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
> but the hack required in the ttm fault handler is admittedly a bit
> ugly.  The plan is to change the locking order to
> mmap_sem->bo::reserve
> 
> I'm not sure if it applies to this particular case, but it should be
> possible to make sure that copy_from_user_inatomic() will always
> succeed, by making sure the pages are present using
> get_user_pages(), and release the pages after
> copy_from_user_inatomic() is done. That way there's no need for a
> double memcpy slowpath, but if the copied data is very fragmented I
> guess the resulting code may look ugly. The get_user_pages()
> function will return an error if it hits TTM pages.
> 
> /Thomas

get_user_pages + copy_from_user_inatomic is overkill. We should just
do get_user_pages which fails with ttm memory and then use copy_highpage
helper.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 16:47                     ` Jerome Glisse
@ 2013-10-08 16:58                       ` Thomas Hellstrom
  2013-10-09 12:36                         ` [RFC PATCH v2] drm/radeon: fixup locking inversion between, " Maarten Lankhorst
  2013-10-09 10:58                       ` [RFC PATCH] drm/radeon: fixup locking inversion between " Maarten Lankhorst
  1 sibling, 1 reply; 58+ messages in thread
From: Thomas Hellstrom @ 2013-10-08 16:58 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Christian König, Maarten Lankhorst, Peter Zijlstra,
	Daniel Vetter, intel-gfx, Linux Kernel Mailing List, dri-devel,
	Alex Deucher, Thomas Gleixner, Ingo Molnar

On 10/08/2013 06:47 PM, Jerome Glisse wrote:
> On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
>> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
>>> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>>>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>>>
>>>>> Non AGP is directly memcpy to radeon IB.
>>>>>
>>>>> Your patch allocate memory memcpy userspace to it and it will then be
>>>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>>>> not something we want.
>>>> Totally agree. Additional to that there is no good reason to provide
>>>> anything else than anonymous system memory to the CS ioctl, so the
>>>> dependency between the mmap_sem and reservations are not really
>>>> clear to me.
>>>>
>>>> Christian.
>>> I think is that in other code path you take mmap_sem first then reserve
>>> bo. But here we reserve bo and then we take mmap_sem because of copy
>> >from user.
>>> Cheers,
>>> Jerome
>>>
>> Actually the log message is a little confusing. I think the mmap_sem
>> locking inversion problem is orthogonal to what's being fixed here.
>>
>> This patch fixes the possible recursive bo::reserve caused by
>> malicious user-space handing a pointer to ttm memory so that the ttm
>> fault handler is called when bos are already reserved. That may
>> cause a (possibly interruptible) livelock.
>>
>> Once that is fixed, we are free to choose the mmap_sem ->
>> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
>> but the hack required in the ttm fault handler is admittedly a bit
>> ugly.  The plan is to change the locking order to
>> mmap_sem->bo::reserve
>>
>> I'm not sure if it applies to this particular case, but it should be
>> possible to make sure that copy_from_user_inatomic() will always
>> succeed, by making sure the pages are present using
>> get_user_pages(), and release the pages after
>> copy_from_user_inatomic() is done. That way there's no need for a
>> double memcpy slowpath, but if the copied data is very fragmented I
>> guess the resulting code may look ugly. The get_user_pages()
>> function will return an error if it hits TTM pages.
>>
>> /Thomas
> get_user_pages + copy_from_user_inatomic is overkill. We should just
> do get_user_pages which fails with ttm memory and then use copy_highpage
> helper.
>
> Cheers,
> Jerome
Yeah, it may well be that that's the preferred solution.

/Thomas

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations
  2013-10-08 16:47                     ` Jerome Glisse
  2013-10-08 16:58                       ` Thomas Hellstrom
@ 2013-10-09 10:58                       ` Maarten Lankhorst
  1 sibling, 0 replies; 58+ messages in thread
From: Maarten Lankhorst @ 2013-10-09 10:58 UTC (permalink / raw)
  To: Jerome Glisse, Thomas Hellstrom
  Cc: Christian König, Peter Zijlstra, Daniel Vetter, intel-gfx,
	Linux Kernel Mailing List, dri-devel, Alex Deucher,
	Thomas Gleixner, Ingo Molnar

op 08-10-13 18:47, Jerome Glisse schreef:
> On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
>> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
>>> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>>>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>>>
>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>>>
>>>>> Non AGP is directly memcpy to radeon IB.
>>>>>
>>>>> Your patch allocate memory memcpy userspace to it and it will then be
>>>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>>>> not something we want.
>>>> Totally agree. Additional to that there is no good reason to provide
>>>> anything else than anonymous system memory to the CS ioctl, so the
>>>> dependency between the mmap_sem and reservations are not really
>>>> clear to me.
>>>>
>>>> Christian.
>>> I think is that in other code path you take mmap_sem first then reserve
>>> bo. But here we reserve bo and then we take mmap_sem because of copy
>> >from user.
>>> Cheers,
>>> Jerome
>>>
>> Actually the log message is a little confusing. I think the mmap_sem
>> locking inversion problem is orthogonal to what's being fixed here.
>>
>> This patch fixes the possible recursive bo::reserve caused by
>> malicious user-space handing a pointer to ttm memory so that the ttm
>> fault handler is called when bos are already reserved. That may
>> cause a (possibly interruptible) livelock.
>>
>> Once that is fixed, we are free to choose the mmap_sem ->
>> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
>> but the hack required in the ttm fault handler is admittedly a bit
>> ugly.  The plan is to change the locking order to
>> mmap_sem->bo::reserve
>>
>> I'm not sure if it applies to this particular case, but it should be
>> possible to make sure that copy_from_user_inatomic() will always
>> succeed, by making sure the pages are present using
>> get_user_pages(), and release the pages after
>> copy_from_user_inatomic() is done. That way there's no need for a
>> double memcpy slowpath, but if the copied data is very fragmented I
>> guess the resulting code may look ugly. The get_user_pages()
>> function will return an error if it hits TTM pages.
>>
>> /Thomas
> get_user_pages + copy_from_user_inatomic is overkill. We should just
> do get_user_pages which fails with ttm memory and then use copy_highpage
> helper.
I don't think we have to do anything that complicated, the easiest solution seems to be to
call radeon_ib_get before calling radeon_cs_parser_relocs, and copy everything to the ib
buffer before taking the reserve lock.

~Maarten

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC PATCH v2] drm/radeon: fixup locking inversion between, mmap_sem and reservations
  2013-10-08 16:58                       ` Thomas Hellstrom
@ 2013-10-09 12:36                         ` Maarten Lankhorst
  0 siblings, 0 replies; 58+ messages in thread
From: Maarten Lankhorst @ 2013-10-09 12:36 UTC (permalink / raw)
  To: Thomas Hellstrom, Jerome Glisse
  Cc: Christian König, Peter Zijlstra, Daniel Vetter, intel-gfx,
	Linux Kernel Mailing List, dri-devel, Alex Deucher,
	Thomas Gleixner, Ingo Molnar

op 08-10-13 18:58, Thomas Hellstrom schreef:
> On 10/08/2013 06:47 PM, Jerome Glisse wrote:
>> On Tue, Oct 08, 2013 at 06:29:35PM +0200, Thomas Hellstrom wrote:
>>> On 10/08/2013 04:55 PM, Jerome Glisse wrote:
>>>> On Tue, Oct 08, 2013 at 04:45:18PM +0200, Christian König wrote:
>>>>> Am 08.10.2013 16:33, schrieb Jerome Glisse:
>>>>>> On Tue, Oct 08, 2013 at 04:14:40PM +0200, Maarten Lankhorst wrote:
>>>>>>> Allocate and copy all kernel memory before doing reservations. This prevents a locking
>>>>>>> inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
>>>>>>> in ttm_bo_vm_fault without upsetting lockdep.
>>>>>>>
>>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>> I would say NAK. Current code only allocate temporary page in AGP case.
>>>>>> So AGP case is userspace -> temp page -> cs checker -> radeon ib.
>>>>>>
>>>>>> Non AGP is directly memcpy to radeon IB.
>>>>>>
>>>>>> Your patch allocate memory memcpy userspace to it and it will then be
>>>>>> memcpy to IB. Which means you introduce an extra memcpy in the process
>>>>>> not something we want.
>>>>> Totally agree. Additional to that there is no good reason to provide
>>>>> anything else than anonymous system memory to the CS ioctl, so the
>>>>> dependency between the mmap_sem and reservations are not really
>>>>> clear to me.
>>>>>
>>>>> Christian.
>>>> I think is that in other code path you take mmap_sem first then reserve
>>>> bo. But here we reserve bo and then we take mmap_sem because of copy
>>> >from user.
>>>> Cheers,
>>>> Jerome
>>>>
>>> Actually the log message is a little confusing. I think the mmap_sem
>>> locking inversion problem is orthogonal to what's being fixed here.
>>>
>>> This patch fixes the possible recursive bo::reserve caused by
>>> malicious user-space handing a pointer to ttm memory so that the ttm
>>> fault handler is called when bos are already reserved. That may
>>> cause a (possibly interruptible) livelock.
>>>
>>> Once that is fixed, we are free to choose the mmap_sem ->
>>> bo::reserve locking order. Currently it's bo::reserve->mmap_sem(),
>>> but the hack required in the ttm fault handler is admittedly a bit
>>> ugly.  The plan is to change the locking order to
>>> mmap_sem->bo::reserve
>>>
>>> I'm not sure if it applies to this particular case, but it should be
>>> possible to make sure that copy_from_user_inatomic() will always
>>> succeed, by making sure the pages are present using
>>> get_user_pages(), and release the pages after
>>> copy_from_user_inatomic() is done. That way there's no need for a
>>> double memcpy slowpath, but if the copied data is very fragmented I
>>> guess the resulting code may look ugly. The get_user_pages()
>>> function will return an error if it hits TTM pages.
>>>
>>> /Thomas
>> get_user_pages + copy_from_user_inatomic is overkill. We should just
>> do get_user_pages which fails with ttm memory and then use copy_highpage
>> helper.
>>
>> Cheers,
>> Jerome
> Yeah, it may well be that that's the preferred solution.
>
> /Thomas
>
I still disagree, and shuffled radeon_ib_get around to be called sooner.

How does the patch below look?
8<-------
Allocate and copy all kernel memory before doing reservations. This prevents a locking
inversion between mmap_sem and reservation_class, and allows us to drop the trylocking
in ttm_bo_vm_fault without upsetting lockdep.

Changes since v1:
- Kill extra memcpy for !AGP case.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/radeon/r600_cs.c   |  16 +-
 drivers/gpu/drm/radeon/radeon.h    |  15 +-
 drivers/gpu/drm/radeon/radeon_cs.c | 298 ++++++++++++-------------------------
 3 files changed, 106 insertions(+), 223 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index 01a3ec8..1abaa2b 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -2328,13 +2328,8 @@ static void r600_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 	unsigned i;
 
 	kfree(parser->relocs);
-	for (i = 0; i < parser->nchunks; i++) {
-		kfree(parser->chunks[i].kdata);
-		if (parser->rdev && (parser->rdev->flags & RADEON_IS_AGP)) {
-			kfree(parser->chunks[i].kpage[0]);
-			kfree(parser->chunks[i].kpage[1]);
-		}
-	}
+	for (i = 0; i < parser->nchunks; i++)
+		drm_free_large(parser->chunks[i].kdata);
 	kfree(parser->chunks);
 	kfree(parser->chunks_array);
 }
@@ -2391,13 +2386,12 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
 	parser.ib.length_dw = ib_chunk->length_dw;
 	*l = parser.ib.length_dw;
-	r = r600_cs_parse(&parser);
-	if (r) {
-		DRM_ERROR("Invalid command stream !\n");
+	if (DRM_COPY_FROM_USER(ib, ib_chunk->user_ptr, ib_chunk->length_dw * 4)) {
+		r = -EFAULT;
 		r600_cs_parser_fini(&parser, r);
 		return r;
 	}
-	r = radeon_cs_finish_pages(&parser);
+	r = r600_cs_parse(&parser);
 	if (r) {
 		DRM_ERROR("Invalid command stream !\n");
 		r600_cs_parser_fini(&parser, r);
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a400ac1..837c577 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -967,12 +967,8 @@ struct radeon_cs_reloc {
 struct radeon_cs_chunk {
 	uint32_t		chunk_id;
 	uint32_t		length_dw;
-	int			kpage_idx[2];
-	uint32_t		*kpage[2];
 	uint32_t		*kdata;
 	void __user		*user_ptr;
-	int			last_copied_page;
-	int			last_page_index;
 };
 
 struct radeon_cs_parser {
@@ -1007,8 +1003,15 @@ struct radeon_cs_parser {
 	struct ww_acquire_ctx	ticket;
 };
 
-extern int radeon_cs_finish_pages(struct radeon_cs_parser *p);
-extern u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx);
+static inline u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
+{
+	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
+
+	if (ibc->kdata)
+		return ibc->kdata[idx];
+	return p->ib.ptr[idx];
+}
+
 
 struct radeon_cs_packet {
 	unsigned	idx;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 66c2228..c8ab019 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -213,9 +213,7 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 			return -EFAULT;
 		}
 		p->chunks[i].length_dw = user_chunk.length_dw;
-		p->chunks[i].kdata = NULL;
 		p->chunks[i].chunk_id = user_chunk.chunk_id;
-		p->chunks[i].user_ptr = (void __user *)(unsigned long)user_chunk.chunk_data;
 		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) {
 			p->chunk_relocs_idx = i;
 		}
@@ -238,25 +236,31 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 				return -EINVAL;
 		}
 
-		cdata = (uint32_t *)(unsigned long)user_chunk.chunk_data;
-		if ((p->chunks[i].chunk_id == RADEON_CHUNK_ID_RELOCS) ||
-		    (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS)) {
-			size = p->chunks[i].length_dw * sizeof(uint32_t);
-			p->chunks[i].kdata = kmalloc(size, GFP_KERNEL);
-			if (p->chunks[i].kdata == NULL) {
-				return -ENOMEM;
-			}
-			if (DRM_COPY_FROM_USER(p->chunks[i].kdata,
-					       p->chunks[i].user_ptr, size)) {
-				return -EFAULT;
-			}
-			if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
-				p->cs_flags = p->chunks[i].kdata[0];
-				if (p->chunks[i].length_dw > 1)
-					ring = p->chunks[i].kdata[1];
-				if (p->chunks[i].length_dw > 2)
-					priority = (s32)p->chunks[i].kdata[2];
-			}
+		size = p->chunks[i].length_dw;
+		cdata = (void __user *)(unsigned long)user_chunk.chunk_data;
+		p->chunks[i].user_ptr = cdata;
+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_CONST_IB)
+			continue;
+
+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_IB) {
+			if (!p->rdev || !(p->rdev->flags & RADEON_IS_AGP))
+				continue;
+		}
+
+		p->chunks[i].kdata = drm_malloc_ab(size, sizeof(uint32_t));
+		size *= sizeof(uint32_t);
+		if (p->chunks[i].kdata == NULL) {
+			return -ENOMEM;
+		}
+		if (DRM_COPY_FROM_USER(p->chunks[i].kdata, cdata, size)) {
+			return -EFAULT;
+		}
+		if (p->chunks[i].chunk_id == RADEON_CHUNK_ID_FLAGS) {
+			p->cs_flags = p->chunks[i].kdata[0];
+			if (p->chunks[i].length_dw > 1)
+				ring = p->chunks[i].kdata[1];
+			if (p->chunks[i].length_dw > 2)
+				priority = (s32)p->chunks[i].kdata[2];
 		}
 	}
 
@@ -279,34 +283,6 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 		}
 	}
 
-	/* deal with non-vm */
-	if ((p->chunk_ib_idx != -1) &&
-	    ((p->cs_flags & RADEON_CS_USE_VM) == 0) &&
-	    (p->chunks[p->chunk_ib_idx].chunk_id == RADEON_CHUNK_ID_IB)) {
-		if (p->chunks[p->chunk_ib_idx].length_dw > (16 * 1024)) {
-			DRM_ERROR("cs IB too big: %d\n",
-				  p->chunks[p->chunk_ib_idx].length_dw);
-			return -EINVAL;
-		}
-		if (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) {
-			p->chunks[p->chunk_ib_idx].kpage[0] = kmalloc(PAGE_SIZE, GFP_KERNEL);
-			p->chunks[p->chunk_ib_idx].kpage[1] = kmalloc(PAGE_SIZE, GFP_KERNEL);
-			if (p->chunks[p->chunk_ib_idx].kpage[0] == NULL ||
-			    p->chunks[p->chunk_ib_idx].kpage[1] == NULL) {
-				kfree(p->chunks[p->chunk_ib_idx].kpage[0]);
-				kfree(p->chunks[p->chunk_ib_idx].kpage[1]);
-				p->chunks[p->chunk_ib_idx].kpage[0] = NULL;
-				p->chunks[p->chunk_ib_idx].kpage[1] = NULL;
-				return -ENOMEM;
-			}
-		}
-		p->chunks[p->chunk_ib_idx].kpage_idx[0] = -1;
-		p->chunks[p->chunk_ib_idx].kpage_idx[1] = -1;
-		p->chunks[p->chunk_ib_idx].last_copied_page = -1;
-		p->chunks[p->chunk_ib_idx].last_page_index =
-			((p->chunks[p->chunk_ib_idx].length_dw * 4) - 1) / PAGE_SIZE;
-	}
-
 	return 0;
 }
 
@@ -340,13 +316,8 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 	kfree(parser->track);
 	kfree(parser->relocs);
 	kfree(parser->relocs_ptr);
-	for (i = 0; i < parser->nchunks; i++) {
-		kfree(parser->chunks[i].kdata);
-		if ((parser->rdev->flags & RADEON_IS_AGP)) {
-			kfree(parser->chunks[i].kpage[0]);
-			kfree(parser->chunks[i].kpage[1]);
-		}
-	}
+	for (i = 0; i < parser->nchunks; i++)
+		drm_free_large(parser->chunks[i].kdata);
 	kfree(parser->chunks);
 	kfree(parser->chunks_array);
 	radeon_ib_free(parser->rdev, &parser->ib);
@@ -356,7 +327,6 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 			      struct radeon_cs_parser *parser)
 {
-	struct radeon_cs_chunk *ib_chunk;
 	int r;
 
 	if (parser->chunk_ib_idx == -1)
@@ -365,28 +335,11 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 	if (parser->cs_flags & RADEON_CS_USE_VM)
 		return 0;
 
-	ib_chunk = &parser->chunks[parser->chunk_ib_idx];
-	/* Copy the packet into the IB, the parser will read from the
-	 * input memory (cached) and write to the IB (which can be
-	 * uncached).
-	 */
-	r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
-			   NULL, ib_chunk->length_dw * 4);
-	if (r) {
-		DRM_ERROR("Failed to get ib !\n");
-		return r;
-	}
-	parser->ib.length_dw = ib_chunk->length_dw;
 	r = radeon_cs_parse(rdev, parser->ring, parser);
 	if (r || parser->parser_error) {
 		DRM_ERROR("Invalid command stream !\n");
 		return r;
 	}
-	r = radeon_cs_finish_pages(parser);
-	if (r) {
-		DRM_ERROR("Invalid command stream !\n");
-		return r;
-	}
 
 	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
 		radeon_uvd_note_usage(rdev);
@@ -424,7 +377,6 @@ static int radeon_bo_vm_update_pte(struct radeon_cs_parser *parser,
 static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 				 struct radeon_cs_parser *parser)
 {
-	struct radeon_cs_chunk *ib_chunk;
 	struct radeon_fpriv *fpriv = parser->filp->driver_priv;
 	struct radeon_vm *vm = &fpriv->vm;
 	int r;
@@ -434,49 +386,13 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if ((parser->cs_flags & RADEON_CS_USE_VM) == 0)
 		return 0;
 
-	if ((rdev->family >= CHIP_TAHITI) &&
-	    (parser->chunk_const_ib_idx != -1)) {
-		ib_chunk = &parser->chunks[parser->chunk_const_ib_idx];
-		if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
-			DRM_ERROR("cs IB CONST too big: %d\n", ib_chunk->length_dw);
-			return -EINVAL;
-		}
-		r =  radeon_ib_get(rdev, parser->ring, &parser->const_ib,
-				   vm, ib_chunk->length_dw * 4);
-		if (r) {
-			DRM_ERROR("Failed to get const ib !\n");
-			return r;
-		}
-		parser->const_ib.is_const_ib = true;
-		parser->const_ib.length_dw = ib_chunk->length_dw;
-		/* Copy the packet into the IB */
-		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
-				       ib_chunk->length_dw * 4)) {
-			return -EFAULT;
-		}
+	if (parser->const_ib.length_dw) {
 		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
 		if (r) {
 			return r;
 		}
 	}
 
-	ib_chunk = &parser->chunks[parser->chunk_ib_idx];
-	if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
-		DRM_ERROR("cs IB too big: %d\n", ib_chunk->length_dw);
-		return -EINVAL;
-	}
-	r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
-			   vm, ib_chunk->length_dw * 4);
-	if (r) {
-		DRM_ERROR("Failed to get ib !\n");
-		return r;
-	}
-	parser->ib.length_dw = ib_chunk->length_dw;
-	/* Copy the packet into the IB */
-	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
-			       ib_chunk->length_dw * 4)) {
-		return -EFAULT;
-	}
 	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
 	if (r) {
 		return r;
@@ -528,6 +444,62 @@ static int radeon_cs_handle_lockup(struct radeon_device *rdev, int r)
 	return r;
 }
 
+static int radeon_cs_ib_fill(struct radeon_device *rdev, struct radeon_cs_parser *parser)
+{
+	struct radeon_cs_chunk *ib_chunk;
+	struct radeon_vm *vm = NULL;
+	int r;
+
+	if (parser->chunk_ib_idx == -1)
+		return 0;
+
+	if (parser->cs_flags & RADEON_CS_USE_VM) {
+		struct radeon_fpriv *fpriv = parser->filp->driver_priv;
+		vm = &fpriv->vm;
+
+		if ((rdev->family >= CHIP_TAHITI) &&
+		    (parser->chunk_const_ib_idx != -1)) {
+			ib_chunk = &parser->chunks[parser->chunk_const_ib_idx];
+			if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
+				DRM_ERROR("cs IB CONST too big: %d\n", ib_chunk->length_dw);
+				return -EINVAL;
+			}
+			r =  radeon_ib_get(rdev, parser->ring, &parser->const_ib,
+					   vm, ib_chunk->length_dw * 4);
+			if (r) {
+				DRM_ERROR("Failed to get const ib !\n");
+				return r;
+			}
+			parser->const_ib.is_const_ib = true;
+			parser->const_ib.length_dw = ib_chunk->length_dw;
+			if (DRM_COPY_FROM_USER(parser->const_ib.ptr,
+					       ib_chunk->user_ptr,
+					       ib_chunk->length_dw * 4))
+				return -EFAULT;
+		}
+
+		ib_chunk = &parser->chunks[parser->chunk_ib_idx];
+		if (ib_chunk->length_dw > RADEON_IB_VM_MAX_SIZE) {
+			DRM_ERROR("cs IB too big: %d\n", ib_chunk->length_dw);
+			return -EINVAL;
+		}
+	}
+	ib_chunk = &parser->chunks[parser->chunk_ib_idx];
+
+	r =  radeon_ib_get(rdev, parser->ring, &parser->ib,
+			   vm, ib_chunk->length_dw * 4);
+	if (r) {
+		DRM_ERROR("Failed to get ib !\n");
+		return r;
+	}
+	parser->ib.length_dw = ib_chunk->length_dw;
+	if (ib_chunk->kdata)
+		memcpy(parser->ib.ptr, ib_chunk->kdata, ib_chunk->length_dw * 4);
+	else if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr, ib_chunk->length_dw * 4))
+		return -EFAULT;
+	return 0;
+}
+
 int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 {
 	struct radeon_device *rdev = dev->dev_private;
@@ -553,10 +525,15 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		r = radeon_cs_handle_lockup(rdev, r);
 		return r;
 	}
-	r = radeon_cs_parser_relocs(&parser);
-	if (r) {
-		if (r != -ERESTARTSYS)
+
+	r = radeon_cs_ib_fill(rdev, &parser);
+	if (!r) {
+		r = radeon_cs_parser_relocs(&parser);
+		if (r && r != -ERESTARTSYS)
 			DRM_ERROR("Failed to parse relocation %d!\n", r);
+	}
+
+	if (r) {
 		radeon_cs_parser_fini(&parser, r, false);
 		up_read(&rdev->exclusive_lock);
 		r = radeon_cs_handle_lockup(rdev, r);
@@ -580,97 +557,6 @@ out:
 	return r;
 }
 
-int radeon_cs_finish_pages(struct radeon_cs_parser *p)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	int i;
-	int size = PAGE_SIZE;
-
-	for (i = ibc->last_copied_page + 1; i <= ibc->last_page_index; i++) {
-		if (i == ibc->last_page_index) {
-			size = (ibc->length_dw * 4) % PAGE_SIZE;
-			if (size == 0)
-				size = PAGE_SIZE;
-		}
-		
-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
-				       ibc->user_ptr + (i * PAGE_SIZE),
-				       size))
-			return -EFAULT;
-	}
-	return 0;
-}
-
-static int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
-{
-	int new_page;
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	int i;
-	int size = PAGE_SIZE;
-	bool copy1 = (p->rdev && (p->rdev->flags & RADEON_IS_AGP)) ?
-		false : true;
-
-	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
-		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
-				       ibc->user_ptr + (i * PAGE_SIZE),
-				       PAGE_SIZE)) {
-			p->parser_error = -EFAULT;
-			return 0;
-		}
-	}
-
-	if (pg_idx == ibc->last_page_index) {
-		size = (ibc->length_dw * 4) % PAGE_SIZE;
-		if (size == 0)
-			size = PAGE_SIZE;
-	}
-
-	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
-	if (copy1)
-		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
-
-	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
-			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
-			       size)) {
-		p->parser_error = -EFAULT;
-		return 0;
-	}
-
-	/* copy to IB for non single case */
-	if (!copy1)
-		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
-
-	ibc->last_copied_page = pg_idx;
-	ibc->kpage_idx[new_page] = pg_idx;
-
-	return new_page;
-}
-
-u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
-{
-	struct radeon_cs_chunk *ibc = &p->chunks[p->chunk_ib_idx];
-	u32 pg_idx, pg_offset;
-	u32 idx_value = 0;
-	int new_page;
-
-	pg_idx = (idx * 4) / PAGE_SIZE;
-	pg_offset = (idx * 4) % PAGE_SIZE;
-
-	if (ibc->kpage_idx[0] == pg_idx)
-		return ibc->kpage[0][pg_offset/4];
-	if (ibc->kpage_idx[1] == pg_idx)
-		return ibc->kpage[1][pg_offset/4];
-
-	new_page = radeon_cs_update_pages(p, pg_idx);
-	if (new_page < 0) {
-		p->parser_error = new_page;
-		return 0;
-	}
-
-	idx_value = ibc->kpage[new_page][pg_offset/4];
-	return idx_value;
-}
-
 /**
  * radeon_cs_packet_parse() - parse cp packet and point ib index to next packet
  * @parser:	parser structure holding parsing context.


^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2013-10-09 12:37 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-12 15:06 [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Peter Zijlstra
2013-09-12 15:11 ` Maarten Lankhorst
2013-09-12 15:14   ` Peter Zijlstra
2013-09-12 15:36 ` Daniel Vetter
2013-09-12 15:43   ` Peter Zijlstra
2013-09-12 15:58     ` Daniel Vetter
2013-09-12 16:22       ` Peter Zijlstra
2013-09-12 16:35         ` Chris Wilson
2013-09-12 20:30           ` Peter Zijlstra
2013-09-12 20:37             ` Thomas Gleixner
2013-09-12 19:52         ` Daniel Vetter
2013-09-12 19:58           ` Thomas Gleixner
2013-09-12 20:04             ` Daniel Vetter
2013-09-12 20:20               ` Thomas Gleixner
2013-09-12 20:23                 ` Daniel Vetter
2013-09-12 20:39                   ` Thomas Gleixner
2013-09-12 20:48                     ` Thomas Hellstrom
2013-09-12 16:33       ` Thomas Hellstrom
2013-09-12 15:45   ` Maarten Lankhorst
2013-09-12 16:44     ` Thomas Hellstrom
2013-09-12 19:48       ` Daniel Vetter
2013-09-12 21:50       ` Maarten Lankhorst
2013-09-13  5:33         ` Thomas Hellstrom
2013-09-13  8:26           ` Daniel Vetter
2013-10-08 14:14           ` [RFC PATCH] drm/radeon: fixup locking inversion between mmap_sem and reservations Maarten Lankhorst
2013-10-08 14:33             ` Jerome Glisse
2013-10-08 14:45               ` Christian König
2013-10-08 14:55                 ` Jerome Glisse
2013-10-08 16:29                   ` Thomas Hellstrom
2013-10-08 16:47                     ` Jerome Glisse
2013-10-08 16:58                       ` Thomas Hellstrom
2013-10-09 12:36                         ` [RFC PATCH v2] drm/radeon: fixup locking inversion between, " Maarten Lankhorst
2013-10-09 10:58                       ` [RFC PATCH] drm/radeon: fixup locking inversion between " Maarten Lankhorst
2013-10-08 14:45               ` Maarten Lankhorst
2013-09-13  6:44         ` [BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE Thomas Hellstrom
2013-09-13  7:16           ` Maarten Lankhorst
2013-09-13  7:46             ` Thomas Hellstrom
2013-09-13  7:51               ` Maarten Lankhorst
2013-09-13  8:23                 ` Thomas Hellstrom
2013-09-13  8:32                   ` Daniel Vetter
2013-09-13  8:39                     ` Thomas Hellstrom
2013-09-13  8:58                   ` Maarten Lankhorst
2013-09-13  9:21                     ` Thomas Hellstrom
2013-09-13  8:29               ` Peter Zijlstra
2013-09-13  8:41                 ` Daniel Vetter
2013-09-13  9:00                   ` Peter Zijlstra
2013-09-23 15:33                     ` [RFC PATCH] drm/nouveau: fix nested locking in mmap handler Maarten Lankhorst
2013-09-24  7:22                       ` Thomas Hellstrom
2013-09-24  7:34                         ` Maarten Lankhorst
2013-09-24  8:20                           ` Ingo Molnar
2013-09-24  9:03                           ` Thomas Hellstrom
2013-09-24  9:36                             ` Daniel Vetter
2013-09-24 10:11                               ` Maarten Lankhorst
2013-09-24 10:33                                 ` Thomas Hellstrom
2013-09-24 11:32                                   ` Maarten Lankhorst
2013-09-24 17:04                                     ` Thomas Hellstrom
2013-09-24  9:43                             ` Maarten Lankhorst
2013-09-24 17:53                               ` Thomas Hellstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).