kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible use_mm() mis-uses
@ 2018-08-22 16:44 Linus Torvalds
       [not found] ` <CA+55aFz+Unj0zVNd79vpd41mtee3DV6tp_Ozr7WaZscqCQc9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Linus Torvalds @ 2018-08-22 16:44 UTC (permalink / raw)
  To: Oded Gabbay, Alex Deucher, Christian König,
	David (ChunMing) Zhou, David Airlie, Michael S. Tsirkin,
	Jason Wang, Paolo Bonzini, Radim Krčmář,
	Zhenyu Wang, Zhi Wang, Jani Nikula, Joonas Lahtinen,
	Felipe Balbi, Greg Kroah-Hartman
  Cc: KVM list, intel-gfx, USB list, amd-gfx, DRI, intel-gvt-dev

Guys and gals,
 this is a *very* random list of people on the recipients list, but we
had a subtle TLB shootdown issue in the VM, and that brought up some
issues when people then went through the code more carefully.

I think we have a handle on the TLB shootdown bug itself. But when
people were discussing all the possible situations, one thing that
came up was "use_mm()" that takes a mm, and makes it temporarily the
mm for a kernel thread (until "unuse_mm()", duh).

And it turns out that some of those uses are definitely wrong, some of
them are right, and some of them are suspect or at least so overly
complicated that it's hard for the VM people to know if they are ok.

Basically, the rule for "use_mm()" is that the mm in question *has* to
have a valid page table associated with it over the whole use_mm() ->
unuse_mm() sequence. That may sound obvious, and I guess it actually
is so obvious that there isn't even a comment about it, but the actual
users are showing that it's sadly apparently not so obvious after all.

There is one user that uses the "obviously correct" model: the vhost
driver does a "mmget()" and "mmput()" pair around its use of it,
thanks to vhost_dev_set_owner() doing a

        dev->mm = get_task_mm(current);

to look up the mm, and then the teardown case does a

        if (dev->mm)
                mmput(dev->mm);
        dev->mm = NULL;

This is the *right* sequence. A gold star to the vhost people.

Sadly, the vhost people are the only ones who seem to get things
unquestionably right. And some of those gold star people are also
apparently involved in the cases that didn't get things right.

An example of something that *isn't* right, is the i915 kvm interface,
which does

        use_mm(kvm->mm);

on an mm that was initialized in virt/kvm/kvm_main.c using

        mmgrab(current->mm);
        kvm->mm = current->mm;

which is *not* right. Using "mmgrab()" does indeed guarantee the
lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
the lifetime of the page tables. You need to use "mmget()" and
"mmput()", which get the reference to the actual process address
space!

Now, it is *possible* that the kvm use is correct too, because kvm
does register a mmu_notifier chain, and in theory you can avoid the
proper refcounting by just making sure the mmu "release" notifier
kills any existing uses, but I don't really see kvm doing that. Kvm
does register a release notifier, but that just flushes the shadow
page tables, it doesn't kill any use_mm() use by some i915 use case.

So while the vhost use looks right, the kvm/i915 use looks definitely wrong.

The other users of "use_mm()" and "unuse_mm()" are less
black-and-white right vs wrong..

One of the complex ones is the amdgpu driver. It does a
"use_mm(mmptr)" deep deep in the guts of a macro that ends up being
used in fa few places, and it's very hard to tell if it's right.

It looks almost certainly buggy (there is no mmget/mmput to get the
refcount), but there _is_ a "release" mmu_notifier function and that
one - unlike the kvm case - looks like it might actually be trying to
flush the existing pending users of that mm.

But on the whole, I'm suspicious of the amdgpu use. It smells. Jann
Horn pointed out that even if it migth be ok due to the mmu_notifier,
the comments are garbage:

>  Where "process" in the uniquely-named "struct queue" is a "struct
>  kfd_process"; that struct's definition has this comment in it:
>
>        /*
>         * Opaque pointer to mm_struct. We don't hold a reference to
>         * it so it should never be dereferenced from here. This is
>         * only used for looking up processes by their mm.
>         */
>        void *mm;
>
>  So I think either that comment is wrong, or their code is wrong?

so I'm chalking the amdgpu use up in the "broken" column.

It's also actually quite hard to synchronze with some other kernel
worker thread correctly, so just on general principles, if you use
"use_mm()" it really really should be on something that you've
properly gotten a mm refcount on with mmget(). Really. Even if you try
to synchronize things.

The two final cases are two uses in the USB gadget driver. Again, they
don't have the proper mmget/mmput, but they may br ok simply because
the uses are done for AIO, and the VM teardown is preceded by an AIO
teardown, so the proper serialization may come in from that.

Anyway, sorry for the long email, and the big list of participants and
odd mailing lists, but I'd really like people to look at their
"use_mm()" cases, and ask themselves if they have done enough to
guarantee that the full mm exists. Again, "mmgrab()" is *not* enough
on its own. You need either "mmget()" or some lifetime guarantee.

And if you do have those lifetime guarantees, it would be really nice
to get a good explanatory comment about said lifetime guarantees above
the "use_mm()" call. Ok?

Note that the lifetime rules are very important, because obviously
use_mm() itself is never called in the context of the process that has
the mm. By definition, we're in a kernel thread and it uses somebody
elses mm. So it's important to show that that "somebody else" really
is serialized with the kernel thread somehow, and keeps the mm alive.
The easiest way by far to have that guarantee is to have that
"mmget/mmput" model.

Please? Even if you're convinced your code is right, just a comment
about _why_ it's right, ok?

And no, use_mm() cannot just do the mmget internally, only to have
unuse_mm() do the mmput().  That was our first reaction when looking
at this, but if the caller has screwed up the lifetime rules, it's not
actually clear that the mm is guaranteed to be around even when use_mm
is entered. So the onus of proper lifetime rules really is on the
caller, not on use_mm()/unuse_mm().

             Linus
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
       [not found] ` <CA+55aFz+Unj0zVNd79vpd41mtee3DV6tp_Ozr7WaZscqCQc9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-08-22 18:13   ` Christian König
  2018-08-22 19:44     ` Felix Kuehling
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2018-08-22 18:13 UTC (permalink / raw)
  To: Linus Torvalds, Oded Gabbay, Alex Deucher, David (ChunMing) Zhou,
	David Airlie, Michael S. Tsirkin, Jason Wang, Paolo Bonzini,
	Radim Krčmář,
	Zhenyu Wang, Zhi Wang, Jani Nikula, Joonas Lahtinen,
	Felipe Balbi, Greg Kroah-Hartman, Felix Kuehling
  Cc: KVM list, intel-gfx, USB list,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, DRI,
	intel-gvt-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Adding Felix because the KFD part of amdgpu is actually his responsibility.

If I'm not completely mistaken the release callback of the mmu_notifier 
should take care of that for amdgpu.

Regards,
Christian.

Am 22.08.2018 um 18:44 schrieb Linus Torvalds:
> Guys and gals,
>   this is a *very* random list of people on the recipients list, but we
> had a subtle TLB shootdown issue in the VM, and that brought up some
> issues when people then went through the code more carefully.
>
> I think we have a handle on the TLB shootdown bug itself. But when
> people were discussing all the possible situations, one thing that
> came up was "use_mm()" that takes a mm, and makes it temporarily the
> mm for a kernel thread (until "unuse_mm()", duh).
>
> And it turns out that some of those uses are definitely wrong, some of
> them are right, and some of them are suspect or at least so overly
> complicated that it's hard for the VM people to know if they are ok.
>
> Basically, the rule for "use_mm()" is that the mm in question *has* to
> have a valid page table associated with it over the whole use_mm() ->
> unuse_mm() sequence. That may sound obvious, and I guess it actually
> is so obvious that there isn't even a comment about it, but the actual
> users are showing that it's sadly apparently not so obvious after all.
>
> There is one user that uses the "obviously correct" model: the vhost
> driver does a "mmget()" and "mmput()" pair around its use of it,
> thanks to vhost_dev_set_owner() doing a
>
>          dev->mm = get_task_mm(current);
>
> to look up the mm, and then the teardown case does a
>
>          if (dev->mm)
>                  mmput(dev->mm);
>          dev->mm = NULL;
>
> This is the *right* sequence. A gold star to the vhost people.
>
> Sadly, the vhost people are the only ones who seem to get things
> unquestionably right. And some of those gold star people are also
> apparently involved in the cases that didn't get things right.
>
> An example of something that *isn't* right, is the i915 kvm interface,
> which does
>
>          use_mm(kvm->mm);
>
> on an mm that was initialized in virt/kvm/kvm_main.c using
>
>          mmgrab(current->mm);
>          kvm->mm = current->mm;
>
> which is *not* right. Using "mmgrab()" does indeed guarantee the
> lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
> the lifetime of the page tables. You need to use "mmget()" and
> "mmput()", which get the reference to the actual process address
> space!
>
> Now, it is *possible* that the kvm use is correct too, because kvm
> does register a mmu_notifier chain, and in theory you can avoid the
> proper refcounting by just making sure the mmu "release" notifier
> kills any existing uses, but I don't really see kvm doing that. Kvm
> does register a release notifier, but that just flushes the shadow
> page tables, it doesn't kill any use_mm() use by some i915 use case.
>
> So while the vhost use looks right, the kvm/i915 use looks definitely wrong.
>
> The other users of "use_mm()" and "unuse_mm()" are less
> black-and-white right vs wrong..
>
> One of the complex ones is the amdgpu driver. It does a
> "use_mm(mmptr)" deep deep in the guts of a macro that ends up being
> used in fa few places, and it's very hard to tell if it's right.
>
> It looks almost certainly buggy (there is no mmget/mmput to get the
> refcount), but there _is_ a "release" mmu_notifier function and that
> one - unlike the kvm case - looks like it might actually be trying to
> flush the existing pending users of that mm.
>
> But on the whole, I'm suspicious of the amdgpu use. It smells. Jann
> Horn pointed out that even if it migth be ok due to the mmu_notifier,
> the comments are garbage:
>
>>   Where "process" in the uniquely-named "struct queue" is a "struct
>>   kfd_process"; that struct's definition has this comment in it:
>>
>>         /*
>>          * Opaque pointer to mm_struct. We don't hold a reference to
>>          * it so it should never be dereferenced from here. This is
>>          * only used for looking up processes by their mm.
>>          */
>>         void *mm;
>>
>>   So I think either that comment is wrong, or their code is wrong?
> so I'm chalking the amdgpu use up in the "broken" column.
>
> It's also actually quite hard to synchronze with some other kernel
> worker thread correctly, so just on general principles, if you use
> "use_mm()" it really really should be on something that you've
> properly gotten a mm refcount on with mmget(). Really. Even if you try
> to synchronize things.
>
> The two final cases are two uses in the USB gadget driver. Again, they
> don't have the proper mmget/mmput, but they may br ok simply because
> the uses are done for AIO, and the VM teardown is preceded by an AIO
> teardown, so the proper serialization may come in from that.
>
> Anyway, sorry for the long email, and the big list of participants and
> odd mailing lists, but I'd really like people to look at their
> "use_mm()" cases, and ask themselves if they have done enough to
> guarantee that the full mm exists. Again, "mmgrab()" is *not* enough
> on its own. You need either "mmget()" or some lifetime guarantee.
>
> And if you do have those lifetime guarantees, it would be really nice
> to get a good explanatory comment about said lifetime guarantees above
> the "use_mm()" call. Ok?
>
> Note that the lifetime rules are very important, because obviously
> use_mm() itself is never called in the context of the process that has
> the mm. By definition, we're in a kernel thread and it uses somebody
> elses mm. So it's important to show that that "somebody else" really
> is serialized with the kernel thread somehow, and keeps the mm alive.
> The easiest way by far to have that guarantee is to have that
> "mmget/mmput" model.
>
> Please? Even if you're convinced your code is right, just a comment
> about _why_ it's right, ok?
>
> And no, use_mm() cannot just do the mmget internally, only to have
> unuse_mm() do the mmput().  That was our first reaction when looking
> at this, but if the caller has screwed up the lifetime rules, it's not
> actually clear that the mm is guaranteed to be around even when use_mm
> is entered. So the onus of proper lifetime rules really is on the
> caller, not on use_mm()/unuse_mm().
>
>               Linus

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 16:44 Possible use_mm() mis-uses Linus Torvalds
       [not found] ` <CA+55aFz+Unj0zVNd79vpd41mtee3DV6tp_Ozr7WaZscqCQc9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-08-22 18:20 ` Paolo Bonzini
  2018-08-22 18:33   ` Linus Torvalds
       [not found]   ` <e50816ec-cf5e-4848-93d0-dacc28f816fc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2018-08-22 19:05 ` Zhi Wang
  2018-08-22 19:37 ` Oded Gabbay
  3 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2018-08-22 18:20 UTC (permalink / raw)
  To: Linus Torvalds, Oded Gabbay, Alex Deucher, Christian König,
	David (ChunMing) Zhou, David Airlie, Michael S. Tsirkin,
	Jason Wang, Radim Krčmář,
	Zhenyu Wang, Zhi Wang, Jani Nikula, Joonas Lahtinen,
	Felipe Balbi, Greg Kroah-Hartman
  Cc: KVM list, intel-gfx, USB list, amd-gfx, DRI, intel-gvt-dev

On 22/08/2018 18:44, Linus Torvalds wrote:
> An example of something that *isn't* right, is the i915 kvm interface,
> which does
> 
>         use_mm(kvm->mm);
> 
> on an mm that was initialized in virt/kvm/kvm_main.c using
> 
>         mmgrab(current->mm);
>         kvm->mm = current->mm;
> 
> which is *not* right. Using "mmgrab()" does indeed guarantee the
> lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
> the lifetime of the page tables. You need to use "mmget()" and
> "mmput()", which get the reference to the actual process address
> space!
> 
> Now, it is *possible* that the kvm use is correct too, because kvm
> does register a mmu_notifier chain, and in theory you can avoid the
> proper refcounting by just making sure the mmu "release" notifier
> kills any existing uses, but I don't really see kvm doing that. Kvm
> does register a release notifier, but that just flushes the shadow
> page tables, it doesn't kill any use_mm() use by some i915 use case.

Yes, KVM is correct but the i915 bits are at least fishy.  It's probably
as simple as adding a mmget/mmput pair respectively in kvmgt_guest_init
and kvmgt_guest_exit, or maybe mmget_not_zero.

Paolo
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 18:20 ` Paolo Bonzini
@ 2018-08-22 18:33   ` Linus Torvalds
  2018-08-22 18:57     ` Linus Torvalds
       [not found]   ` <e50816ec-cf5e-4848-93d0-dacc28f816fc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-08-22 18:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Oded Gabbay, David (ChunMing) Zhou, DRI, KVM list,
	Michael S. Tsirkin, Dave Airlie, Greg Kroah-Hartman, Jason Wang,
	Radim Krčmář,
	USB list, intel-gfx, Alex Deucher, amd-gfx, intel-gvt-dev,
	Christian König, Felipe Balbi

On Wed, Aug 22, 2018 at 11:21 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Yes, KVM is correct but the i915 bits are at least fishy.  It's probably
> as simple as adding a mmget/mmput pair respectively in kvmgt_guest_init
> and kvmgt_guest_exit, or maybe mmget_not_zero.

Definitely mmget_not_zero(). If it was just mmgrab()'ed earlier, the
actual page tables might already be gone.

              Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 18:33   ` Linus Torvalds
@ 2018-08-22 18:57     ` Linus Torvalds
  0 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2018-08-22 18:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: DRI, KVM list, Michael S. Tsirkin, Dave Airlie,
	Greg Kroah-Hartman, Jason Wang, Radim Krčmář,
	USB list, intel-gfx, Alex Deucher, amd-gfx, intel-gvt-dev,
	Christian König, Zhi Wang, Felipe Balbi

On Wed, Aug 22, 2018 at 11:33 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 22, 2018 at 11:21 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > Yes, KVM is correct but the i915 bits are at least fishy.  It's probably
> > as simple as adding a mmget/mmput pair respectively in kvmgt_guest_init
> > and kvmgt_guest_exit, or maybe mmget_not_zero.
>
> Definitely mmget_not_zero(). If it was just mmgrab()'ed earlier, the
> actual page tables might already be gone.

Side note: we _could_ do the mmget_not_zero() inside use_mm() itself,
if we just knew that the mm was at least mmgrab()'ed correctly.

But for some of the uses, even that isn't clear. It's not entirely
obvious that the "struct mm_struct" exists _at_all_ at that point, and
that a mmget_not_zero() wouldn't just have some use-after-free access.

Again, independent lifetime rules could show that this isn't the case
(ie "exit_aio() is always called before exit_mmap(), and kill_ioctx()
takes care of it all"), but it would be good to have the users of
"use_mm()" actually verify their lifetime rules are correct and
enforced.

Because quite often, the lifetime rule might nbot be a mmu notifier or
aio_exit at all, but just be "oh, the user won't exit until this is
all done". But do you *control* the user? What if the user is buggy?

             Linus
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 16:44 Possible use_mm() mis-uses Linus Torvalds
       [not found] ` <CA+55aFz+Unj0zVNd79vpd41mtee3DV6tp_Ozr7WaZscqCQc9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-08-22 18:20 ` Paolo Bonzini
@ 2018-08-22 19:05 ` Zhi Wang
  2018-08-22 19:37 ` Oded Gabbay
  3 siblings, 0 replies; 14+ messages in thread
From: Zhi Wang @ 2018-08-22 19:05 UTC (permalink / raw)
  To: Linus Torvalds, Oded Gabbay, Alex Deucher, Christian König,
	David (ChunMing) Zhou, David Airlie, Michael S. Tsirkin,
	Jason Wang, Paolo Bonzini, Radim Krčmář,
	Zhenyu Wang, Jani Nikula, Joonas Lahtinen, Felipe Balbi,
	Greg Kroah-Hartman
  Cc: KVM list, intel-gfx, USB list, amd-gfx, DRI, intel-gvt-dev

Hi Linus:

Thanks for letting us know that. We would fix this ASAP. The kvmgt.c 
module is a part of GVT-g code. It's our fault that we didn't find this 
mis-uses, not i915 or KVM guys. Wish they would feel better after seeing 
this message.

Thanks,
Zhi.

On 08/23/18 00:44, Linus Torvalds wrote:
> Guys and gals,
>   this is a *very* random list of people on the recipients list, but we
> had a subtle TLB shootdown issue in the VM, and that brought up some
> issues when people then went through the code more carefully.
> 
> I think we have a handle on the TLB shootdown bug itself. But when
> people were discussing all the possible situations, one thing that
> came up was "use_mm()" that takes a mm, and makes it temporarily the
> mm for a kernel thread (until "unuse_mm()", duh).
> 
> And it turns out that some of those uses are definitely wrong, some of
> them are right, and some of them are suspect or at least so overly
> complicated that it's hard for the VM people to know if they are ok.
> 
> Basically, the rule for "use_mm()" is that the mm in question *has* to
> have a valid page table associated with it over the whole use_mm() ->
> unuse_mm() sequence. That may sound obvious, and I guess it actually
> is so obvious that there isn't even a comment about it, but the actual
> users are showing that it's sadly apparently not so obvious after all.
> 
> There is one user that uses the "obviously correct" model: the vhost
> driver does a "mmget()" and "mmput()" pair around its use of it,
> thanks to vhost_dev_set_owner() doing a
> 
>          dev->mm = get_task_mm(current);
> 
> to look up the mm, and then the teardown case does a
> 
>          if (dev->mm)
>                  mmput(dev->mm);
>          dev->mm = NULL;
> 
> This is the *right* sequence. A gold star to the vhost people.
> 
> Sadly, the vhost people are the only ones who seem to get things
> unquestionably right. And some of those gold star people are also
> apparently involved in the cases that didn't get things right.
> 
> An example of something that *isn't* right, is the i915 kvm interface,
> which does
> 
>          use_mm(kvm->mm);
> 
> on an mm that was initialized in virt/kvm/kvm_main.c using
> 
>          mmgrab(current->mm);
>          kvm->mm = current->mm;
> 
> which is *not* right. Using "mmgrab()" does indeed guarantee the
> lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
> the lifetime of the page tables. You need to use "mmget()" and
> "mmput()", which get the reference to the actual process address
> space!
> 
> Now, it is *possible* that the kvm use is correct too, because kvm
> does register a mmu_notifier chain, and in theory you can avoid the
> proper refcounting by just making sure the mmu "release" notifier
> kills any existing uses, but I don't really see kvm doing that. Kvm
> does register a release notifier, but that just flushes the shadow
> page tables, it doesn't kill any use_mm() use by some i915 use case.
> 
> So while the vhost use looks right, the kvm/i915 use looks definitely wrong.
> 
> The other users of "use_mm()" and "unuse_mm()" are less
> black-and-white right vs wrong..
> 
> One of the complex ones is the amdgpu driver. It does a
> "use_mm(mmptr)" deep deep in the guts of a macro that ends up being
> used in fa few places, and it's very hard to tell if it's right.
> 
> It looks almost certainly buggy (there is no mmget/mmput to get the
> refcount), but there _is_ a "release" mmu_notifier function and that
> one - unlike the kvm case - looks like it might actually be trying to
> flush the existing pending users of that mm.
> 
> But on the whole, I'm suspicious of the amdgpu use. It smells. Jann
> Horn pointed out that even if it migth be ok due to the mmu_notifier,
> the comments are garbage:
> 
>>   Where "process" in the uniquely-named "struct queue" is a "struct
>>   kfd_process"; that struct's definition has this comment in it:
>>
>>         /*
>>          * Opaque pointer to mm_struct. We don't hold a reference to
>>          * it so it should never be dereferenced from here. This is
>>          * only used for looking up processes by their mm.
>>          */
>>         void *mm;
>>
>>   So I think either that comment is wrong, or their code is wrong?
> 
> so I'm chalking the amdgpu use up in the "broken" column.
> 
> It's also actually quite hard to synchronze with some other kernel
> worker thread correctly, so just on general principles, if you use
> "use_mm()" it really really should be on something that you've
> properly gotten a mm refcount on with mmget(). Really. Even if you try
> to synchronize things.
> 
> The two final cases are two uses in the USB gadget driver. Again, they
> don't have the proper mmget/mmput, but they may br ok simply because
> the uses are done for AIO, and the VM teardown is preceded by an AIO
> teardown, so the proper serialization may come in from that.
> 
> Anyway, sorry for the long email, and the big list of participants and
> odd mailing lists, but I'd really like people to look at their
> "use_mm()" cases, and ask themselves if they have done enough to
> guarantee that the full mm exists. Again, "mmgrab()" is *not* enough
> on its own. You need either "mmget()" or some lifetime guarantee.
> 
> And if you do have those lifetime guarantees, it would be really nice
> to get a good explanatory comment about said lifetime guarantees above
> the "use_mm()" call. Ok?
> 
> Note that the lifetime rules are very important, because obviously
> use_mm() itself is never called in the context of the process that has
> the mm. By definition, we're in a kernel thread and it uses somebody
> elses mm. So it's important to show that that "somebody else" really
> is serialized with the kernel thread somehow, and keeps the mm alive.
> The easiest way by far to have that guarantee is to have that
> "mmget/mmput" model.
> 
> Please? Even if you're convinced your code is right, just a comment
> about _why_ it's right, ok?
> 
> And no, use_mm() cannot just do the mmget internally, only to have
> unuse_mm() do the mmput().  That was our first reaction when looking
> at this, but if the caller has screwed up the lifetime rules, it's not
> actually clear that the mm is guaranteed to be around even when use_mm
> is entered. So the onus of proper lifetime rules really is on the
> caller, not on use_mm()/unuse_mm().
> 
>               Linus
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 16:44 Possible use_mm() mis-uses Linus Torvalds
                   ` (2 preceding siblings ...)
  2018-08-22 19:05 ` Zhi Wang
@ 2018-08-22 19:37 ` Oded Gabbay
  2018-08-22 19:58   ` Linus Torvalds
  3 siblings, 1 reply; 14+ messages in thread
From: Oded Gabbay @ 2018-08-22 19:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maling list - DRI developers, kvm, mst, David Airlie,
	Greg Kroah-Hartman, jasowang, rkrcmar, linux-usb,
	Intel Graphics Development, Alex Deucher, pbonzini, Kuehling,
	Felix, amd-gfx list, intel-gvt-dev, Christian König,
	zhi.a.wang, balbi

On Wed, Aug 22, 2018 at 7:44 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> One of the complex ones is the amdgpu driver. It does a
> "use_mm(mmptr)" deep deep in the guts of a macro that ends up being
> used in fa few places, and it's very hard to tell if it's right.
>
> It looks almost certainly buggy (there is no mmget/mmput to get the
> refcount), but there _is_ a "release" mmu_notifier function and that
> one - unlike the kvm case - looks like it might actually be trying to
> flush the existing pending users of that mm.
>
> But on the whole, I'm suspicious of the amdgpu use. It smells. Jann
> Horn pointed out that even if it migth be ok due to the mmu_notifier,
> the comments are garbage:
>
> >  Where "process" in the uniquely-named "struct queue" is a "struct
> >  kfd_process"; that struct's definition has this comment in it:
> >
> >        /*
> >         * Opaque pointer to mm_struct. We don't hold a reference to
> >         * it so it should never be dereferenced from here. This is
> >         * only used for looking up processes by their mm.
> >         */
> >        void *mm;
> >
> >  So I think either that comment is wrong, or their code is wrong?
>
> so I'm chalking the amdgpu use up in the "broken" column.
>
Hello Linus,

I looked at the amdkfd code and indeed the comment does not match the
actual code because the mm pointer is clearly dereferenced directly in
the macro you mentioned (read_user_wptr). That macro is used in the
code path of loading a descriptor to the H/W (load_hqd). That function
is called in several cases, where in some of them we are in the
context of the calling process, but in others we are in a kernel
thread context (hence the use_mm). That's why we check these two
situations inside that macro and only do use_mm if we are in kernel
thread.

We need to fix that behavior and obviously make sure that in future
code we don't cast this pointer to mm_struct* and derereference it
directly.
Actually, the original code had "mm_struct *mm" instead of "void *mm"
in the structure, and I think the reason we changed it to void* is to
"make sure" that we won't dereference it directly, but clearly that
failed :(

Having said that, I think we *are* protected by the mmu_notifier
release because if the process suddenly dies, we will gracefully clean
the process's data in our driver and on the H/W before returning to
the mm core code. And before we return to the mm core code, we set the
mm pointer to NULL. And the graceful cleaning should be serialized
with the load_hqd uses.

Felix, do you have anything to add here that I might have missed ?

Thanks,
Oded
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 18:13   ` Christian König
@ 2018-08-22 19:44     ` Felix Kuehling
  2018-08-22 20:07       ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Kuehling @ 2018-08-22 19:44 UTC (permalink / raw)
  To: Christian König, Linus Torvalds, Oded Gabbay, Alex Deucher,
	David (ChunMing) Zhou, David Airlie, Michael S. Tsirkin,
	Jason Wang, Paolo Bonzini, Radim Krčmář,
	Zhenyu Wang, Zhi Wang, Jani Nikula, Joonas Lahtinen,
	Felipe Balbi, Greg Kroah-Hartman
  Cc: KVM list, intel-gfx, USB list, amd-gfx, DRI, intel-gvt-dev

[-- Attachment #1: Type: text/plain, Size: 7011 bytes --]


On 2018-08-22 02:13 PM, Christian König wrote:
> Adding Felix because the KFD part of amdgpu is actually his
> responsibility.
>
> If I'm not completely mistaken the release callback of the
> mmu_notifier should take care of that for amdgpu.

You're right, but that's a bit fragile and convoluted. I'll fix KFD to
handle this more robustly. See the attached (untested) patch. And
obviously that opaque pointer didn't work as intended. It just gets
promoted to an mm_struct * without a warning from the compiler. Maybe I
should change that to a long to make abuse easier to spot.

Regards,
  Felix

>
> Regards,
> Christian.
>
> Am 22.08.2018 um 18:44 schrieb Linus Torvalds:
>> Guys and gals,
>>   this is a *very* random list of people on the recipients list, but we
>> had a subtle TLB shootdown issue in the VM, and that brought up some
>> issues when people then went through the code more carefully.
>>
>> I think we have a handle on the TLB shootdown bug itself. But when
>> people were discussing all the possible situations, one thing that
>> came up was "use_mm()" that takes a mm, and makes it temporarily the
>> mm for a kernel thread (until "unuse_mm()", duh).
>>
>> And it turns out that some of those uses are definitely wrong, some of
>> them are right, and some of them are suspect or at least so overly
>> complicated that it's hard for the VM people to know if they are ok.
>>
>> Basically, the rule for "use_mm()" is that the mm in question *has* to
>> have a valid page table associated with it over the whole use_mm() ->
>> unuse_mm() sequence. That may sound obvious, and I guess it actually
>> is so obvious that there isn't even a comment about it, but the actual
>> users are showing that it's sadly apparently not so obvious after all.
>>
>> There is one user that uses the "obviously correct" model: the vhost
>> driver does a "mmget()" and "mmput()" pair around its use of it,
>> thanks to vhost_dev_set_owner() doing a
>>
>>          dev->mm = get_task_mm(current);
>>
>> to look up the mm, and then the teardown case does a
>>
>>          if (dev->mm)
>>                  mmput(dev->mm);
>>          dev->mm = NULL;
>>
>> This is the *right* sequence. A gold star to the vhost people.
>>
>> Sadly, the vhost people are the only ones who seem to get things
>> unquestionably right. And some of those gold star people are also
>> apparently involved in the cases that didn't get things right.
>>
>> An example of something that *isn't* right, is the i915 kvm interface,
>> which does
>>
>>          use_mm(kvm->mm);
>>
>> on an mm that was initialized in virt/kvm/kvm_main.c using
>>
>>          mmgrab(current->mm);
>>          kvm->mm = current->mm;
>>
>> which is *not* right. Using "mmgrab()" does indeed guarantee the
>> lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
>> the lifetime of the page tables. You need to use "mmget()" and
>> "mmput()", which get the reference to the actual process address
>> space!
>>
>> Now, it is *possible* that the kvm use is correct too, because kvm
>> does register a mmu_notifier chain, and in theory you can avoid the
>> proper refcounting by just making sure the mmu "release" notifier
>> kills any existing uses, but I don't really see kvm doing that. Kvm
>> does register a release notifier, but that just flushes the shadow
>> page tables, it doesn't kill any use_mm() use by some i915 use case.
>>
>> So while the vhost use looks right, the kvm/i915 use looks definitely
>> wrong.
>>
>> The other users of "use_mm()" and "unuse_mm()" are less
>> black-and-white right vs wrong..
>>
>> One of the complex ones is the amdgpu driver. It does a
>> "use_mm(mmptr)" deep deep in the guts of a macro that ends up being
>> used in fa few places, and it's very hard to tell if it's right.
>>
>> It looks almost certainly buggy (there is no mmget/mmput to get the
>> refcount), but there _is_ a "release" mmu_notifier function and that
>> one - unlike the kvm case - looks like it might actually be trying to
>> flush the existing pending users of that mm.
>>
>> But on the whole, I'm suspicious of the amdgpu use. It smells. Jann
>> Horn pointed out that even if it migth be ok due to the mmu_notifier,
>> the comments are garbage:
>>
>>>   Where "process" in the uniquely-named "struct queue" is a "struct
>>>   kfd_process"; that struct's definition has this comment in it:
>>>
>>>         /*
>>>          * Opaque pointer to mm_struct. We don't hold a reference to
>>>          * it so it should never be dereferenced from here. This is
>>>          * only used for looking up processes by their mm.
>>>          */
>>>         void *mm;
>>>
>>>   So I think either that comment is wrong, or their code is wrong?
>> so I'm chalking the amdgpu use up in the "broken" column.
>>
>> It's also actually quite hard to synchronze with some other kernel
>> worker thread correctly, so just on general principles, if you use
>> "use_mm()" it really really should be on something that you've
>> properly gotten a mm refcount on with mmget(). Really. Even if you try
>> to synchronize things.
>>
>> The two final cases are two uses in the USB gadget driver. Again, they
>> don't have the proper mmget/mmput, but they may br ok simply because
>> the uses are done for AIO, and the VM teardown is preceded by an AIO
>> teardown, so the proper serialization may come in from that.
>>
>> Anyway, sorry for the long email, and the big list of participants and
>> odd mailing lists, but I'd really like people to look at their
>> "use_mm()" cases, and ask themselves if they have done enough to
>> guarantee that the full mm exists. Again, "mmgrab()" is *not* enough
>> on its own. You need either "mmget()" or some lifetime guarantee.
>>
>> And if you do have those lifetime guarantees, it would be really nice
>> to get a good explanatory comment about said lifetime guarantees above
>> the "use_mm()" call. Ok?
>>
>> Note that the lifetime rules are very important, because obviously
>> use_mm() itself is never called in the context of the process that has
>> the mm. By definition, we're in a kernel thread and it uses somebody
>> elses mm. So it's important to show that that "somebody else" really
>> is serialized with the kernel thread somehow, and keeps the mm alive.
>> The easiest way by far to have that guarantee is to have that
>> "mmget/mmput" model.
>>
>> Please? Even if you're convinced your code is right, just a comment
>> about _why_ it's right, ok?
>>
>> And no, use_mm() cannot just do the mmget internally, only to have
>> unuse_mm() do the mmput().  That was our first reaction when looking
>> at this, but if the caller has screwed up the lifetime rules, it's not
>> actually clear that the mm is guaranteed to be around even when use_mm
>> is entered. So the onus of proper lifetime rules really is on the
>> caller, not on use_mm()/unuse_mm().
>>
>>               Linus
>


[-- Attachment #2: 0001-drm-amdkfd-Fix-incorrect-use-of-process-mm.patch --]
[-- Type: text/x-patch, Size: 3742 bytes --]

>From 05cdd85427c8ce6a05435df3ac2307cf362c1aec Mon Sep 17 00:00:00 2001
From: Felix Kuehling <Felix.Kuehling@amd.com>
Date: Wed, 22 Aug 2018 15:28:44 -0400
Subject: [PATCH 1/1] drm/amdkfd: Fix incorrect use of process->mm

This mm_struct pointer should never be dereferenced. If running in
a user thread, just use current->mm. If running in a kernel worker
use get_task_mm to get a safe reference to the mm_struct.

Change-Id: Id42976075adce45007c10481c6a8f29df67e54a9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 37 +++++++++++++++++-----
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 405db4f..692ce2d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -383,8 +383,8 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 					struct queue *q,
 					struct qcm_process_device *qpd)
 {
-	int retval;
 	struct mqd_manager *mqd_mgr;
+	int retval;
 
 	mqd_mgr = dqm->ops.get_mqd_manager(dqm, KFD_MQD_TYPE_COMPUTE);
 	if (!mqd_mgr)
@@ -412,8 +412,12 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 	if (!q->properties.is_active)
 		return 0;
 
-	retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
-			&q->properties, q->process->mm);
+	if (WARN(q->process->mm != current->mm,
+		 "should only run in user thread"))
+		retval = -EFAULT;
+	else
+		retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
+					   &q->properties, current->mm);
 	if (retval)
 		goto out_uninit_mqd;
 
@@ -570,9 +574,15 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 		retval = map_queues_cpsch(dqm);
 	else if (q->properties.is_active &&
 		 (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
-		  q->properties.type == KFD_QUEUE_TYPE_SDMA))
-		retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe, q->queue,
-				       &q->properties, q->process->mm);
+		  q->properties.type == KFD_QUEUE_TYPE_SDMA)) {
+		if (WARN(q->process->mm != current->mm,
+			 "should only run in user thread"))
+			retval = -EFAULT;
+		else
+			retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd,
+						   q->pipe, q->queue,
+						   &q->properties, current->mm);
+	}
 
 out_unlock:
 	dqm_unlock(dqm);
@@ -678,6 +688,7 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
 static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
 					  struct qcm_process_device *qpd)
 {
+	struct mm_struct *mm = NULL;
 	struct queue *q;
 	struct mqd_manager *mqd_mgr;
 	struct kfd_process_device *pdd;
@@ -711,6 +722,15 @@ static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
 		kfd_flush_tlb(pdd);
 	}
 
+	/* Take a safe reference to the mm_struct, which may otherwise
+	 * disappear even while the kfd_process is still referenced.
+	 */
+	mm = get_task_mm(pdd->process->lead_thread);
+	if (!mm) {
+		retval = -EFAULT;
+		goto out;
+	}
+
 	/* activate all active queues on the qpd */
 	list_for_each_entry(q, &qpd->queues_list, list) {
 		if (!q->properties.is_evicted)
@@ -725,14 +745,15 @@ static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
 		q->properties.is_evicted = false;
 		q->properties.is_active = true;
 		retval = mqd_mgr->load_mqd(mqd_mgr, q->mqd, q->pipe,
-				       q->queue, &q->properties,
-				       q->process->mm);
+				       q->queue, &q->properties, mm);
 		if (retval)
 			goto out;
 		dqm->queue_count++;
 	}
 	qpd->evicted = 0;
 out:
+	if (mm)
+		mmput(mm);
 	dqm_unlock(dqm);
 	return retval;
 }
-- 
2.7.4


[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 19:37 ` Oded Gabbay
@ 2018-08-22 19:58   ` Linus Torvalds
  2018-08-22 20:01     ` Oded Gabbay
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-08-22 19:58 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David (ChunMing) Zhou, DRI, KVM list, Michael S. Tsirkin,
	Dave Airlie, Greg Kroah-Hartman, Jason Wang,
	Radim Krčmář,
	USB list, intel-gfx, Alex Deucher, Paolo Bonzini, Felix Kuehling,
	amd-gfx, intel-gvt-dev, Christian König, Felipe Balbi

On Wed, Aug 22, 2018 at 12:37 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> Having said that, I think we *are* protected by the mmu_notifier
> release because if the process suddenly dies, we will gracefully clean
> the process's data in our driver and on the H/W before returning to
> the mm core code. And before we return to the mm core code, we set the
> mm pointer to NULL. And the graceful cleaning should be serialized
> with the load_hqd uses.

So I'm a bit nervous about the mmu_notifier model (and the largely
equivalent exit_aio() model for the USB gardget AIO uses).

The reason I'm nervous about it is that the mmu_notifier() gets called
only after the mm_users count has already been decremented to zero
(and the exact same thing goes for exit_aio()).

Now that's fine if you actually get rid of all accesses in
mmu_notifier_release() or in exit_aio(), because the page tables still
exist at that point - they are in the process of being torn down, but
they haven't been torn down yet.

But for something like a kernel thread doing use_mm(), the thing that
worries me is a pattern something like this:

  kwork thread          exit thread
  --------              --------

                        mmput() ->
                          mm_users goes to zero

  use_mm(mmptr);
  ..

                          mmu_notifier_release();
                          exit_mm() ->
                            exit_aio()

and the pattern is basically the same regatdless of whether you use
mmu_notifier_release() or depend on some exit_aio() flushing your aio
work: the use_mm() can be called with a mm that has already had its
mm_users count decremented to zero, and that is now scheduled to be
free'd.

Does it "work"? Yes. Kind of. At least if the mmu notifier and/or
exit_aio() actually makes sure to wait for any kwork thread thing. But
it's a bit of a worrisome pattern.

           Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 19:58   ` Linus Torvalds
@ 2018-08-22 20:01     ` Oded Gabbay
  0 siblings, 0 replies; 14+ messages in thread
From: Oded Gabbay @ 2018-08-22 20:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David (ChunMing) Zhou, Maling list - DRI developers, kvm, mst,
	David Airlie, Greg Kroah-Hartman, jasowang, rkrcmar, linux-usb,
	Intel Graphics Development, Alex Deucher, pbonzini, Kuehling,
	Felix, amd-gfx list, intel-gvt-dev, Christian König, balbi

On Wed, Aug 22, 2018 at 10:58 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Aug 22, 2018 at 12:37 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > Having said that, I think we *are* protected by the mmu_notifier
> > release because if the process suddenly dies, we will gracefully clean
> > the process's data in our driver and on the H/W before returning to
> > the mm core code. And before we return to the mm core code, we set the
> > mm pointer to NULL. And the graceful cleaning should be serialized
> > with the load_hqd uses.
>
> So I'm a bit nervous about the mmu_notifier model (and the largely
> equivalent exit_aio() model for the USB gardget AIO uses).
>
> The reason I'm nervous about it is that the mmu_notifier() gets called
> only after the mm_users count has already been decremented to zero
> (and the exact same thing goes for exit_aio()).
>
> Now that's fine if you actually get rid of all accesses in
> mmu_notifier_release() or in exit_aio(), because the page tables still
> exist at that point - they are in the process of being torn down, but
> they haven't been torn down yet.
>
> But for something like a kernel thread doing use_mm(), the thing that
> worries me is a pattern something like this:
>
>   kwork thread          exit thread
>   --------              --------
>
>                         mmput() ->
>                           mm_users goes to zero
>
>   use_mm(mmptr);
>   ..
>
>                           mmu_notifier_release();
>                           exit_mm() ->
>                             exit_aio()
>
> and the pattern is basically the same regatdless of whether you use
> mmu_notifier_release() or depend on some exit_aio() flushing your aio
> work: the use_mm() can be called with a mm that has already had its
> mm_users count decremented to zero, and that is now scheduled to be
> free'd.
>
> Does it "work"? Yes. Kind of. At least if the mmu notifier and/or
> exit_aio() actually makes sure to wait for any kwork thread thing. But
> it's a bit of a worrisome pattern.
>
>            Linus

Yes, agreed, and that's why we will be on the safe side and eliminate
this pattern from our code and make sure we won't add this pattern in
the future.

Oded
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-22 19:44     ` Felix Kuehling
@ 2018-08-22 20:07       ` Linus Torvalds
  0 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2018-08-22 20:07 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Oded Gabbay, David (ChunMing) Zhou, DRI, KVM list,
	Michael S. Tsirkin, Dave Airlie, Greg Kroah-Hartman, Jason Wang,
	Radim Krčmář,
	USB list, intel-gfx, Alex Deucher, Paolo Bonzini, amd-gfx,
	intel-gvt-dev, Christian König, Felipe Balbi

On Wed, Aug 22, 2018 at 12:44 PM Felix Kuehling <felix.kuehling@amd.com> wrote:
>
> You're right, but that's a bit fragile and convoluted. I'll fix KFD to
> handle this more robustly. See the attached (untested) patch.

Yes, this patch that makes the whole "has to use current mm" or uses
"get_task_mm()" looks good from a VM< worry standpoint.

Thanks.

> And
> obviously that opaque pointer didn't work as intended. It just gets
> promoted to an mm_struct * without a warning from the compiler. Maybe I
> should change that to a long to make abuse easier to spot.

Using a "void *" is actually just about the worst possible type for
something that should be a cookie, because it silently translates to
any pointer.

"long" is actually not much better, becuase it will silently convert
to any integer type.

A good fairly type-safe cookie type is actually this:

    typedef volatile const struct no_such_struct *cookie_ptr_t;

and now something of type "cookie_ptr_t" is actually very  hard to
convert to other types by mistake.

Note that the "volatile const" is not just random noise - it's so that
it won't even convert without warnings to things that take a "const
void *" as an argument (like, say, the source of 'memcpy()').

So you almost _have_ to explicitly cast it to use it.

           Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
       [not found]   ` <e50816ec-cf5e-4848-93d0-dacc28f816fc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-08-23  6:07     ` Zhenyu Wang
  2018-08-23  8:38       ` Paolo Bonzini
  2018-08-23 18:03       ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Zhenyu Wang @ 2018-08-23  6:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Oded Gabbay, David (ChunMing) Zhou, Joonas Lahtinen, DRI,
	KVM list, Michael S. Tsirkin, David Airlie, Greg Kroah-Hartman,
	Jason Wang, Radim Krčmář,
	Zhenyu Wang, USB list, Jani Nikula, intel-gfx, Alex Deucher,
	intel-gvt-dev-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Linus Torvalds,
	Christian König, Zhi Wang, Felipe Balbi


[-- Attachment #1.1: Type: text/plain, Size: 3822 bytes --]

On 2018.08.22 20:20:46 +0200, Paolo Bonzini wrote:
> On 22/08/2018 18:44, Linus Torvalds wrote:
> > An example of something that *isn't* right, is the i915 kvm interface,
> > which does
> > 
> >         use_mm(kvm->mm);
> > 
> > on an mm that was initialized in virt/kvm/kvm_main.c using
> > 
> >         mmgrab(current->mm);
> >         kvm->mm = current->mm;
> > 
> > which is *not* right. Using "mmgrab()" does indeed guarantee the
> > lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
> > the lifetime of the page tables. You need to use "mmget()" and
> > "mmput()", which get the reference to the actual process address
> > space!
> > 
> > Now, it is *possible* that the kvm use is correct too, because kvm
> > does register a mmu_notifier chain, and in theory you can avoid the
> > proper refcounting by just making sure the mmu "release" notifier
> > kills any existing uses, but I don't really see kvm doing that. Kvm
> > does register a release notifier, but that just flushes the shadow
> > page tables, it doesn't kill any use_mm() use by some i915 use case.
> 
> Yes, KVM is correct but the i915 bits are at least fishy.  It's probably
> as simple as adding a mmget/mmput pair respectively in kvmgt_guest_init
> and kvmgt_guest_exit, or maybe mmget_not_zero.
> 

yeah, that's the clear way to fix this imo. We only depend on guest
life cycle to access guest memory properly. Here's proposed fix, will
verify and integrate it later.

Thanks!

From 5e5a8d0409aa150884adf5a4d0b956fd0b9906b3 Mon Sep 17 00:00:00 2001
From: Zhenyu Wang <zhenyuw-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Date: Thu, 23 Aug 2018 14:08:06 +0800
Subject: [PATCH] drm/i915/gvt: Fix life cycle reference on KVM mm

Handle guest mm access life cycle properly with mmget()/mmput()
through guest init()/exit(). As noted by Linus, use_mm() depends
on valid live page table but KVM's mmgrab() doesn't guarantee that.
As vGPU usage depends on guest VM life cycle, need to make sure to
use mmget()/mmput() to guarantee VM address access.

Cc: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Paolo Bonzini <pbonzini-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Zhi Wang <zhi.a.wang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Zhenyu Wang <zhenyuw-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 71751be329e3..4a0988747d08 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -32,6 +32,7 @@
 #include <linux/device.h>
 #include <linux/mm.h>
 #include <linux/mmu_context.h>
+#include <linux/sched/mm.h>
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/rbtree.h>
@@ -1614,9 +1615,16 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
 	if (__kvmgt_vgpu_exist(vgpu, kvm))
 		return -EEXIST;
 
+	if (!mmget_not_zero(kvm->mm)) {
+		gvt_vgpu_err("Can't get KVM mm reference\n");
+		return -EINVAL;
+	}
+
 	info = vzalloc(sizeof(struct kvmgt_guest_info));
-	if (!info)
+	if (!info) {
+		mmput(kvm->mm);
 		return -ENOMEM;
+	}
 
 	vgpu->handle = (unsigned long)info;
 	info->vgpu = vgpu;
@@ -1647,6 +1655,8 @@ static bool kvmgt_guest_exit(struct kvmgt_guest_info *info)
 	debugfs_remove(info->debugfs_cache_entries);
 
 	kvm_page_track_unregister_notifier(info->kvm, &info->track_node);
+	if (info->kvm->mm)
+		mmput(info->kvm->mm);
 	kvm_put_kvm(info->kvm);
 	kvmgt_protect_table_destroy(info);
 	gvt_cache_destroy(info->vgpu);
-- 
2.18.0


-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-23  6:07     ` Zhenyu Wang
@ 2018-08-23  8:38       ` Paolo Bonzini
  2018-08-23 18:03       ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2018-08-23  8:38 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: DRI, KVM list, Michael S. Tsirkin, David Airlie,
	Greg Kroah-Hartman, Jason Wang, Radim Krčmář,
	USB list, amd-gfx, intel-gfx, Alex Deucher, intel-gvt-dev,
	Linus Torvalds, Christian König, Zhi Wang, Felipe Balbi


[-- Attachment #1.1.1: Type: text/plain, Size: 3798 bytes --]

On 23/08/2018 08:07, Zhenyu Wang wrote:
> On 2018.08.22 20:20:46 +0200, Paolo Bonzini wrote:
>> On 22/08/2018 18:44, Linus Torvalds wrote:
>>> An example of something that *isn't* right, is the i915 kvm interface,
>>> which does
>>>
>>>         use_mm(kvm->mm);
>>>
>>> on an mm that was initialized in virt/kvm/kvm_main.c using
>>>
>>>         mmgrab(current->mm);
>>>         kvm->mm = current->mm;
>>>
>>> which is *not* right. Using "mmgrab()" does indeed guarantee the
>>> lifetime of the 'struct mm_struct' itself, but it does *not* guarantee
>>> the lifetime of the page tables. You need to use "mmget()" and
>>> "mmput()", which get the reference to the actual process address
>>> space!
>>>
>>> Now, it is *possible* that the kvm use is correct too, because kvm
>>> does register a mmu_notifier chain, and in theory you can avoid the
>>> proper refcounting by just making sure the mmu "release" notifier
>>> kills any existing uses, but I don't really see kvm doing that. Kvm
>>> does register a release notifier, but that just flushes the shadow
>>> page tables, it doesn't kill any use_mm() use by some i915 use case.
>>
>> Yes, KVM is correct but the i915 bits are at least fishy.  It's probably
>> as simple as adding a mmget/mmput pair respectively in kvmgt_guest_init
>> and kvmgt_guest_exit, or maybe mmget_not_zero.
>>
> 
> yeah, that's the clear way to fix this imo. We only depend on guest
> life cycle to access guest memory properly. Here's proposed fix, will
> verify and integrate it later.
> 
> Thanks!
> 
> From 5e5a8d0409aa150884adf5a4d0b956fd0b9906b3 Mon Sep 17 00:00:00 2001
> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Date: Thu, 23 Aug 2018 14:08:06 +0800
> Subject: [PATCH] drm/i915/gvt: Fix life cycle reference on KVM mm
> 
> Handle guest mm access life cycle properly with mmget()/mmput()
> through guest init()/exit(). As noted by Linus, use_mm() depends
> on valid live page table but KVM's mmgrab() doesn't guarantee that.
> As vGPU usage depends on guest VM life cycle, need to make sure to
> use mmget()/mmput() to guarantee VM address access.
> 
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 71751be329e3..4a0988747d08 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -32,6 +32,7 @@
>  #include <linux/device.h>
>  #include <linux/mm.h>
>  #include <linux/mmu_context.h>
> +#include <linux/sched/mm.h>
>  #include <linux/types.h>
>  #include <linux/list.h>
>  #include <linux/rbtree.h>
> @@ -1614,9 +1615,16 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
>  	if (__kvmgt_vgpu_exist(vgpu, kvm))
>  		return -EEXIST;
>  
> +	if (!mmget_not_zero(kvm->mm)) {
> +		gvt_vgpu_err("Can't get KVM mm reference\n");
> +		return -EINVAL;
> +	}
> +
>  	info = vzalloc(sizeof(struct kvmgt_guest_info));
> -	if (!info)
> +	if (!info) {
> +		mmput(kvm->mm);
>  		return -ENOMEM;
> +	}
>  
>  	vgpu->handle = (unsigned long)info;
>  	info->vgpu = vgpu;
> @@ -1647,6 +1655,8 @@ static bool kvmgt_guest_exit(struct kvmgt_guest_info *info)
>  	debugfs_remove(info->debugfs_cache_entries);
>  
>  	kvm_page_track_unregister_notifier(info->kvm, &info->track_node);
> +	if (info->kvm->mm)
> +		mmput(info->kvm->mm);
>  	kvm_put_kvm(info->kvm);
>  	kvmgt_protect_table_destroy(info);
>  	gvt_cache_destroy(info->vgpu);
> 



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Possible use_mm() mis-uses
  2018-08-23  6:07     ` Zhenyu Wang
  2018-08-23  8:38       ` Paolo Bonzini
@ 2018-08-23 18:03       ` Linus Torvalds
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2018-08-23 18:03 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: Oded Gabbay, David (ChunMing) Zhou, DRI, KVM list,
	Michael S. Tsirkin, Dave Airlie, Greg Kroah-Hartman, Jason Wang,
	Radim Krčmář,
	USB list, amd-gfx, Alex Deucher, Paolo Bonzini, intel-gvt-dev,
	intel-gfx, Christian König, Felipe Balbi

On Wed, Aug 22, 2018 at 11:16 PM Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
>
> yeah, that's the clear way to fix this imo. We only depend on guest
> life cycle to access guest memory properly. Here's proposed fix, will
> verify and integrate it later.

Thanks, this looks sane to me,

                Linus
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-08-23 18:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-22 16:44 Possible use_mm() mis-uses Linus Torvalds
     [not found] ` <CA+55aFz+Unj0zVNd79vpd41mtee3DV6tp_Ozr7WaZscqCQc9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-08-22 18:13   ` Christian König
2018-08-22 19:44     ` Felix Kuehling
2018-08-22 20:07       ` Linus Torvalds
2018-08-22 18:20 ` Paolo Bonzini
2018-08-22 18:33   ` Linus Torvalds
2018-08-22 18:57     ` Linus Torvalds
     [not found]   ` <e50816ec-cf5e-4848-93d0-dacc28f816fc-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-08-23  6:07     ` Zhenyu Wang
2018-08-23  8:38       ` Paolo Bonzini
2018-08-23 18:03       ` Linus Torvalds
2018-08-22 19:05 ` Zhi Wang
2018-08-22 19:37 ` Oded Gabbay
2018-08-22 19:58   ` Linus Torvalds
2018-08-22 20:01     ` Oded Gabbay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).