linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* buggy-looking mm_struct refcounting in HFI1 infiniband driver
@ 2020-08-31 23:45 Jann Horn
  2020-09-01  0:21 ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Jann Horn @ 2020-08-31 23:45 UTC (permalink / raw)
  To: Mike Marciniszyn, Dennis Dalessandro, Ira Weiny
  Cc: Linux-MM, Doug Ledford, linux-rdma, Jason Gunthorpe, Dean Luick

Hi!

mm_struct has two refcounts: ->mm_users for references that pin
everything (including page tables, VMAs, ...) and ->mm_count (just
protects the mm_struct itself and the pgd, but not the page tables or
VMAs).

struct hfi1_filedata has a member ->mm that holds a ->mm_count reference:

static int hfi1_file_open(struct inode *inode, struct file *fp)
{
        struct hfi1_filedata *fd;
[...]
        fd->mm = current->mm;
        mmgrab(fd->mm); // increments ->mm_count
[...]
}

It looks like that's from commit
e0cf75deab8155334c8228eb7f097b15127d0a49 ("IB/hfi1: Fix mm_struct use
after free").

However, e.g. the call chain hfi1_file_ioctl() -> user_exp_rcv_setup()
-> hfi1_user_exp_rcv_setup() -> pin_rcv_pages() ->
hfi1_acquire_user_pages() -> pin_user_pages_fast() can end up
traversing VMAs without holding any ->mm_users reference, as far as I
can tell. That will probably result in kernel memory corruption if
that races the wrong way with an exiting task (with the ioctl() call
coming from a task whose ->mm is different from fd->mm).

Disclaimer: I haven't actually tested this - I just stumbled over it
while working on some other stuff, and I don't have any infiniband
hardware to test with. So it might well be that I just missed an
mmget_not_zero() somewhere, or something like that.


AFAICS the three options to fix this, if it indeed is a real bug, would be:

1) use mmget_not_zero() around any operations that need the VMAs
and/or pagetables to be stable (if the mm_struct's virtual memory
should disappear once all tasks that were using it have gone away)
2) take a long-term ->mm_users reference (generally frowned upon, but
may be reasonable if you explicitly want to preserve mm contents even
if all tasks that were using the mm have gone away)
3) bail out early on any operation where `current->mm != fd->mm` and
declare that using the hfi1 fd from another process is not supported
(but I haven't checked whether you might still have some code paths
that will have to remotely operate on the mm_struct)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: buggy-looking mm_struct refcounting in HFI1 infiniband driver
  2020-08-31 23:45 buggy-looking mm_struct refcounting in HFI1 infiniband driver Jann Horn
@ 2020-09-01  0:21 ` Jason Gunthorpe
  2020-09-01 12:58   ` Dennis Dalessandro
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2020-09-01  0:21 UTC (permalink / raw)
  To: Jann Horn
  Cc: Mike Marciniszyn, Dennis Dalessandro, Ira Weiny, Linux-MM,
	Doug Ledford, linux-rdma, Dean Luick

On Tue, Sep 01, 2020 at 01:45:06AM +0200, Jann Horn wrote:

> struct hfi1_filedata has a member ->mm that holds a ->mm_count reference:
> 
> static int hfi1_file_open(struct inode *inode, struct file *fp)
> {
>         struct hfi1_filedata *fd;
> [...]
>         fd->mm = current->mm;
>         mmgrab(fd->mm); // increments ->mm_count
> [...]
> }

Yikes, gross.
 
> However, e.g. the call chain hfi1_file_ioctl() -> user_exp_rcv_setup()
> -> hfi1_user_exp_rcv_setup() -> pin_rcv_pages() ->
> hfi1_acquire_user_pages() -> pin_user_pages_fast() can end up
> traversing VMAs without holding any ->mm_users reference, as far as I
> can tell. That will probably result in kernel memory corruption if
> that races the wrong way with an exiting task (with the ioctl() call
> coming from a task whose ->mm is different from fd->mm).

It looks like this path should be using current and storing the grab'd
mm in the tidbuf for later use by hfi1_release_user_pages()

The only other use of file->mm is to setup a notifier, but this is
also under hfi1_user_exp_rcv_setup() so it should just use tidbuf->mm
== current anyhow.

The pq->mm looks similar, looks like the pq should use current->mm,
and it sets up an old-style notifier, but I didn't check carefully if
all the call paths are linked back to an ioctl..

It doesn't make sense that a RDMA driver would do any page pinning
outside an ioctl, so it should always use current.

> Disclaimer: I haven't actually tested this - I just stumbled over it
> while working on some other stuff, and I don't have any infiniband
> hardware to test with. So it might well be that I just missed an
> mmget_not_zero() somewhere, or something like that.

It looks wrong to me too.

Dennis?

Jason


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: buggy-looking mm_struct refcounting in HFI1 infiniband driver
  2020-09-01  0:21 ` Jason Gunthorpe
@ 2020-09-01 12:58   ` Dennis Dalessandro
  2020-09-01 13:00     ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Dennis Dalessandro @ 2020-09-01 12:58 UTC (permalink / raw)
  To: Jason Gunthorpe, Jann Horn
  Cc: Mike Marciniszyn, Ira Weiny, Linux-MM, Doug Ledford, linux-rdma,
	Dean Luick

On 8/31/2020 8:21 PM, Jason Gunthorpe wrote:
> On Tue, Sep 01, 2020 at 01:45:06AM +0200, Jann Horn wrote:
> 
>> struct hfi1_filedata has a member ->mm that holds a ->mm_count reference:
>>
>> static int hfi1_file_open(struct inode *inode, struct file *fp)
>> {
>>          struct hfi1_filedata *fd;
>> [...]
>>          fd->mm = current->mm;
>>          mmgrab(fd->mm); // increments ->mm_count
>> [...]
>> }
> 
> Yikes, gross.
>   
>> However, e.g. the call chain hfi1_file_ioctl() -> user_exp_rcv_setup()
>> -> hfi1_user_exp_rcv_setup() -> pin_rcv_pages() ->
>> hfi1_acquire_user_pages() -> pin_user_pages_fast() can end up
>> traversing VMAs without holding any ->mm_users reference, as far as I
>> can tell. That will probably result in kernel memory corruption if
>> that races the wrong way with an exiting task (with the ioctl() call
>> coming from a task whose ->mm is different from fd->mm).
> 
> It looks like this path should be using current and storing the grab'd
> mm in the tidbuf for later use by hfi1_release_user_pages()
> 
> The only other use of file->mm is to setup a notifier, but this is
> also under hfi1_user_exp_rcv_setup() so it should just use tidbuf->mm
> == current anyhow.
> 
> The pq->mm looks similar, looks like the pq should use current->mm,
> and it sets up an old-style notifier, but I didn't check carefully if
> all the call paths are linked back to an ioctl..
> 
> It doesn't make sense that a RDMA driver would do any page pinning
> outside an ioctl, so it should always use current.

I sort of recall a bug where we were trusting current and it wasn't 
correct. I'll have to see if I can dig up the details and figure out 
what's going on here.

>> Disclaimer: I haven't actually tested this - I just stumbled over it
>> while working on some other stuff, and I don't have any infiniband
>> hardware to test with. So it might well be that I just missed an
>> mmget_not_zero() somewhere, or something like that.
> 
> It looks wrong to me too.
> 
> Dennis?

I'll look at it closer and either send a patch or an explanation. Thanks 
for bringing it to our attention Jann!

-Denny



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: buggy-looking mm_struct refcounting in HFI1 infiniband driver
  2020-09-01 12:58   ` Dennis Dalessandro
@ 2020-09-01 13:00     ` Jason Gunthorpe
  0 siblings, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2020-09-01 13:00 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: Jann Horn, Mike Marciniszyn, Ira Weiny, Linux-MM, Doug Ledford,
	linux-rdma, Dean Luick

On Tue, Sep 01, 2020 at 08:58:18AM -0400, Dennis Dalessandro wrote:
> On 8/31/2020 8:21 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 01, 2020 at 01:45:06AM +0200, Jann Horn wrote:
> > 
> > > struct hfi1_filedata has a member ->mm that holds a ->mm_count reference:
> > > 
> > > static int hfi1_file_open(struct inode *inode, struct file *fp)
> > > {
> > >          struct hfi1_filedata *fd;
> > > [...]
> > >          fd->mm = current->mm;
> > >          mmgrab(fd->mm); // increments ->mm_count
> > > [...]
> > > }
> > 
> > Yikes, gross.
> > > However, e.g. the call chain hfi1_file_ioctl() -> user_exp_rcv_setup()
> > > -> hfi1_user_exp_rcv_setup() -> pin_rcv_pages() ->
> > > hfi1_acquire_user_pages() -> pin_user_pages_fast() can end up
> > > traversing VMAs without holding any ->mm_users reference, as far as I
> > > can tell. That will probably result in kernel memory corruption if
> > > that races the wrong way with an exiting task (with the ioctl() call
> > > coming from a task whose ->mm is different from fd->mm).
> > 
> > It looks like this path should be using current and storing the grab'd
> > mm in the tidbuf for later use by hfi1_release_user_pages()
> > 
> > The only other use of file->mm is to setup a notifier, but this is
> > also under hfi1_user_exp_rcv_setup() so it should just use tidbuf->mm
> > == current anyhow.
> > 
> > The pq->mm looks similar, looks like the pq should use current->mm,
> > and it sets up an old-style notifier, but I didn't check carefully if
> > all the call paths are linked back to an ioctl..
> > 
> > It doesn't make sense that a RDMA driver would do any page pinning
> > outside an ioctl, so it should always use current.
> 
> I sort of recall a bug where we were trusting current and it wasn't correct.
> I'll have to see if I can dig up the details and figure out what's going on
> here.

The trouble with not using current is it opens a security issue where
a process can access memory it shouldn't be allowed to access.

Jason


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-09-01 13:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-31 23:45 buggy-looking mm_struct refcounting in HFI1 infiniband driver Jann Horn
2020-09-01  0:21 ` Jason Gunthorpe
2020-09-01 12:58   ` Dennis Dalessandro
2020-09-01 13:00     ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).