All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: Use after free in free_huge_page()
@ 2010-03-22 20:28 Andrew Hastings
  2010-03-23 14:02 ` Adam Litke
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Hastings @ 2010-03-22 20:28 UTC (permalink / raw)
  To: linux-mm; +Cc: Adam Litke

It looks like there's a use-after-free issue in free_huge_page().

free_huge_page() says:

        mapping = (struct address_space *) page_private(page);
	...
        if (mapping)
                hugetlb_put_quota(mapping, 1);

Running a kernel with CONFIG_DEBUG_SLAB, we get a "Oops: <NULL>" in
hugetlb_put_quota.  The stack backtrace looks like:

  free_huge_page
  put_page
  ... driver functions ...
  __fput
  fput
  filp_close
  put_files_struct
  exit_files
  do_exit
  do_group_exit
  get_signal_to_deliver
  do_notify_resume
  ptregscall_common

'mapping' points to memory containing POISON_FREE:

>> dump 0xffff880407464f20 16
0xffff880407464f20: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b : kkkkkkkkkkkkkkkk
0xffff880407464f30: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b : kkkkkkkkkkkkkkkk

I think what happens is:
1.  Driver does get_user_pages() for pages mapped by hugetlbfs.
2.  Process exits.
3.  hugetlbfs file is closed; the vma->vm_file->f_mapping value stored in
    page_private now points to freed memory
4.  Driver file is closed; driver's release() function calls put_page()
    which calls free_huge_page() which passes bogus mapping value to
    hugetlb_put_quota().

We've seen this with 2.6.27.42 but free_huge_page() is unchanged in 2.6.33.1.

git commit c79fb75e5a514a5a35f22c229042aa29f4237e3a ("hugetlb: fix quota
management for private mappings") is what introduced the reliance on mapping
in free_huge_page().

I'd like to help with a fix, but it's not immediately obvious to me what
the right path is.  Should hugetlb_no_page() always call add_to_page_cache()
even if VM_MAYSHARE is clear?

-Andrew Hastings
 Cray Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: Use after free in free_huge_page()
  2010-03-22 20:28 BUG: Use after free in free_huge_page() Andrew Hastings
@ 2010-03-23 14:02 ` Adam Litke
  2010-03-23 17:56   ` Mel Gorman
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Litke @ 2010-03-23 14:02 UTC (permalink / raw)
  To: Andrew Hastings, Mel Gorman; +Cc: linux-mm

Hi Andrew, thanks for the detailed report.  I am taking a look at this 
but it seems a lot has happened since I last looked at this code.  (If 
anyone else knows what might be going on here, please do chime in).

Andrew Hastings wrote:
> I think what happens is:
> 1.  Driver does get_user_pages() for pages mapped by hugetlbfs.
> 2.  Process exits.
> 3.  hugetlbfs file is closed; the vma->vm_file->f_mapping value stored in
>     page_private now points to freed memory
> 4.  Driver file is closed; driver's release() function calls put_page()
>     which calls free_huge_page() which passes bogus mapping value to
>     hugetlb_put_quota().

:( Definitely seems plausible.

> I'd like to help with a fix, but it's not immediately obvious to me what
> the right path is.  Should hugetlb_no_page() always call add_to_page_cache()
> even if VM_MAYSHARE is clear?

Are you seeing any corruption in the HugePages_Rsvd: counter?  Would it 
be possible for you to run the libhugetlbfs test suite before and after 
trigerring the bug and let me know if any additional tests fail after 
you reproduce this?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: Use after free in free_huge_page()
  2010-03-23 14:02 ` Adam Litke
@ 2010-03-23 17:56   ` Mel Gorman
  2010-03-25  5:18     ` Andrew Hastings
  0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2010-03-23 17:56 UTC (permalink / raw)
  To: Adam Litke; +Cc: Andrew Hastings, linux-mm

On Tue, Mar 23, 2010 at 09:02:08AM -0500, Adam Litke wrote:
> Hi Andrew, thanks for the detailed report.  I am taking a look at this  
> but it seems a lot has happened since I last looked at this code.  (If  
> anyone else knows what might be going on here, please do chime in).
>
> Andrew Hastings wrote:
>> I think what happens is:
>> 1.  Driver does get_user_pages() for pages mapped by hugetlbfs.
>> 2.  Process exits.
>> 3.  hugetlbfs file is closed; the vma->vm_file->f_mapping value stored in
>>     page_private now points to freed memory
>> 4.  Driver file is closed; driver's release() function calls put_page()
>>     which calls free_huge_page() which passes bogus mapping value to
>>     hugetlb_put_quota().
>
> :( Definitely seems plausible.
>

I haven't had a chance to look at this closely yet and it'll be a
minimum of a few days before I do. Hopefully Adam will spot something in
the meantime but I do have a question.

What driver is calling get_user_pages() on pages mapped by hugetlbfs?
It's not clear what "driver file" is involved but clearly it's not mapped
or it would have called get_file() as part of the mapping.

Again, without thinking about this too much, it seems more like a
reference-count problem rather than a race if the file is disappaering
before the pages being backed by it are freed.

>> I'd like to help with a fix, but it's not immediately obvious to me what
>> the right path is.  Should hugetlb_no_page() always call add_to_page_cache()
>> even if VM_MAYSHARE is clear?
>
> Are you seeing any corruption in the HugePages_Rsvd: counter?  Would it  
> be possible for you to run the libhugetlbfs test suite before and after  
> trigerring the bug and let me know if any additional tests fail after  
> you reproduce this?
>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: Use after free in free_huge_page()
  2010-03-23 17:56   ` Mel Gorman
@ 2010-03-25  5:18     ` Andrew Hastings
  2010-03-30 11:18       ` Mel Gorman
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Hastings @ 2010-03-25  5:18 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Adam Litke, linux-mm

Mel Gorman wrote:
> On Tue, Mar 23, 2010 at 09:02:08AM -0500, Adam Litke wrote:
>> Hi Andrew, thanks for the detailed report.  I am taking a look at this  
>> but it seems a lot has happened since I last looked at this code.  (If  
>> anyone else knows what might be going on here, please do chime in).
>>
>> Andrew Hastings wrote:
>>> I think what happens is:
>>> 1.  Driver does get_user_pages() for pages mapped by hugetlbfs.
>>> 2.  Process exits.
>>> 3.  hugetlbfs file is closed; the vma->vm_file->f_mapping value stored in
>>>     page_private now points to freed memory
>>> 4.  Driver file is closed; driver's release() function calls put_page()
>>>     which calls free_huge_page() which passes bogus mapping value to
>>>     hugetlb_put_quota().
>> :( Definitely seems plausible.
>>
> 
> I haven't had a chance to look at this closely yet and it'll be a
> minimum of a few days before I do. Hopefully Adam will spot something in
> the meantime but I do have a question.
> 
> What driver is calling get_user_pages() on pages mapped by hugetlbfs?
> It's not clear what "driver file" is involved but clearly it's not mapped
> or it would have called get_file() as part of the mapping.
> 
> Again, without thinking about this too much, it seems more like a
> reference-count problem rather than a race if the file is disappaering
> before the pages being backed by it are freed.

Mel:

Yeah, this is certainly a reference-counting problem, but I think it's probably in hugetlbfs.  

We are developing a device driver under GPL for a future product, our "Gemini" interconnect.  The "device file" I mentioned is simply the entry in sysfs that user space libraries use to communicate with the device driver.  The "Gemini" device supports RDMA, so the driver will "pin" user pages via get_user_pages() on user request, and "unpin" those pages via put_page() on user request or process exit.  The pages being "pinned" may or may not be pages mapped by hugetlbfs.  (Device drivers shouldn't have to know whether the pages they are doing DMA on are pages mapped by hugetlbfs, should they?) 

The "Gemini" device driver may be somewhat unusual in that it tends to "pin" pages for longer periods and is thus more likely to hit this race, but this race should exist for any driver that calls get_user_pages() on hugetlbfs-backed pages for an asynchronous DMA just before process exit, and does not complete that DMA until after the hugetlbfs file is released at exit time,  I'd imagine that this could happen with e.g. NFS and O_DIRECT if O_DIRECT is supported and the NFS server is slow.

It seems to me that hugetlbfs ought to take an extra reference on the vma or vm_file or f_mapping or _something_ if vma->vm_file->f_mapping is needed by free_huge_page().

Or is there something our "Gemini" driver should be doing to ensure DMAs complete before exit time?

Thanks for your insights into this problem!


>>> I'd like to help with a fix, but it's not immediately obvious to me what
>>> the right path is.  Should hugetlb_no_page() always call add_to_page_cache()
>>> even if VM_MAYSHARE is clear?
>> Are you seeing any corruption in the HugePages_Rsvd: counter?  Would it  
>> be possible for you to run the libhugetlbfs test suite before and after  
>> trigerring the bug and let me know if any additional tests fail after  
>> you reproduce this?

Adam:

Thanks!  I'll work on collecting the information you requested.


Best regards,
-Andrew Hastings
 Cray Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: Use after free in free_huge_page()
  2010-03-25  5:18     ` Andrew Hastings
@ 2010-03-30 11:18       ` Mel Gorman
  2010-04-07 18:20         ` Andrew Hastings
  0 siblings, 1 reply; 6+ messages in thread
From: Mel Gorman @ 2010-03-30 11:18 UTC (permalink / raw)
  To: Andrew Hastings; +Cc: Adam Litke, linux-mm

On Thu, Mar 25, 2010 at 12:18:05AM -0500, Andrew Hastings wrote:
> Mel Gorman wrote:
>> On Tue, Mar 23, 2010 at 09:02:08AM -0500, Adam Litke wrote:
>>> Hi Andrew, thanks for the detailed report.  I am taking a look at 
>>> this  but it seems a lot has happened since I last looked at this 
>>> code.  (If  anyone else knows what might be going on here, please do 
>>> chime in).
>>>
>>> Andrew Hastings wrote:
>>>> I think what happens is:
>>>> 1.  Driver does get_user_pages() for pages mapped by hugetlbfs.
>>>> 2.  Process exits.
>>>> 3.  hugetlbfs file is closed; the vma->vm_file->f_mapping value stored in
>>>>     page_private now points to freed memory
>>>> 4.  Driver file is closed; driver's release() function calls put_page()
>>>>     which calls free_huge_page() which passes bogus mapping value to
>>>>     hugetlb_put_quota().
>>> :( Definitely seems plausible.
>>>
>>
>> I haven't had a chance to look at this closely yet and it'll be a
>> minimum of a few days before I do. Hopefully Adam will spot something in
>> the meantime but I do have a question.
>>
>> What driver is calling get_user_pages() on pages mapped by hugetlbfs?
>> It's not clear what "driver file" is involved but clearly it's not mapped
>> or it would have called get_file() as part of the mapping.
>>
>> Again, without thinking about this too much, it seems more like a
>> reference-count problem rather than a race if the file is disappaering
>> before the pages being backed by it are freed.
>
> Mel:
>
> Yeah, this is certainly a reference-counting problem, but I think it's 
> probably in hugetlbfs.  
>
> We are developing a device driver under GPL for a future product, our 
> "Gemini" interconnect.  The "device file" I mentioned is simply the entry 
> in sysfs that user space libraries use to communicate with the device 
> driver.  The "Gemini" device supports RDMA, so the driver will "pin" user 
> pages via get_user_pages() on user request, and "unpin" those pages via 
> put_page() on user request or process exit.  The pages being "pinned" may 
> or may not be pages mapped by hugetlbfs.  (Device drivers shouldn't have 
> to know whether the pages they are doing DMA on are pages mapped by 
> hugetlbfs, should they?) 
>

No, they shouldn't but I'm having a wee bit of trouble seeing why DMA to a page
that is no longer reachable by any process is happening. I'm somewhat taking
your word for it that there is a proper use case. Even if RDMA is involved,
it does not explain what happens the sending process when it's end-point
has disappeared. My feeling is that more likely this is an "anomolous"
situation but the kernel shouldn't shoot itself when it occurs.

> The "Gemini" device driver may be somewhat unusual in that it tends to
> "pin" pages for longer periods and is thus more likely to hit this race,
> but this race should exist for any driver that calls get_user_pages() on
> hugetlbfs-backed pages for an asynchronous DMA just before process exit,
> and does not complete that DMA until after the hugetlbfs file is released
> at exit time,  I'd imagine that this could happen with e.g. NFS and O_DIRECT
> if O_DIRECT is supported and the NFS server is slow.
>
> It seems to me that hugetlbfs ought to take an extra reference on the vma
> or vm_file or f_mapping or _something_ if vma->vm_file->f_mapping is needed
> by free_huge_page().

Again, I haven't looked closely at this but a reference count on the VMA
wouldn't help. After all, the VMAs have already been cleared up and the
page tables. As far as the code is concerned, that file is no longer in
use. I'd also not try reference counting during get_user_pages and
someohw releasing that count later. Too much mess.

The most likely avenue is to store a reference to the superblock instead
of the mapping in page->private which is what put_quota is really
interested in. There might still be a race there if hugetlbfs managed to
get unmounted before the pages were freed though - not 100% sure.

> Or is there something our "Gemini" driver should be doing to ensure DMAs complete before exit time?
>

I'm not familiar enough with how RDMA should be implemented to answer
that question offhand.

> Thanks for your insights into this problem!
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: Use after free in free_huge_page()
  2010-03-30 11:18       ` Mel Gorman
@ 2010-04-07 18:20         ` Andrew Hastings
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Hastings @ 2010-04-07 18:20 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Adam Litke, linux-mm

Mel Gorman wrote:
> On Thu, Mar 25, 2010 at 12:18:05AM -0500, Andrew Hastings wrote:
>> It seems to me that hugetlbfs ought to take an extra reference on the vma
>> or vm_file or f_mapping or _something_ if vma->vm_file->f_mapping is needed
>> by free_huge_page().
> 
> Again, I haven't looked closely at this but a reference count on the VMA
> wouldn't help. After all, the VMAs have already been cleared up and the
> page tables. As far as the code is concerned, that file is no longer in
> use. I'd also not try reference counting during get_user_pages and
> someohw releasing that count later. Too much mess.
> 
> The most likely avenue is to store a reference to the superblock instead
> of the mapping in page->private which is what put_quota is really
> interested in. There might still be a race there if hugetlbfs managed to
> get unmounted before the pages were freed though - not 100% sure.

The hugetlbfs_sb_info struct that holds the quota is allocated separately from
the superblock.  Would it make sense for page->private to point directly to
hugetlbfs_sb_info, and reference count hugetlbfs_sb_info instead?  Seems like
this would avoid the unmount race.

-Andrew Hastings
 Cray Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-04-07 18:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-22 20:28 BUG: Use after free in free_huge_page() Andrew Hastings
2010-03-23 14:02 ` Adam Litke
2010-03-23 17:56   ` Mel Gorman
2010-03-25  5:18     ` Andrew Hastings
2010-03-30 11:18       ` Mel Gorman
2010-04-07 18:20         ` Andrew Hastings

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.