* + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-23 21:41 akpm
2017-08-24 6:09 ` Michal Hocko
0 siblings, 1 reply; 25+ messages in thread
From: akpm @ 2017-08-23 21:41 UTC (permalink / raw)
To: ebiggers, aarcange, dvyukov, hughd, mhocko, minchan, rientjes,
stable, mm-commits
The patch titled
Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
has been added to the -mm tree. Its filename is
mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Eric Biggers <ebiggers@google.com>
Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
If madvise(..., MADV_FREE) split a transparent hugepage, it called
put_page() before unlock_page(). This was wrong because put_page() can
free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
removed it from the memory mapping. put_page() then rightfully complained
about freeing a locked page.
Fix this by moving the unlock_page() before put_page().
This bug was found by syzkaller, which encountered the following splat:
BUG: Bad page state in process syzkaller412798 pfn:1bd800
page:ffffea0006f60000 count:0 mapcount:0 mapping: (null) index:0x20a00
flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1(locked)
Modules linked in:
CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
bad_page+0x230/0x2b0 mm/page_alloc.c:565
free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
free_pages_check mm/page_alloc.c:952 [inline]
free_pages_prepare mm/page_alloc.c:1043 [inline]
free_pcp_prepare mm/page_alloc.c:1068 [inline]
free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
__put_single_page mm/swap.c:79 [inline]
__put_page+0xfb/0x160 mm/swap.c:113
put_page include/linux/mm.h:814 [inline]
madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:108 [inline]
walk_p4d_range mm/pagewalk.c:134 [inline]
walk_pgd_range mm/pagewalk.c:160 [inline]
__walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
walk_page_range+0x200/0x470 mm/pagewalk.c:326
madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
madvise_dontneed_free mm/madvise.c:555 [inline]
madvise_vma mm/madvise.c:664 [inline]
SYSC_madvise mm/madvise.c:832 [inline]
SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
entry_SYSCALL_64_fastpath+0x1f/0xbe
Here is a C reproducer:
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>
#define MADV_FREE 8
#define PAGE_SIZE 4096
static void *mapping;
static const size_t mapping_size = 0x1000000;
static void *madvise_thrproc(void *arg)
{
madvise(mapping, mapping_size, (long)arg);
}
int main(void)
{
pthread_t t[2];
for (;;) {
mapping = mmap(NULL, mapping_size, PROT_WRITE,
MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
munmap(mapping + mapping_size / 2, PAGE_SIZE);
pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
munmap(mapping, mapping_size);
}
}
Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_DEBUG_VM=y are needed.
Google Bug Id: 64696096
Link: http://lkml.kernel.org/r/20170823205235.132061-1-ebiggers3@gmail.com
Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [v4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/madvise.c~mm-madvise-fix-freeing-of-locked-page-with-madv_free mm/madvise.c
--- a/mm/madvise.c~mm-madvise-fix-freeing-of-locked-page-with-madv_free
+++ a/mm/madvise.c
@@ -368,8 +368,8 @@ static int madvise_free_pte_range(pmd_t
pte_offset_map_lock(mm, pmd, addr, &ptl);
goto out;
}
- put_page(page);
unlock_page(page);
+ put_page(page);
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte--;
addr -= PAGE_SIZE;
_
Patches currently in -mm which might be from ebiggers@google.com are
mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-23 21:41 + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree akpm
@ 2017-08-24 6:09 ` Michal Hocko
2017-08-24 6:53 ` Michal Hocko
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Michal Hocko @ 2017-08-24 6:09 UTC (permalink / raw)
To: akpm
Cc: ebiggers, aarcange, dvyukov, hughd, minchan, rientjes, stable,
mm-commits, linux-mm
Hmm, I do not see this neither in linux-mm nor LKML. Strange
On Wed 23-08-17 14:41:21, Andrew Morton wrote:
> From: Eric Biggers <ebiggers@google.com>
> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>
> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> put_page() before unlock_page(). This was wrong because put_page() can
> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
> removed it from the memory mapping. put_page() then rightfully complained
> about freeing a locked page.
>
> Fix this by moving the unlock_page() before put_page().
>
> This bug was found by syzkaller, which encountered the following splat:
>
> BUG: Bad page state in process syzkaller412798 pfn:1bd800
> page:ffffea0006f60000 count:0 mapcount:0 mapping: (null) index:0x20a00
> flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
> raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
> raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> bad because of flags: 0x1(locked)
> Modules linked in:
> CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x194/0x257 lib/dump_stack.c:52
> bad_page+0x230/0x2b0 mm/page_alloc.c:565
> free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
> free_pages_check mm/page_alloc.c:952 [inline]
> free_pages_prepare mm/page_alloc.c:1043 [inline]
> free_pcp_prepare mm/page_alloc.c:1068 [inline]
> free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
> __put_single_page mm/swap.c:79 [inline]
> __put_page+0xfb/0x160 mm/swap.c:113
> put_page include/linux/mm.h:814 [inline]
> madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
> walk_pmd_range mm/pagewalk.c:50 [inline]
> walk_pud_range mm/pagewalk.c:108 [inline]
> walk_p4d_range mm/pagewalk.c:134 [inline]
> walk_pgd_range mm/pagewalk.c:160 [inline]
> __walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
> walk_page_range+0x200/0x470 mm/pagewalk.c:326
> madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
> madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
> madvise_dontneed_free mm/madvise.c:555 [inline]
> madvise_vma mm/madvise.c:664 [inline]
> SYSC_madvise mm/madvise.c:832 [inline]
> SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
> entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Here is a C reproducer:
>
> #define _GNU_SOURCE
> #include <pthread.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> #define MADV_FREE 8
> #define PAGE_SIZE 4096
>
> static void *mapping;
> static const size_t mapping_size = 0x1000000;
>
> static void *madvise_thrproc(void *arg)
> {
> madvise(mapping, mapping_size, (long)arg);
> }
>
> int main(void)
> {
> pthread_t t[2];
>
> for (;;) {
> mapping = mmap(NULL, mapping_size, PROT_WRITE,
> MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>
> munmap(mapping + mapping_size / 2, PAGE_SIZE);
>
> pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
> pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
> pthread_join(t[0], NULL);
> pthread_join(t[1], NULL);
> munmap(mapping, mapping_size);
> }
> }
>
> Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
> CONFIG_DEBUG_VM=y are needed.
>
> Google Bug Id: 64696096
Is this necessary in the changelog?
> Link: http://lkml.kernel.org/r/20170823205235.132061-1-ebiggers3@gmail.com
> Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: <stable@vger.kernel.org> [v4.5+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>
> mm/madvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN mm/madvise.c~mm-madvise-fix-freeing-of-locked-page-with-madv_free mm/madvise.c
> --- a/mm/madvise.c~mm-madvise-fix-freeing-of-locked-page-with-madv_free
> +++ a/mm/madvise.c
> @@ -368,8 +368,8 @@ static int madvise_free_pte_range(pmd_t
> pte_offset_map_lock(mm, pmd, addr, &ptl);
> goto out;
> }
> - put_page(page);
> unlock_page(page);
> + put_page(page);
> pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> pte--;
> addr -= PAGE_SIZE;
> _
>
> Patches currently in -mm which might be from ebiggers@google.com are
>
> mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch
>
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-24 6:09 ` Michal Hocko
@ 2017-08-24 6:53 ` Michal Hocko
2017-08-25 21:36 ` Andrew Morton
2017-08-25 22:02 ` Nadav Amit
2 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2017-08-24 6:53 UTC (permalink / raw)
To: akpm
Cc: ebiggers, aarcange, dvyukov, hughd, minchan, rientjes, stable,
mm-commits, linux-mm
On Thu 24-08-17 08:09:57, Michal Hocko wrote:
> Hmm, I do not see this neither in linux-mm nor LKML. Strange
Not strange, just my filters fooled me. Sorry about the confusion.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-24 6:09 ` Michal Hocko
@ 2017-08-25 21:36 ` Andrew Morton
2017-08-25 21:36 ` Andrew Morton
2017-08-25 22:02 ` Nadav Amit
2 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-08-25 21:36 UTC (permalink / raw)
To: Michal Hocko
Cc: ebiggers, aarcange, dvyukov, hughd, minchan, rientjes, stable,
mm-commits, linux-mm
On Thu, 24 Aug 2017 08:09:57 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > Google Bug Id: 64696096
>
> Is this necessary in the changelog?
I tend to keep this sort of a thing as a courtesy to the sender.
But I did change it from "Google-Bug-Id:".
Documentation/process/submitting-patches.rst lists "Reported-by:,
Tested-by:, Reviewed-by:, Suggested-by: and Fixes:" as the recognized
patch tags and I don't think it's a good idea to introduce new ones -
that just creates more work for the people who maintain parsers for
this stuff.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-25 21:36 ` Andrew Morton
0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-08-25 21:36 UTC (permalink / raw)
To: Michal Hocko
Cc: ebiggers, aarcange, dvyukov, hughd, minchan, rientjes, stable,
mm-commits, linux-mm
On Thu, 24 Aug 2017 08:09:57 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > Google Bug Id: 64696096
>
> Is this necessary in the changelog?
I tend to keep this sort of a thing as a courtesy to the sender.
But I did change it from "Google-Bug-Id:".
Documentation/process/submitting-patches.rst lists "Reported-by:,
Tested-by:, Reviewed-by:, Suggested-by: and Fixes:" as the recognized
patch tags and I don't think it's a good idea to introduce new ones -
that just creates more work for the people who maintain parsers for
this stuff.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-24 6:09 ` Michal Hocko
2017-08-24 6:53 ` Michal Hocko
2017-08-25 21:36 ` Andrew Morton
@ 2017-08-25 22:02 ` Nadav Amit
2017-08-25 22:31 ` Mike Kravetz
2 siblings, 1 reply; 25+ messages in thread
From: Nadav Amit @ 2017-08-25 22:02 UTC (permalink / raw)
To: mike.kravetz, ebiggers
Cc: Andrew Morton, Andrea Arcangeli, Dmitry Vyukov, Hugh Dickins,
Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
Michal Hocko <mhocko@kernel.org> wrote:
> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>
> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>> From: Eric Biggers <ebiggers@google.com>
>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>
>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>> put_page() before unlock_page(). This was wrong because put_page() can
>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>> removed it from the memory mapping. put_page() then rightfully complained
>> about freeing a locked page.
>>
>> Fix this by moving the unlock_page() before put_page().
Quick grep shows that a similar flow (put_page() followed by an
unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
well?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-25 22:02 ` Nadav Amit
2017-08-25 22:31 ` Mike Kravetz
@ 2017-08-25 22:31 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 22:31 UTC (permalink / raw)
To: Nadav Amit, ebiggers
Cc: Andrew Morton, Andrea Arcangeli, Dmitry Vyukov, Hugh Dickins,
Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:02 PM, Nadav Amit wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
>
>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>
>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>> From: Eric Biggers <ebiggers@google.com>
>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>
>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>> put_page() before unlock_page(). This was wrong because put_page() can
>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>> removed it from the memory mapping. put_page() then rightfully complained
>>> about freeing a locked page.
>>>
>>> Fix this by moving the unlock_page() before put_page().
>
> Quick grep shows that a similar flow (put_page() followed by an
> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> well?
I assume you are asking about this block of code?
/*
* page_put due to reference from alloc_huge_page()
* unlock_page because locked by add_to_page_cache()
*/
put_page(page);
unlock_page(page);
Well, there is a typo (page_put) in the comment. :(
However, in this case we have just added the huge page to a hugetlbfs
file. The put_page() is there just to drop the reference count on the
page (taken when allocated). It will still be non-zero as we have
successfully added it to the page cache. So, we are not freeing the
page here, just dropping the reference count.
This should not cause a problem like that seen in madvise.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-25 22:31 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 22:31 UTC (permalink / raw)
To: Nadav Amit, ebiggers
Cc: Andrew Morton, Andrea Arcangeli, Dmitry Vyukov, Hugh Dickins,
Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:02 PM, Nadav Amit wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
>
>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>
>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>> From: Eric Biggers <ebiggers@google.com>
>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>
>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>> put_page() before unlock_page(). This was wrong because put_page() can
>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>> removed it from the memory mapping. put_page() then rightfully complained
>>> about freeing a locked page.
>>>
>>> Fix this by moving the unlock_page() before put_page().
>
> Quick grep shows that a similar flow (put_page() followed by an
> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> well?
I assume you are asking about this block of code?
/*
* page_put due to reference from alloc_huge_page()
* unlock_page because locked by add_to_page_cache()
*/
put_page(page);
unlock_page(page);
Well, there is a typo (page_put) in the comment. :(
However, in this case we have just added the huge page to a hugetlbfs
file. The put_page() is there just to drop the reference count on the
page (taken when allocated). It will still be non-zero as we have
successfully added it to the page cache. So, we are not freeing the
page here, just dropping the reference count.
This should not cause a problem like that seen in madvise.
--
Mike Kravetz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-25 22:31 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 22:31 UTC (permalink / raw)
To: Nadav Amit, ebiggers
Cc: Andrew Morton, Andrea Arcangeli, Dmitry Vyukov, Hugh Dickins,
Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:02 PM, Nadav Amit wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
>
>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>
>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>> From: Eric Biggers <ebiggers@google.com>
>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>
>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>> put_page() before unlock_page(). This was wrong because put_page() can
>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>> removed it from the memory mapping. put_page() then rightfully complained
>>> about freeing a locked page.
>>>
>>> Fix this by moving the unlock_page() before put_page().
>
> Quick grep shows that a similar flow (put_page() followed by an
> unlock_page() ) also happens in hugetlbfs_fallocate(). Isna??t it a problem as
> well?
I assume you are asking about this block of code?
/*
* page_put due to reference from alloc_huge_page()
* unlock_page because locked by add_to_page_cache()
*/
put_page(page);
unlock_page(page);
Well, there is a typo (page_put) in the comment. :(
However, in this case we have just added the huge page to a hugetlbfs
file. The put_page() is there just to drop the reference count on the
page (taken when allocated). It will still be non-zero as we have
successfully added it to the page cache. So, we are not freeing the
page here, just dropping the reference count.
This should not cause a problem like that seen in madvise.
--
Mike Kravetz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-25 22:31 ` Mike Kravetz
(?)
(?)
@ 2017-08-25 22:51 ` Nadav Amit
2017-08-25 23:41 ` Mike Kravetz
-1 siblings, 1 reply; 25+ messages in thread
From: Nadav Amit @ 2017-08-25 22:51 UTC (permalink / raw)
To: Mike Kravetz
Cc: ebiggers, Andrew Morton, Andrea Arcangeli, Dmitry Vyukov,
Hugh Dickins, Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
Mike Kravetz <mike.kravetz@oracle.com> wrote:
> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>> Michal Hocko <mhocko@kernel.org> wrote:
>>
>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>>
>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>>> From: Eric Biggers <ebiggers@google.com>
>>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>>
>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>>> put_page() before unlock_page(). This was wrong because put_page() can
>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>>> removed it from the memory mapping. put_page() then rightfully complained
>>>> about freeing a locked page.
>>>>
>>>> Fix this by moving the unlock_page() before put_page().
>>
>> Quick grep shows that a similar flow (put_page() followed by an
>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>> well?
>
> I assume you are asking about this block of code?
Yes.
>
> /*
> * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> */
> put_page(page);
> unlock_page(page);
>
> Well, there is a typo (page_put) in the comment. :(
>
> However, in this case we have just added the huge page to a hugetlbfs
> file. The put_page() is there just to drop the reference count on the
> page (taken when allocated). It will still be non-zero as we have
> successfully added it to the page cache. So, we are not freeing the
> page here, just dropping the reference count.
>
> This should not cause a problem like that seen in madvise.
Thanks for the quick response.
I am not too familiar with this piece of code, so just for the matter of
understanding: what prevents the page from being removed from the page cache
shortly after it is added (even if it is highly unlikely)? The page lock? The
inode lock?
Thanks again,
Nadav
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-25 22:51 ` Nadav Amit
2017-08-25 23:41 ` Mike Kravetz
@ 2017-08-25 23:41 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 23:41 UTC (permalink / raw)
To: Nadav Amit
Cc: ebiggers, Andrew Morton, Andrea Arcangeli, Dmitry Vyukov,
Hugh Dickins, Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:51 PM, Nadav Amit wrote:
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>>> Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>>>
>>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>>>> From: Eric Biggers <ebiggers@google.com>
>>>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>>>
>>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>>>> put_page() before unlock_page(). This was wrong because put_page() can
>>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>>>> removed it from the memory mapping. put_page() then rightfully complained
>>>>> about freeing a locked page.
>>>>>
>>>>> Fix this by moving the unlock_page() before put_page().
>>>
>>> Quick grep shows that a similar flow (put_page() followed by an
>>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>>> well?
>>
>> I assume you are asking about this block of code?
>
> Yes.
>
>>
>> /*
>> * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> */
>> put_page(page);
>> unlock_page(page);
>>
>> Well, there is a typo (page_put) in the comment. :(
>>
>> However, in this case we have just added the huge page to a hugetlbfs
>> file. The put_page() is there just to drop the reference count on the
>> page (taken when allocated). It will still be non-zero as we have
>> successfully added it to the page cache. So, we are not freeing the
>> page here, just dropping the reference count.
>>
>> This should not cause a problem like that seen in madvise.
>
> Thanks for the quick response.
>
> I am not too familiar with this piece of code, so just for the matter of
> understanding: what prevents the page from being removed from the page cache
> shortly after it is added (even if it is highly unlikely)? The page lock? The
> inode lock?
Someone would need to acquire the inode lock to remove the page. This
is held until we exit the routine. Also note that put_page for this
type of huge page almost always results in the page being put back
on a free list within the hugetlb(fs) subsystem. It is not returned
to the 'normal' memory allocators for general use.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-25 23:41 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 23:41 UTC (permalink / raw)
To: Nadav Amit
Cc: ebiggers, Andrew Morton, Andrea Arcangeli, Dmitry Vyukov,
Hugh Dickins, Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:51 PM, Nadav Amit wrote:
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>>> Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>>>
>>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>>>> From: Eric Biggers <ebiggers@google.com>
>>>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>>>
>>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>>>> put_page() before unlock_page(). This was wrong because put_page() can
>>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>>>> removed it from the memory mapping. put_page() then rightfully complained
>>>>> about freeing a locked page.
>>>>>
>>>>> Fix this by moving the unlock_page() before put_page().
>>>
>>> Quick grep shows that a similar flow (put_page() followed by an
>>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
>>> well?
>>
>> I assume you are asking about this block of code?
>
> Yes.
>
>>
>> /*
>> * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> */
>> put_page(page);
>> unlock_page(page);
>>
>> Well, there is a typo (page_put) in the comment. :(
>>
>> However, in this case we have just added the huge page to a hugetlbfs
>> file. The put_page() is there just to drop the reference count on the
>> page (taken when allocated). It will still be non-zero as we have
>> successfully added it to the page cache. So, we are not freeing the
>> page here, just dropping the reference count.
>>
>> This should not cause a problem like that seen in madvise.
>
> Thanks for the quick response.
>
> I am not too familiar with this piece of code, so just for the matter of
> understanding: what prevents the page from being removed from the page cache
> shortly after it is added (even if it is highly unlikely)? The page lock? The
> inode lock?
Someone would need to acquire the inode lock to remove the page. This
is held until we exit the routine. Also note that put_page for this
type of huge page almost always results in the page being put back
on a free list within the hugetlb(fs) subsystem. It is not returned
to the 'normal' memory allocators for general use.
--
Mike Kravetz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-25 23:41 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-25 23:41 UTC (permalink / raw)
To: Nadav Amit
Cc: ebiggers, Andrew Morton, Andrea Arcangeli, Dmitry Vyukov,
Hugh Dickins, Minchan Kim, rientjes, stable, mm-commits,
open list:MEMORY MANAGEMENT, Linux Kernel Mailing List,
Michal Hocko, nyc
On 08/25/2017 03:51 PM, Nadav Amit wrote:
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> On 08/25/2017 03:02 PM, Nadav Amit wrote:
>>> Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>>> Hmm, I do not see this neither in linux-mm nor LKML. Strange
>>>>
>>>> On Wed 23-08-17 14:41:21, Andrew Morton wrote:
>>>>> From: Eric Biggers <ebiggers@google.com>
>>>>> Subject: mm/madvise.c: fix freeing of locked page with MADV_FREE
>>>>>
>>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
>>>>> put_page() before unlock_page(). This was wrong because put_page() can
>>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
>>>>> removed it from the memory mapping. put_page() then rightfully complained
>>>>> about freeing a locked page.
>>>>>
>>>>> Fix this by moving the unlock_page() before put_page().
>>>
>>> Quick grep shows that a similar flow (put_page() followed by an
>>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isna??t it a problem as
>>> well?
>>
>> I assume you are asking about this block of code?
>
> Yes.
>
>>
>> /*
>> * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> */
>> put_page(page);
>> unlock_page(page);
>>
>> Well, there is a typo (page_put) in the comment. :(
>>
>> However, in this case we have just added the huge page to a hugetlbfs
>> file. The put_page() is there just to drop the reference count on the
>> page (taken when allocated). It will still be non-zero as we have
>> successfully added it to the page cache. So, we are not freeing the
>> page here, just dropping the reference count.
>>
>> This should not cause a problem like that seen in madvise.
>
> Thanks for the quick response.
>
> I am not too familiar with this piece of code, so just for the matter of
> understanding: what prevents the page from being removed from the page cache
> shortly after it is added (even if it is highly unlikely)? The page lock? The
> inode lock?
Someone would need to acquire the inode lock to remove the page. This
is held until we exit the routine. Also note that put_page for this
type of huge page almost always results in the page being put back
on a free list within the hugetlb(fs) subsystem. It is not returned
to the 'normal' memory allocators for general use.
--
Mike Kravetz
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-26 21:09 ` Eric Biggers
(?)
(?)
@ 2017-08-26 19:11 ` Nadav Amit
2017-08-27 17:15 ` Mike Kravetz
` (2 more replies)
-1 siblings, 3 replies; 25+ messages in thread
From: Nadav Amit @ 2017-08-26 19:11 UTC (permalink / raw)
To: Nadia Yvette Chambers
Cc: linux-kernel, Nadav Amit, Nadav Amit, Eric Biggers, Mike Kravetz
hugetlfs_fallocate() currently performs put_page() before unlock_page().
This scenario opens a small time window, from the time the page is added
to the page cache, until it is unlocked, in which the page might be
removed from the page-cache by another core. If the page is removed
during this time windows, it might cause a memory corruption, as the
wrong page will be unlocked.
It is arguable whether this scenario can happen in a real system, and
there are several mitigating factors. The issue was found by code
inspection (actually grep), and not by actually triggering the flow.
Yet, since putting the page before unlocking is incorrect it should be
fixed, if only to prevent future breakage or someone copy-pasting this
code.
Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
cc: Eric Biggers <ebiggers3@gmail.com>
cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
fs/hugetlbfs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 28d2753be094..9475fee79cee 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
/*
- * page_put due to reference from alloc_huge_page()
* unlock_page because locked by add_to_page_cache()
+ * page_put due to reference from alloc_huge_page()
*/
- put_page(page);
unlock_page(page);
+ put_page(page);
}
if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
--
2.11.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
2017-08-25 23:41 ` Mike Kravetz
(?)
@ 2017-08-26 21:09 ` Eric Biggers
-1 siblings, 0 replies; 25+ messages in thread
From: Eric Biggers @ 2017-08-26 21:09 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nadav Amit, ebiggers, Andrew Morton, Andrea Arcangeli,
Dmitry Vyukov, Hugh Dickins, Minchan Kim, rientjes, stable,
mm-commits, open list:MEMORY MANAGEMENT,
Linux Kernel Mailing List, Michal Hocko, nyc
On Fri, Aug 25, 2017 at 04:41:36PM -0700, Mike Kravetz wrote:
> >>>>>
> >>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> >>>>> put_page() before unlock_page(). This was wrong because put_page() can
> >>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
> >>>>> removed it from the memory mapping. put_page() then rightfully complained
> >>>>> about freeing a locked page.
> >>>>>
> >>>>> Fix this by moving the unlock_page() before put_page().
> >>>
> >>> Quick grep shows that a similar flow (put_page() followed by an
> >>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> >>> well?
> >>
> >> I assume you are asking about this block of code?
> >
> > Yes.
> >
> >>
> >> /*
> >> * page_put due to reference from alloc_huge_page()
> >> * unlock_page because locked by add_to_page_cache()
> >> */
> >> put_page(page);
> >> unlock_page(page);
> >>
> >> Well, there is a typo (page_put) in the comment. :(
> >>
> >> However, in this case we have just added the huge page to a hugetlbfs
> >> file. The put_page() is there just to drop the reference count on the
> >> page (taken when allocated). It will still be non-zero as we have
> >> successfully added it to the page cache. So, we are not freeing the
> >> page here, just dropping the reference count.
> >>
> >> This should not cause a problem like that seen in madvise.
> >
> > Thanks for the quick response.
> >
> > I am not too familiar with this piece of code, so just for the matter of
> > understanding: what prevents the page from being removed from the page cache
> > shortly after it is added (even if it is highly unlikely)? The page lock? The
> > inode lock?
>
> Someone would need to acquire the inode lock to remove the page. This
> is held until we exit the routine. Also note that put_page for this
> type of huge page almost always results in the page being put back
> on a free list within the hugetlb(fs) subsystem. It is not returned
> to the 'normal' memory allocators for general use.
>
I'm not sure about that. What about sys_fadvise64(..., POSIX_FADV_DONTNEED)?
That removes pages from the page cache without taking the inode lock. It won't
remove locked pages though, so presumably it is only the page lock that prevents
the race with hugetlbfs_fallocate().
But in any case, why not do the "obviously correct" thing --- unlock before put?
Eric
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-26 21:09 ` Eric Biggers
0 siblings, 0 replies; 25+ messages in thread
From: Eric Biggers @ 2017-08-26 21:09 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nadav Amit, ebiggers, Andrew Morton, Andrea Arcangeli,
Dmitry Vyukov, Hugh Dickins, Minchan Kim, rientjes, stable,
mm-commits, open list:MEMORY MANAGEMENT,
Linux Kernel Mailing List, Michal Hocko, nyc
On Fri, Aug 25, 2017 at 04:41:36PM -0700, Mike Kravetz wrote:
> >>>>>
> >>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> >>>>> put_page() before unlock_page(). This was wrong because put_page() can
> >>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
> >>>>> removed it from the memory mapping. put_page() then rightfully complained
> >>>>> about freeing a locked page.
> >>>>>
> >>>>> Fix this by moving the unlock_page() before put_page().
> >>>
> >>> Quick grep shows that a similar flow (put_page() followed by an
> >>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isn’t it a problem as
> >>> well?
> >>
> >> I assume you are asking about this block of code?
> >
> > Yes.
> >
> >>
> >> /*
> >> * page_put due to reference from alloc_huge_page()
> >> * unlock_page because locked by add_to_page_cache()
> >> */
> >> put_page(page);
> >> unlock_page(page);
> >>
> >> Well, there is a typo (page_put) in the comment. :(
> >>
> >> However, in this case we have just added the huge page to a hugetlbfs
> >> file. The put_page() is there just to drop the reference count on the
> >> page (taken when allocated). It will still be non-zero as we have
> >> successfully added it to the page cache. So, we are not freeing the
> >> page here, just dropping the reference count.
> >>
> >> This should not cause a problem like that seen in madvise.
> >
> > Thanks for the quick response.
> >
> > I am not too familiar with this piece of code, so just for the matter of
> > understanding: what prevents the page from being removed from the page cache
> > shortly after it is added (even if it is highly unlikely)? The page lock? The
> > inode lock?
>
> Someone would need to acquire the inode lock to remove the page. This
> is held until we exit the routine. Also note that put_page for this
> type of huge page almost always results in the page being put back
> on a free list within the hugetlb(fs) subsystem. It is not returned
> to the 'normal' memory allocators for general use.
>
I'm not sure about that. What about sys_fadvise64(..., POSIX_FADV_DONTNEED)?
That removes pages from the page cache without taking the inode lock. It won't
remove locked pages though, so presumably it is only the page lock that prevents
the race with hugetlbfs_fallocate().
But in any case, why not do the "obviously correct" thing --- unlock before put?
Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree
@ 2017-08-26 21:09 ` Eric Biggers
0 siblings, 0 replies; 25+ messages in thread
From: Eric Biggers @ 2017-08-26 21:09 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nadav Amit, ebiggers, Andrew Morton, Andrea Arcangeli,
Dmitry Vyukov, Hugh Dickins, Minchan Kim, rientjes, stable,
mm-commits, open list:MEMORY MANAGEMENT,
Linux Kernel Mailing List, Michal Hocko, nyc
On Fri, Aug 25, 2017 at 04:41:36PM -0700, Mike Kravetz wrote:
> >>>>>
> >>>>> If madvise(..., MADV_FREE) split a transparent hugepage, it called
> >>>>> put_page() before unlock_page(). This was wrong because put_page() can
> >>>>> free the page, e.g. if a concurrent madvise(..., MADV_DONTNEED) has
> >>>>> removed it from the memory mapping. put_page() then rightfully complained
> >>>>> about freeing a locked page.
> >>>>>
> >>>>> Fix this by moving the unlock_page() before put_page().
> >>>
> >>> Quick grep shows that a similar flow (put_page() followed by an
> >>> unlock_page() ) also happens in hugetlbfs_fallocate(). Isna??t it a problem as
> >>> well?
> >>
> >> I assume you are asking about this block of code?
> >
> > Yes.
> >
> >>
> >> /*
> >> * page_put due to reference from alloc_huge_page()
> >> * unlock_page because locked by add_to_page_cache()
> >> */
> >> put_page(page);
> >> unlock_page(page);
> >>
> >> Well, there is a typo (page_put) in the comment. :(
> >>
> >> However, in this case we have just added the huge page to a hugetlbfs
> >> file. The put_page() is there just to drop the reference count on the
> >> page (taken when allocated). It will still be non-zero as we have
> >> successfully added it to the page cache. So, we are not freeing the
> >> page here, just dropping the reference count.
> >>
> >> This should not cause a problem like that seen in madvise.
> >
> > Thanks for the quick response.
> >
> > I am not too familiar with this piece of code, so just for the matter of
> > understanding: what prevents the page from being removed from the page cache
> > shortly after it is added (even if it is highly unlikely)? The page lock? The
> > inode lock?
>
> Someone would need to acquire the inode lock to remove the page. This
> is held until we exit the routine. Also note that put_page for this
> type of huge page almost always results in the page being put back
> on a free list within the hugetlb(fs) subsystem. It is not returned
> to the 'normal' memory allocators for general use.
>
I'm not sure about that. What about sys_fadvise64(..., POSIX_FADV_DONTNEED)?
That removes pages from the page cache without taking the inode lock. It won't
remove locked pages though, so presumably it is only the page lock that prevents
the race with hugetlbfs_fallocate().
But in any case, why not do the "obviously correct" thing --- unlock before put?
Eric
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-26 19:11 ` [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate() Nadav Amit
@ 2017-08-27 17:15 ` Mike Kravetz
2017-08-27 20:08 ` Nadav Amit
2017-08-28 13:46 ` Michal Hocko
2017-11-29 2:37 ` Eric Biggers
2 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2017-08-27 17:15 UTC (permalink / raw)
To: Nadav Amit, Nadia Yvette Chambers; +Cc: linux-kernel, Nadav Amit, Eric Biggers
On 08/26/2017 12:11 PM, Nadav Amit wrote:
> hugetlfs_fallocate() currently performs put_page() before unlock_page().
> This scenario opens a small time window, from the time the page is added
> to the page cache, until it is unlocked, in which the page might be
> removed from the page-cache by another core. If the page is removed
> during this time windows, it might cause a memory corruption, as the
> wrong page will be unlocked.
>
> It is arguable whether this scenario can happen in a real system, and
> there are several mitigating factors. The issue was found by code
> inspection (actually grep), and not by actually triggering the flow.
> Yet, since putting the page before unlocking is incorrect it should be
> fixed, if only to prevent future breakage or someone copy-pasting this
> code.
>
> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>
> cc: Eric Biggers <ebiggers3@gmail.com>
> cc: Mike Kravetz <mike.kravetz@oracle.com>
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
Thank you Nadav.
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Since hugetlbfs is an in memory filesystem, the only way one 'should' be
able to remove a page (file content) is through an inode operation such as
truncate, hole punch, or unlink. That was the basis for my response that
the inode lock would be required for page freeing.
Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
I was expecting to see a check for hugetlbfs pages and exit (without
modification) if encountered. A quick review of the code did not find
any such checks.
I'll take a closer look to determine exactly how hugetlbfs files are
handled. IMO, there should be something similar to the DAX check where
the routine quickly exits.
--
Mike Kravetz
> ---
> fs/hugetlbfs/inode.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 28d2753be094..9475fee79cee 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>
> /*
> - * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> + * page_put due to reference from alloc_huge_page()
> */
> - put_page(page);
> unlock_page(page);
> + put_page(page);
> }
>
> if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-27 17:15 ` Mike Kravetz
@ 2017-08-27 20:08 ` Nadav Amit
2017-08-28 17:45 ` Mike Kravetz
0 siblings, 1 reply; 25+ messages in thread
From: Nadav Amit @ 2017-08-27 20:08 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nadia Yvette Chambers, Linux Kernel Mailing List, Eric Biggers
Mike Kravetz <mike.kravetz@oracle.com> wrote:
> On 08/26/2017 12:11 PM, Nadav Amit wrote:
>> hugetlfs_fallocate() currently performs put_page() before unlock_page().
>> This scenario opens a small time window, from the time the page is added
>> to the page cache, until it is unlocked, in which the page might be
>> removed from the page-cache by another core. If the page is removed
>> during this time windows, it might cause a memory corruption, as the
>> wrong page will be unlocked.
>>
>> It is arguable whether this scenario can happen in a real system, and
>> there are several mitigating factors. The issue was found by code
>> inspection (actually grep), and not by actually triggering the flow.
>> Yet, since putting the page before unlocking is incorrect it should be
>> fixed, if only to prevent future breakage or someone copy-pasting this
>> code.
>>
>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>>
>> cc: Eric Biggers <ebiggers3@gmail.com>
>> cc: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>
> Thank you Nadav.
No problem.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Since hugetlbfs is an in memory filesystem, the only way one 'should' be
> able to remove a page (file content) is through an inode operation such as
> truncate, hole punch, or unlink. That was the basis for my response that
> the inode lock would be required for page freeing.
>
> Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
> I was expecting to see a check for hugetlbfs pages and exit (without
> modification) if encountered. A quick review of the code did not find
> any such checks.
>
> I'll take a closer look to determine exactly how hugetlbfs files are
> handled. IMO, there should be something similar to the DAX check where
> the routine quickly exits.
I did not cc stable when submitting the patch, based on your previous
response. Let me know if you want me to send v2 which does so.
Thanks,
Nadav
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-26 19:11 ` [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate() Nadav Amit
2017-08-27 17:15 ` Mike Kravetz
@ 2017-08-28 13:46 ` Michal Hocko
2017-11-29 2:37 ` Eric Biggers
2 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2017-08-28 13:46 UTC (permalink / raw)
To: Nadav Amit
Cc: Nadia Yvette Chambers, linux-kernel, Nadav Amit, Eric Biggers,
Mike Kravetz, Andrew Morton
[CC Andrew]
On Sat 26-08-17 12:11:24, Nadav Amit wrote:
> hugetlfs_fallocate() currently performs put_page() before unlock_page().
> This scenario opens a small time window, from the time the page is added
> to the page cache, until it is unlocked, in which the page might be
> removed from the page-cache by another core. If the page is removed
> during this time windows, it might cause a memory corruption, as the
> wrong page will be unlocked.
>
> It is arguable whether this scenario can happen in a real system, and
> there are several mitigating factors. The issue was found by code
> inspection (actually grep), and not by actually triggering the flow.
> Yet, since putting the page before unlocking is incorrect it should be
> fixed, if only to prevent future breakage or someone copy-pasting this
> code.
Even if this a theoretical problem it is definitely worth fixing because
it is a anti-pattern of how the reference counted object should be treated.
> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>
> cc: Eric Biggers <ebiggers3@gmail.com>
> cc: Mike Kravetz <mike.kravetz@oracle.com>
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> fs/hugetlbfs/inode.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 28d2753be094..9475fee79cee 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>
> /*
> - * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> + * page_put due to reference from alloc_huge_page()
> */
> - put_page(page);
> unlock_page(page);
> + put_page(page);
> }
>
> if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
> --
> 2.11.0
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-27 20:08 ` Nadav Amit
@ 2017-08-28 17:45 ` Mike Kravetz
2017-08-28 18:09 ` Michal Hocko
0 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2017-08-28 17:45 UTC (permalink / raw)
To: Nadav Amit
Cc: Nadia Yvette Chambers, Linux Kernel Mailing List, Eric Biggers,
Andrew Morton, Michal Hocko
Adding Andrew, Michal on CC
On 08/27/2017 01:08 PM, Nadav Amit wrote:
> Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
>> On 08/26/2017 12:11 PM, Nadav Amit wrote:
>>> hugetlfs_fallocate() currently performs put_page() before unlock_page().
>>> This scenario opens a small time window, from the time the page is added
>>> to the page cache, until it is unlocked, in which the page might be
>>> removed from the page-cache by another core. If the page is removed
>>> during this time windows, it might cause a memory corruption, as the
>>> wrong page will be unlocked.
>>>
>>> It is arguable whether this scenario can happen in a real system, and
>>> there are several mitigating factors. The issue was found by code
>>> inspection (actually grep), and not by actually triggering the flow.
>>> Yet, since putting the page before unlocking is incorrect it should be
>>> fixed, if only to prevent future breakage or someone copy-pasting this
>>> code.
>>>
>>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>>>
>>> cc: Eric Biggers <ebiggers3@gmail.com>
>>> cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>
>> Thank you Nadav.
>
> No problem.
>
>>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Since hugetlbfs is an in memory filesystem, the only way one 'should' be
>> able to remove a page (file content) is through an inode operation such as
>> truncate, hole punch, or unlink. That was the basis for my response that
>> the inode lock would be required for page freeing.
>>
>> Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
>> I was expecting to see a check for hugetlbfs pages and exit (without
>> modification) if encountered. A quick review of the code did not find
>> any such checks.
>>
>> I'll take a closer look to determine exactly how hugetlbfs files are
>> handled. IMO, there should be something similar to the DAX check where
>> the routine quickly exits.
>
> I did not cc stable when submitting the patch, based on your previous
> response. Let me know if you want me to send v2 which does so.
I still do not believe there is a need to change this in stable. Your patch
should be sufficient to ensure we do the right thing going forward.
Looking at and testing the sys_fadvise64(POSIX_FADV_DONTNEED) code with
hugetlbfs does indeed show a more general problem. One can use
sys_fadvise64() to remove a huge page from a hugetlbfs file. :( This does
not go through the special hugetlbfs page handling code, but rather the
normal mm paths. As a result hugetlbfs accounting (like reserve counts)
gets out of sync and the hugetlbfs filesystem may become unusable. Sigh!!!
I will address this issue in a separate patch.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-28 17:45 ` Mike Kravetz
@ 2017-08-28 18:09 ` Michal Hocko
2017-08-28 18:51 ` Mike Kravetz
0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2017-08-28 18:09 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nadav Amit, Nadia Yvette Chambers, Linux Kernel Mailing List,
Eric Biggers, Andrew Morton
On Mon 28-08-17 10:45:58, Mike Kravetz wrote:
> Adding Andrew, Michal on CC
>
> On 08/27/2017 01:08 PM, Nadav Amit wrote:
> > Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> >> On 08/26/2017 12:11 PM, Nadav Amit wrote:
> >>> hugetlfs_fallocate() currently performs put_page() before unlock_page().
> >>> This scenario opens a small time window, from the time the page is added
> >>> to the page cache, until it is unlocked, in which the page might be
> >>> removed from the page-cache by another core. If the page is removed
> >>> during this time windows, it might cause a memory corruption, as the
> >>> wrong page will be unlocked.
> >>>
> >>> It is arguable whether this scenario can happen in a real system, and
> >>> there are several mitigating factors. The issue was found by code
> >>> inspection (actually grep), and not by actually triggering the flow.
> >>> Yet, since putting the page before unlocking is incorrect it should be
> >>> fixed, if only to prevent future breakage or someone copy-pasting this
> >>> code.
> >>>
> >>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
> >>>
> >>> cc: Eric Biggers <ebiggers3@gmail.com>
> >>> cc: Mike Kravetz <mike.kravetz@oracle.com>
> >>>
> >>> Signed-off-by: Nadav Amit <namit@vmware.com>
> >>
> >> Thank you Nadav.
> >
> > No problem.
> >
> >>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>
> >> Since hugetlbfs is an in memory filesystem, the only way one 'should' be
> >> able to remove a page (file content) is through an inode operation such as
> >> truncate, hole punch, or unlink. That was the basis for my response that
> >> the inode lock would be required for page freeing.
> >>
> >> Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
> >> I was expecting to see a check for hugetlbfs pages and exit (without
> >> modification) if encountered. A quick review of the code did not find
> >> any such checks.
> >>
> >> I'll take a closer look to determine exactly how hugetlbfs files are
> >> handled. IMO, there should be something similar to the DAX check where
> >> the routine quickly exits.
> >
> > I did not cc stable when submitting the patch, based on your previous
> > response. Let me know if you want me to send v2 which does so.
>
> I still do not believe there is a need to change this in stable. Your patch
> should be sufficient to ensure we do the right thing going forward.
>
> Looking at and testing the sys_fadvise64(POSIX_FADV_DONTNEED) code with
> hugetlbfs does indeed show a more general problem. One can use
> sys_fadvise64() to remove a huge page from a hugetlbfs file. :( This does
> not go through the special hugetlbfs page handling code, but rather the
> normal mm paths. As a result hugetlbfs accounting (like reserve counts)
> gets out of sync and the hugetlbfs filesystem may become unusable. Sigh!!!
>
> I will address this issue in a separate patch.
I didn't check very carefully but it seems that
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fadvise-avoid-fadvise-for-fs-without-backing-device.patch
should help here, right?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-28 18:09 ` Michal Hocko
@ 2017-08-28 18:51 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-28 18:51 UTC (permalink / raw)
To: Michal Hocko
Cc: Nadav Amit, Nadia Yvette Chambers, Linux Kernel Mailing List,
Eric Biggers, Andrew Morton
On 08/28/2017 11:09 AM, Michal Hocko wrote:
> On Mon 28-08-17 10:45:58, Mike Kravetz wrote:
>> Adding Andrew, Michal on CC
>>
>> On 08/27/2017 01:08 PM, Nadav Amit wrote:
>>> Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>
>>>> On 08/26/2017 12:11 PM, Nadav Amit wrote:
>>>>> hugetlfs_fallocate() currently performs put_page() before unlock_page().
>>>>> This scenario opens a small time window, from the time the page is added
>>>>> to the page cache, until it is unlocked, in which the page might be
>>>>> removed from the page-cache by another core. If the page is removed
>>>>> during this time windows, it might cause a memory corruption, as the
>>>>> wrong page will be unlocked.
>>>>>
>>>>> It is arguable whether this scenario can happen in a real system, and
>>>>> there are several mitigating factors. The issue was found by code
>>>>> inspection (actually grep), and not by actually triggering the flow.
>>>>> Yet, since putting the page before unlocking is incorrect it should be
>>>>> fixed, if only to prevent future breakage or someone copy-pasting this
>>>>> code.
>>>>>
>>>>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>>>>>
>>>>> cc: Eric Biggers <ebiggers3@gmail.com>
>>>>> cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>>>
>>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>>
>>>> Thank you Nadav.
>>>
>>> No problem.
>>>
>>>>
>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>>
>>>> Since hugetlbfs is an in memory filesystem, the only way one 'should' be
>>>> able to remove a page (file content) is through an inode operation such as
>>>> truncate, hole punch, or unlink. That was the basis for my response that
>>>> the inode lock would be required for page freeing.
>>>>
>>>> Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
>>>> I was expecting to see a check for hugetlbfs pages and exit (without
>>>> modification) if encountered. A quick review of the code did not find
>>>> any such checks.
>>>>
>>>> I'll take a closer look to determine exactly how hugetlbfs files are
>>>> handled. IMO, there should be something similar to the DAX check where
>>>> the routine quickly exits.
>>>
>>> I did not cc stable when submitting the patch, based on your previous
>>> response. Let me know if you want me to send v2 which does so.
>>
>> I still do not believe there is a need to change this in stable. Your patch
>> should be sufficient to ensure we do the right thing going forward.
>>
>> Looking at and testing the sys_fadvise64(POSIX_FADV_DONTNEED) code with
>> hugetlbfs does indeed show a more general problem. One can use
>> sys_fadvise64() to remove a huge page from a hugetlbfs file. :( This does
>> not go through the special hugetlbfs page handling code, but rather the
>> normal mm paths. As a result hugetlbfs accounting (like reserve counts)
>> gets out of sync and the hugetlbfs filesystem may become unusable. Sigh!!!
>>
>> I will address this issue in a separate patch.
>
> I didn't check very carefully but it seems that
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-fadvise-avoid-fadvise-for-fs-without-backing-device.patch
> should help here, right?
Thanks Michal.
Yes, that patch addresses the above issue with hugetlbfs. I was also
wondering if there were similar issues with other in memory filesystems.
Looks like there are.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-08-26 19:11 ` [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate() Nadav Amit
2017-08-27 17:15 ` Mike Kravetz
2017-08-28 13:46 ` Michal Hocko
@ 2017-11-29 2:37 ` Eric Biggers
2017-11-29 3:22 ` Mike Kravetz
2 siblings, 1 reply; 25+ messages in thread
From: Eric Biggers @ 2017-11-29 2:37 UTC (permalink / raw)
To: Nadav Amit; +Cc: Nadia Yvette Chambers, linux-kernel, Nadav Amit, Mike Kravetz
On Sat, Aug 26, 2017 at 12:11:24PM -0700, Nadav Amit wrote:
> hugetlfs_fallocate() currently performs put_page() before unlock_page().
> This scenario opens a small time window, from the time the page is added
> to the page cache, until it is unlocked, in which the page might be
> removed from the page-cache by another core. If the page is removed
> during this time windows, it might cause a memory corruption, as the
> wrong page will be unlocked.
>
> It is arguable whether this scenario can happen in a real system, and
> there are several mitigating factors. The issue was found by code
> inspection (actually grep), and not by actually triggering the flow.
> Yet, since putting the page before unlocking is incorrect it should be
> fixed, if only to prevent future breakage or someone copy-pasting this
> code.
>
> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>
> cc: Eric Biggers <ebiggers3@gmail.com>
> cc: Mike Kravetz <mike.kravetz@oracle.com>
>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
> fs/hugetlbfs/inode.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 28d2753be094..9475fee79cee 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>
> /*
> - * page_put due to reference from alloc_huge_page()
> * unlock_page because locked by add_to_page_cache()
> + * page_put due to reference from alloc_huge_page()
> */
> - put_page(page);
> unlock_page(page);
> + put_page(page);
> }
>
> if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
> --
This patch wasn't ever applied. Nadia, do you take patches for hugetlbfs, or
does this need to go through Andrew Morton?
Eric
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()
2017-11-29 2:37 ` Eric Biggers
@ 2017-11-29 3:22 ` Mike Kravetz
0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-11-29 3:22 UTC (permalink / raw)
To: Eric Biggers, Nadav Amit
Cc: Nadia Yvette Chambers, linux-kernel, Nadav Amit, Michal Hocko,
Andrew Morton
[CC Andrew, Michal]
On 11/28/2017 06:37 PM, Eric Biggers wrote:
> On Sat, Aug 26, 2017 at 12:11:24PM -0700, Nadav Amit wrote:
>> hugetlfs_fallocate() currently performs put_page() before unlock_page().
>> This scenario opens a small time window, from the time the page is added
>> to the page cache, until it is unlocked, in which the page might be
>> removed from the page-cache by another core. If the page is removed
>> during this time windows, it might cause a memory corruption, as the
>> wrong page will be unlocked.
>>
>> It is arguable whether this scenario can happen in a real system, and
>> there are several mitigating factors. The issue was found by code
>> inspection (actually grep), and not by actually triggering the flow.
>> Yet, since putting the page before unlocking is incorrect it should be
>> fixed, if only to prevent future breakage or someone copy-pasting this
>> code.
>>
>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>>
>> cc: Eric Biggers <ebiggers3@gmail.com>
>> cc: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> fs/hugetlbfs/inode.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 28d2753be094..9475fee79cee 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -655,11 +655,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>> mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>>
>> /*
>> - * page_put due to reference from alloc_huge_page()
>> * unlock_page because locked by add_to_page_cache()
>> + * page_put due to reference from alloc_huge_page()
>> */
>> - put_page(page);
>> unlock_page(page);
>> + put_page(page);
>> }
>>
>> if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size)
>> --
>
> This patch wasn't ever applied. Nadia, do you take patches for hugetlbfs, or
> does this need to go through Andrew Morton?
>
> Eric
Nadia has not been active for some time on hugetlbfs, so best to go through
Andrew. Added Andrew and Michal on CC.
This patch has a:
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
I am still of the opinion that this does not need to be sent to stable.
Although the ordering is current code is incorrect, there is no way for
this to be a problem with current locking. In addition, I verified
that the perhaps bigger issue with sys_fadvise64(POSIX_FADV_DONTNEED)
for hugetlbfs and other filesystems is addressed in commit 3a77d214807c.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2017-11-29 3:22 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-23 21:41 + mm-madvise-fix-freeing-of-locked-page-with-madv_free.patch added to -mm tree akpm
2017-08-24 6:09 ` Michal Hocko
2017-08-24 6:53 ` Michal Hocko
2017-08-25 21:36 ` Andrew Morton
2017-08-25 21:36 ` Andrew Morton
2017-08-25 22:02 ` Nadav Amit
2017-08-25 22:31 ` Mike Kravetz
2017-08-25 22:31 ` Mike Kravetz
2017-08-25 22:31 ` Mike Kravetz
2017-08-25 22:51 ` Nadav Amit
2017-08-25 23:41 ` Mike Kravetz
2017-08-25 23:41 ` Mike Kravetz
2017-08-25 23:41 ` Mike Kravetz
2017-08-26 21:09 ` Eric Biggers
2017-08-26 21:09 ` Eric Biggers
2017-08-26 21:09 ` Eric Biggers
2017-08-26 19:11 ` [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate() Nadav Amit
2017-08-27 17:15 ` Mike Kravetz
2017-08-27 20:08 ` Nadav Amit
2017-08-28 17:45 ` Mike Kravetz
2017-08-28 18:09 ` Michal Hocko
2017-08-28 18:51 ` Mike Kravetz
2017-08-28 13:46 ` Michal Hocko
2017-11-29 2:37 ` Eric Biggers
2017-11-29 3:22 ` Mike Kravetz
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.