* THP race?
@ 2016-02-23 15:49 Kirill A. Shutemov
2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
0 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 15:49 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-mm
Hi Andrea,
I suspect there's race with THP in __handle_mm_fault(). It's pure
theoretical and race window is small, but..
Consider following scenario:
- THP got allocated by other thread just before "pmd_none() &&
__pte_alloc()" check, so pmd_none() is false and we don't
allocate the page table.
- But before pmd_trans_huge() check the page got unmap by
MADV_DONTNEED in other thread.
- At this point we will call pte_offset_map() for pmd which is
pmd_none().
Nothing pleasant would happen after this...
Do you see anything what would prevent this scenario?
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-23 15:49 THP race? Kirill A. Shutemov
@ 2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:18 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
` (3 more replies)
2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
1 sibling, 4 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:06 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: linux-mm
On Tue, Feb 23, 2016 at 06:49:50PM +0300, Kirill A. Shutemov wrote:
> Hi Andrea,
>
> I suspect there's race with THP in __handle_mm_fault(). It's pure
> theoretical and race window is small, but..
>
> Consider following scenario:
>
> - THP got allocated by other thread just before "pmd_none() &&
> __pte_alloc()" check, so pmd_none() is false and we don't
> allocate the page table.
>
> - But before pmd_trans_huge() check the page got unmap by
> MADV_DONTNEED in other thread.
>
> - At this point we will call pte_offset_map() for pmd which is
> pmd_none().
>
> Nothing pleasant would happen after this...
>
> Do you see anything what would prevent this scenario?
No so I think we need s/pmd_trans_huge/pmd_trans_unstable/ and use the
atomic read in C to sort this out lockless. The MADV_DONTNEED part
that isn't holding the mmap_sem for writing unfortunately wasn't
sorted out immediately, that was unexpected in
fact. pmd_trans_unstable() was introduced precisely to handle this
trouble caused by MADV_DONTNEED running with the mmap_sem only for
reading which causes infinite possible transactions back and forth
between none and transhuge while holding only the mmap_sem for
reading.
==
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
2016-02-23 18:06 ` Andrea Arcangeli
@ 2016-02-23 18:18 ` kbuild test robot
2016-02-23 18:21 ` kbuild test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:18 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm
[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]
Hi Andrea,
[auto build test ERROR on v4.5-rc5]
[also build test ERROR on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: i386-tinyconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c: In function '__handle_mm_fault':
>> mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
vim +/pmd_trans_unstable +3419 mm/memory.c
3413 * a different thread of this mm, in turn leading to a false
3414 * negative pmd_trans_huge() retval. All we have to ensure is
3415 * that it is a regular pmd that we can walk with
3416 * pte_offset_map() and we can do that through an atomic read
3417 * in C, which is what pmd_trans_unstable() is provided for.
3418 */
> 3419 if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
3420 return 0;
3421 /*
3422 * A regular pmd is established and it can't morph into a huge pmd
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6204 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:18 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
@ 2016-02-23 18:21 ` kbuild test robot
2016-02-23 18:27 ` kbuild test robot
2016-02-23 18:38 ` THP race? Kirill A. Shutemov
3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:21 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm
[-- Attachment #1: Type: text/plain, Size: 15914 bytes --]
Hi Andrea,
[auto build test WARNING on v4.5-rc5]
[also build test WARNING on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: x86_64-randconfig-x011-201608 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c: In function '__handle_mm_fault':
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
______r = likely_notrace(x); \
^
include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
if (__builtin_constant_p(!!(cond)) ? !!(cond) : \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
______r = likely_notrace(x); \
^
include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
______r = !!(cond); \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
______r = !!(cond); \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
______r = !!(cond); \
^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
______r = likely_notrace(x); \
^
include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
vim +/if +3419 mm/memory.c
3403 */
3404 if (unlikely(pmd_none(*pmd)) &&
3405 unlikely(__pte_alloc(mm, vma, pmd, address)))
3406 return VM_FAULT_OOM;
3407 /*
3408 * If an huge pmd materialized from under us just retry later.
3409 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
3410 * ensure the pmd didn't become pmd_trans_huge from under us
3411 * and then immediately back to pmd_none as result of
3412 * MADV_DONTNEED running immediately after a huge_pmd fault of
3413 * a different thread of this mm, in turn leading to a false
3414 * negative pmd_trans_huge() retval. All we have to ensure is
3415 * that it is a regular pmd that we can walk with
3416 * pte_offset_map() and we can do that through an atomic read
3417 * in C, which is what pmd_trans_unstable() is provided for.
3418 */
> 3419 if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
3420 return 0;
3421 /*
3422 * A regular pmd is established and it can't morph into a huge pmd
3423 * from under us anymore at this point because we hold the mmap_sem
3424 * read mode and khugepaged takes it in write mode. So now it's
3425 * safe to run pte_offset_map().
3426 */
3427 pte = pte_offset_map(pmd, address);
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 19381 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:18 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
2016-02-23 18:21 ` kbuild test robot
@ 2016-02-23 18:27 ` kbuild test robot
2016-02-23 18:38 ` THP race? Kirill A. Shutemov
3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:27 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm
[-- Attachment #1: Type: text/plain, Size: 5996 bytes --]
Hi Andrea,
[auto build test WARNING on v4.5-rc5]
[also build test WARNING on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: x86_64-randconfig-x012-201608 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c: In function '__handle_mm_fault':
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:137:45: note: in definition of macro 'unlikely'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:137:53: note: in definition of macro 'unlikely'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/smp.h:10,
from include/linux/kernel_stat.h:4,
from mm/memory.c:41:
mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
include/linux/compiler.h:110:47: note: in definition of macro 'likely_notrace'
#define likely_notrace(x) __builtin_expect(!!(x), 1)
^
include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
^
>> mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
^
In file included from arch/x86/include/asm/pgtable.h:914:0,
from include/linux/mm.h:67,
from mm/memory.c:42:
include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
static inline int pmd_trans_unstable(pmd_t *pmd)
^
vim +/unlikely +3419 mm/memory.c
3403 */
3404 if (unlikely(pmd_none(*pmd)) &&
3405 unlikely(__pte_alloc(mm, vma, pmd, address)))
3406 return VM_FAULT_OOM;
3407 /*
3408 * If an huge pmd materialized from under us just retry later.
3409 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
3410 * ensure the pmd didn't become pmd_trans_huge from under us
3411 * and then immediately back to pmd_none as result of
3412 * MADV_DONTNEED running immediately after a huge_pmd fault of
3413 * a different thread of this mm, in turn leading to a false
3414 * negative pmd_trans_huge() retval. All we have to ensure is
3415 * that it is a regular pmd that we can walk with
3416 * pte_offset_map() and we can do that through an atomic read
3417 * in C, which is what pmd_trans_unstable() is provided for.
3418 */
> 3419 if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
3420 return 0;
3421 /*
3422 * A regular pmd is established and it can't morph into a huge pmd
3423 * from under us anymore at this point because we hold the mmap_sem
3424 * read mode and khugepaged takes it in write mode. So now it's
3425 * safe to run pte_offset_map().
3426 */
3427 pte = pte_offset_map(pmd, address);
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21966 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-23 18:06 ` Andrea Arcangeli
` (2 preceding siblings ...)
2016-02-23 18:27 ` kbuild test robot
@ 2016-02-23 18:38 ` Kirill A. Shutemov
2016-02-23 19:28 ` Andrea Arcangeli
3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 18:38 UTC (permalink / raw)
To: Andrea Arcangeli, Dan Williams; +Cc: linux-mm
On Tue, Feb 23, 2016 at 07:06:09PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 23, 2016 at 06:49:50PM +0300, Kirill A. Shutemov wrote:
> > Hi Andrea,
> >
> > I suspect there's race with THP in __handle_mm_fault(). It's pure
> > theoretical and race window is small, but..
> >
> > Consider following scenario:
> >
> > - THP got allocated by other thread just before "pmd_none() &&
> > __pte_alloc()" check, so pmd_none() is false and we don't
> > allocate the page table.
> >
> > - But before pmd_trans_huge() check the page got unmap by
> > MADV_DONTNEED in other thread.
> >
> > - At this point we will call pte_offset_map() for pmd which is
> > pmd_none().
> >
> > Nothing pleasant would happen after this...
> >
> > Do you see anything what would prevent this scenario?
>
> No so I think we need s/pmd_trans_huge/pmd_trans_unstable/ and use the
> atomic read in C to sort this out lockless. The MADV_DONTNEED part
> that isn't holding the mmap_sem for writing unfortunately wasn't
> sorted out immediately, that was unexpected in
> fact. pmd_trans_unstable() was introduced precisely to handle this
> trouble caused by MADV_DONTNEED running with the mmap_sem only for
> reading which causes infinite possible transactions back and forth
> between none and transhuge while holding only the mmap_sem for
> reading.
>
> ==
> From eae4f251604299082dd824dc8acade71268c8d88 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Tue, 23 Feb 2016 18:56:55 +0100
> Subject: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
> and MADV_DONTNEED
>
> pmd_trans_unstable/pmd_none_or_trans_huge_or_clear_bad were introduced
> to locklessy (but atomically) detect when a pmd is a regular (stable)
> pmd or when the pmd is unstable and can infinitely transition from
> pmd_none and pmd_trans_huge from under us, while only holding the
> mmap_sem for reading (for writing not).
>
> While holding the mmap_sem only for reading, MADV_DONTNEED can run
> from under us and so before we can threat the pmd as regular we need
> to compare it against pmd_none and pmd_trans_huge in an atomic way,
> with pmd_trans_unstable(). The old pmd_trans_huge check is correct but
> it leaves a tiny window for a race.
>
> Useful applications are unlikely to notice the difference as doing
> MADV_DONTNEED concurrently with a page fault would lead to undefined
> behavior.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> mm/memory.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 635451a..d5912b0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3404,8 +3404,19 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> if (unlikely(pmd_none(*pmd)) &&
> unlikely(__pte_alloc(mm, vma, pmd, address)))
> return VM_FAULT_OOM;
> - /* if an huge pmd materialized from under us just retry later */
> - if (unlikely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
> + /*
> + * If an huge pmd materialized from under us just retry later.
> + * Use pmd_trans_unstable() instead of pmd_trans_huge() to
> + * ensure the pmd didn't become pmd_trans_huge from under us
> + * and then immediately back to pmd_none as result of
> + * MADV_DONTNEED running immediately after a huge_pmd fault of
> + * a different thread of this mm, in turn leading to a false
> + * negative pmd_trans_huge() retval. All we have to ensure is
> + * that it is a regular pmd that we can walk with
> + * pte_offset_map() and we can do that through an atomic read
> + * in C, which is what pmd_trans_unstable() is provided for.
> + */
> + if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
pmd_trans_unstable(pmd), otherwise looks good:
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
BTW, I guess DAX would need to introduce the same infrastructure for
pmd_devmap(). Dan?
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 0/1] Re: THP race?
2016-02-23 15:49 THP race? Kirill A. Shutemov
2016-02-23 18:06 ` Andrea Arcangeli
@ 2016-02-23 18:49 ` Andrea Arcangeli
2016-02-23 18:49 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:49 UTC (permalink / raw)
To: Andrew Morton, \"Kirill A. Shutemov\"; +Cc: linux-mm
Hello Kirill and Andrew,
Resending as a more proper submit with slightly improved commentary
and that builds and boots..
Andrea Arcangeli (1):
mm: thp: fix SMP race condition between THP page fault and
MADV_DONTNEED
mm/memory.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
@ 2016-02-23 18:49 ` Andrea Arcangeli
2016-02-23 21:18 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:49 UTC (permalink / raw)
To: Andrew Morton, \"Kirill A. Shutemov\"; +Cc: linux-mm
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a
regular (stable) pmd or when the pmd is unstable and can infinitely
transition from pmd_none() and pmd_trans_huge() from under us, while
only holding the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run
from under us and so before we can assume the pmd to be a regular
stable pmd we need to compare it against pmd_none() and
pmd_trans_huge() in an atomic way, with pmd_trans_unstable(). The old
pmd_trans_huge() left a tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
mm/memory.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 635451a..50347ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3404,8 +3404,19 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(pmd_none(*pmd)) &&
unlikely(__pte_alloc(mm, vma, pmd, address)))
return VM_FAULT_OOM;
- /* if an huge pmd materialized from under us just retry later */
- if (unlikely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
+ /*
+ * If an huge pmd materialized from under us just retry later.
+ * Use pmd_trans_unstable() instead of pmd_trans_huge() to
+ * ensure the pmd didn't become pmd_trans_huge from under us
+ * and then back to pmd_none, as result of MADV_DONTNEED
+ * running immediately after a huge pmd fault of a different
+ * thread of this mm, in turn leading to a misleading
+ * pmd_trans_huge() retval. All we have to ensure is that it
+ * is a regular pmd that we can walk with pte_offset_map() and
+ * we can do that through an atomic read in C, which is what
+ * pmd_trans_unstable() is provided for.
+ */
+ if (unlikely(pmd_trans_unstable(pmd) || pmd_devmap(*pmd)))
return 0;
/*
* A regular pmd is established and it can't morph into a huge pmd
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-23 18:38 ` THP race? Kirill A. Shutemov
@ 2016-02-23 19:28 ` Andrea Arcangeli
2016-02-25 18:45 ` Dan Williams
0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 19:28 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Dan Williams, linux-mm
On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
> pmd_trans_unstable(pmd), otherwise looks good:
Yes sorry.
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thanks for the quick ack, I just noticed or I would have added it to
the resubmit, but it can be still added to -mm.
> BTW, I guess DAX would need to introduce the same infrastructure for
> pmd_devmap(). Dan?
There is a i_mmap_lock_write in the truncate path that saves the day
for the pmd zapping in the truncate() case without mmap_sem (the only
case anon THP doesn't need to care about as truncate isn't possible in
the anon case), but not in the MADV_DONTNEED madvise case that runs
only with the mmap_sem for reading.
The only objective of this "infrastructure" is to add no pmd_lock()ing
overhead to the page fault, if the mapping is already established but
not huge, and we've just to walk through the pmd to reach the
pte. All because MADV_DONTNEED is running with the mmap_sem for
reading unlike munmap and other slower syscalls that are forced to
mangle the vmas and have to take the mmap_sem for writing regardless.
The question for DAX is if it should do a pmd_devmap check inside
pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
away with a one liner, or add its own infrastructure with
pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
in __handle_mm_fault. If it could share the same infrastructure it'd
be ideal.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-23 18:49 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
@ 2016-02-23 21:18 ` Andrew Morton
0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2016-02-23 21:18 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: \"Kirill A. Shutemov\", linux-mm, stable
On Tue, 23 Feb 2016 19:49:10 +0100 Andrea Arcangeli <aarcange@redhat.com> wrote:
> pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
> introduced to locklessy (but atomically) detect when a pmd is a
> regular (stable) pmd or when the pmd is unstable and can infinitely
> transition from pmd_none() and pmd_trans_huge() from under us, while
> only holding the mmap_sem for reading (for writing not).
>
> While holding the mmap_sem only for reading, MADV_DONTNEED can run
> from under us and so before we can assume the pmd to be a regular
> stable pmd we need to compare it against pmd_none() and
> pmd_trans_huge() in an atomic way, with pmd_trans_unstable(). The old
> pmd_trans_huge() left a tiny window for a race.
>
> Useful applications are unlikely to notice the difference as doing
> MADV_DONTNEED concurrently with a page fault would lead to undefined
> behavior.
Thanks.
I put a cc:stable on this as it appears to be applicable to 4.4 and
perhaps earlier.
It generates a reject against 4.4 because of the recently-added
pmd_devmap() test. It's easily fixed but I don't have a process to
handle -stable rejects. This means that when Greg hits the reject
he'll ask us for a fixed up version.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-23 19:28 ` Andrea Arcangeli
@ 2016-02-25 18:45 ` Dan Williams
2016-02-26 10:37 ` Kirill A. Shutemov
0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2016-02-25 18:45 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Kirill A. Shutemov, linux-mm
On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
>> pmd_trans_unstable(pmd), otherwise looks good:
>
> Yes sorry.
>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> Thanks for the quick ack, I just noticed or I would have added it to
> the resubmit, but it can be still added to -mm.
>
>> BTW, I guess DAX would need to introduce the same infrastructure for
>> pmd_devmap(). Dan?
>
> There is a i_mmap_lock_write in the truncate path that saves the day
> for the pmd zapping in the truncate() case without mmap_sem (the only
> case anon THP doesn't need to care about as truncate isn't possible in
> the anon case), but not in the MADV_DONTNEED madvise case that runs
> only with the mmap_sem for reading.
>
> The only objective of this "infrastructure" is to add no pmd_lock()ing
> overhead to the page fault, if the mapping is already established but
> not huge, and we've just to walk through the pmd to reach the
> pte. All because MADV_DONTNEED is running with the mmap_sem for
> reading unlike munmap and other slower syscalls that are forced to
> mangle the vmas and have to take the mmap_sem for writing regardless.
>
> The question for DAX is if it should do a pmd_devmap check inside
> pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
> away with a one liner, or add its own infrastructure with
> pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
> in __handle_mm_fault. If it could share the same infrastructure it'd
> be ideal.
>
Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
inside pmd_none_or_trans_huge_or_clear_bad().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-25 18:45 ` Dan Williams
@ 2016-02-26 10:37 ` Kirill A. Shutemov
2016-02-26 14:46 ` Dan Williams
0 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-26 10:37 UTC (permalink / raw)
To: Dan Williams; +Cc: Andrea Arcangeli, linux-mm
On Thu, Feb 25, 2016 at 10:45:05AM -0800, Dan Williams wrote:
> On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
> >> pmd_trans_unstable(pmd), otherwise looks good:
> >
> > Yes sorry.
> >
> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
> > Thanks for the quick ack, I just noticed or I would have added it to
> > the resubmit, but it can be still added to -mm.
> >
> >> BTW, I guess DAX would need to introduce the same infrastructure for
> >> pmd_devmap(). Dan?
> >
> > There is a i_mmap_lock_write in the truncate path that saves the day
> > for the pmd zapping in the truncate() case without mmap_sem (the only
> > case anon THP doesn't need to care about as truncate isn't possible in
> > the anon case), but not in the MADV_DONTNEED madvise case that runs
> > only with the mmap_sem for reading.
> >
> > The only objective of this "infrastructure" is to add no pmd_lock()ing
> > overhead to the page fault, if the mapping is already established but
> > not huge, and we've just to walk through the pmd to reach the
> > pte. All because MADV_DONTNEED is running with the mmap_sem for
> > reading unlike munmap and other slower syscalls that are forced to
> > mangle the vmas and have to take the mmap_sem for writing regardless.
> >
> > The question for DAX is if it should do a pmd_devmap check inside
> > pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
> > away with a one liner, or add its own infrastructure with
> > pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
> > in __handle_mm_fault. If it could share the same infrastructure it'd
> > be ideal.
> >
>
> Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
> inside pmd_none_or_trans_huge_or_clear_bad().
Are you going take care about this?
--
Kirill A. Shutemov
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: THP race?
2016-02-26 10:37 ` Kirill A. Shutemov
@ 2016-02-26 14:46 ` Dan Williams
0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2016-02-26 14:46 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Andrea Arcangeli, linux-mm
On Fri, Feb 26, 2016 at 2:37 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Thu, Feb 25, 2016 at 10:45:05AM -0800, Dan Williams wrote:
>> On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>> > On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
>> >> pmd_trans_unstable(pmd), otherwise looks good:
>> >
>> > Yes sorry.
>> >
>> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >
>> > Thanks for the quick ack, I just noticed or I would have added it to
>> > the resubmit, but it can be still added to -mm.
>> >
>> >> BTW, I guess DAX would need to introduce the same infrastructure for
>> >> pmd_devmap(). Dan?
>> >
>> > There is a i_mmap_lock_write in the truncate path that saves the day
>> > for the pmd zapping in the truncate() case without mmap_sem (the only
>> > case anon THP doesn't need to care about as truncate isn't possible in
>> > the anon case), but not in the MADV_DONTNEED madvise case that runs
>> > only with the mmap_sem for reading.
>> >
>> > The only objective of this "infrastructure" is to add no pmd_lock()ing
>> > overhead to the page fault, if the mapping is already established but
>> > not huge, and we've just to walk through the pmd to reach the
>> > pte. All because MADV_DONTNEED is running with the mmap_sem for
>> > reading unlike munmap and other slower syscalls that are forced to
>> > mangle the vmas and have to take the mmap_sem for writing regardless.
>> >
>> > The question for DAX is if it should do a pmd_devmap check inside
>> > pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
>> > away with a one liner, or add its own infrastructure with
>> > pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
>> > in __handle_mm_fault. If it could share the same infrastructure it'd
>> > be ideal.
>> >
>>
>> Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
>> inside pmd_none_or_trans_huge_or_clear_bad().
>
> Are you going take care about this?
Sure, I thought there was time to still fold this in to Andrea's
patch? Otherwise yes, I'll send a patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-02-26 14:46 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-23 15:49 THP race? Kirill A. Shutemov
2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:18 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
2016-02-23 18:21 ` kbuild test robot
2016-02-23 18:27 ` kbuild test robot
2016-02-23 18:38 ` THP race? Kirill A. Shutemov
2016-02-23 19:28 ` Andrea Arcangeli
2016-02-25 18:45 ` Dan Williams
2016-02-26 10:37 ` Kirill A. Shutemov
2016-02-26 14:46 ` Dan Williams
2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
2016-02-23 18:49 ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
2016-02-23 21:18 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).