linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* THP race?
@ 2016-02-23 15:49 Kirill A. Shutemov
  2016-02-23 18:06 ` Andrea Arcangeli
  2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
  0 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 15:49 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm

Hi Andrea,

I suspect there's race with THP in __handle_mm_fault(). It's pure
theoretical and race window is small, but..

Consider following scenario:

  - THP got allocated by other thread just before "pmd_none() &&
    __pte_alloc()" check, so pmd_none() is false and we don't
    allocate the page table.

  - But before pmd_trans_huge() check the page got unmap by
    MADV_DONTNEED in other thread.

  - At this point we will call pte_offset_map() for pmd which is
    pmd_none().

Nothing pleasant would happen after this...

Do you see anything what would prevent this scenario?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-23 15:49 THP race? Kirill A. Shutemov
@ 2016-02-23 18:06 ` Andrea Arcangeli
  2016-02-23 18:18   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
                     ` (3 more replies)
  2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
  1 sibling, 4 replies; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:06 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm

On Tue, Feb 23, 2016 at 06:49:50PM +0300, Kirill A. Shutemov wrote:
> Hi Andrea,
> 
> I suspect there's race with THP in __handle_mm_fault(). It's pure
> theoretical and race window is small, but..
> 
> Consider following scenario:
> 
>   - THP got allocated by other thread just before "pmd_none() &&
>     __pte_alloc()" check, so pmd_none() is false and we don't
>     allocate the page table.
> 
>   - But before pmd_trans_huge() check the page got unmap by
>     MADV_DONTNEED in other thread.
> 
>   - At this point we will call pte_offset_map() for pmd which is
>     pmd_none().
> 
> Nothing pleasant would happen after this...
> 
> Do you see anything what would prevent this scenario?

No so I think we need s/pmd_trans_huge/pmd_trans_unstable/ and use the
atomic read in C to sort this out lockless. The MADV_DONTNEED part
that isn't holding the mmap_sem for writing unfortunately wasn't
sorted out immediately, that was unexpected in
fact. pmd_trans_unstable() was introduced precisely to handle this
trouble caused by MADV_DONTNEED running with the mmap_sem only for
reading which causes infinite possible transactions back and forth
between none and transhuge while holding only the mmap_sem for
reading.

==

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
  2016-02-23 18:06 ` Andrea Arcangeli
@ 2016-02-23 18:18   ` kbuild test robot
  2016-02-23 18:21   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

Hi Andrea,

[auto build test ERROR on v4.5-rc5]
[also build test ERROR on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: i386-tinyconfig (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c: In function '__handle_mm_fault':
>> mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
    # define unlikely(x) __builtin_expect(!!(x), 0)
                                             ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^

vim +/pmd_trans_unstable +3419 mm/memory.c

  3413		 * a different thread of this mm, in turn leading to a false
  3414		 * negative pmd_trans_huge() retval. All we have to ensure is
  3415		 * that it is a regular pmd that we can walk with
  3416		 * pte_offset_map() and we can do that through an atomic read
  3417		 * in C, which is what pmd_trans_unstable() is provided for.
  3418		 */
> 3419		if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
  3420			return 0;
  3421		/*
  3422		 * A regular pmd is established and it can't morph into a huge pmd

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6204 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
  2016-02-23 18:06 ` Andrea Arcangeli
  2016-02-23 18:18   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
@ 2016-02-23 18:21   ` kbuild test robot
  2016-02-23 18:27   ` kbuild test robot
  2016-02-23 18:38   ` THP race? Kirill A. Shutemov
  3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm

[-- Attachment #1: Type: text/plain, Size: 15914 bytes --]

Hi Andrea,

[auto build test WARNING on v4.5-rc5]
[also build test WARNING on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: x86_64-randconfig-x011-201608 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c: In function '__handle_mm_fault':
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
       ______r = likely_notrace(x);   \
                 ^
   include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                             ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                             ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                             ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:147:42: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                             ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
       ______r = likely_notrace(x);   \
                 ^
   include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                             ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
      ______r = !!(cond);     \
                   ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
      ______r = !!(cond);     \
                   ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
      ______r = !!(cond);     \
                   ^
>> mm/memory.c:3419:2: note: in expansion of macro 'if'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
     ^
>> include/linux/compiler.h:123:14: note: in expansion of macro 'likely_notrace'
       ______r = likely_notrace(x);   \
                 ^
   include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                             ^
   mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^

vim +/if +3419 mm/memory.c

  3403		 */
  3404		if (unlikely(pmd_none(*pmd)) &&
  3405		    unlikely(__pte_alloc(mm, vma, pmd, address)))
  3406			return VM_FAULT_OOM;
  3407		/*
  3408		 * If an huge pmd materialized from under us just retry later.
  3409		 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
  3410		 * ensure the pmd didn't become pmd_trans_huge from under us
  3411		 * and then immediately back to pmd_none as result of
  3412		 * MADV_DONTNEED running immediately after a huge_pmd fault of
  3413		 * a different thread of this mm, in turn leading to a false
  3414		 * negative pmd_trans_huge() retval. All we have to ensure is
  3415		 * that it is a regular pmd that we can walk with
  3416		 * pte_offset_map() and we can do that through an atomic read
  3417		 * in C, which is what pmd_trans_unstable() is provided for.
  3418		 */
> 3419		if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
  3420			return 0;
  3421		/*
  3422		 * A regular pmd is established and it can't morph into a huge pmd
  3423		 * from under us anymore at this point because we hold the mmap_sem
  3424		 * read mode and khugepaged takes it in write mode. So now it's
  3425		 * safe to run pte_offset_map().
  3426		 */
  3427		pte = pte_offset_map(pmd, address);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 19381 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
  2016-02-23 18:06 ` Andrea Arcangeli
  2016-02-23 18:18   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
  2016-02-23 18:21   ` kbuild test robot
@ 2016-02-23 18:27   ` kbuild test robot
  2016-02-23 18:38   ` THP race? Kirill A. Shutemov
  3 siblings, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2016-02-23 18:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kbuild-all, Kirill A. Shutemov, linux-mm

[-- Attachment #1: Type: text/plain, Size: 5996 bytes --]

Hi Andrea,

[auto build test WARNING on v4.5-rc5]
[also build test WARNING on next-20160223]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Andrea-Arcangeli/mm-thp-fix-SMP-race-condition-between-THP-page-fault/20160224-020835
config: x86_64-randconfig-x012-201608 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c: In function '__handle_mm_fault':
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:137:45: note: in definition of macro 'unlikely'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:137:53: note: in definition of macro 'unlikely'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                        ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/smp.h:10,
                    from include/linux/kernel_stat.h:4,
                    from mm/memory.c:41:
   mm/memory.c:3419:34: error: incompatible type for argument 1 of 'pmd_trans_unstable'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
                                     ^
   include/linux/compiler.h:110:47: note: in definition of macro 'likely_notrace'
    #define likely_notrace(x) __builtin_expect(!!(x), 1)
                                                  ^
   include/linux/compiler.h:137:58: note: in expansion of macro '__branch_check__'
    #  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
                                                             ^
>> mm/memory.c:3419:6: note: in expansion of macro 'unlikely'
     if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
         ^
   In file included from arch/x86/include/asm/pgtable.h:914:0,
                    from include/linux/mm.h:67,
                    from mm/memory.c:42:
   include/asm-generic/pgtable.h:731:19: note: expected 'pmd_t * {aka struct <anonymous> *}' but argument is of type 'pmd_t {aka struct <anonymous>}'
    static inline int pmd_trans_unstable(pmd_t *pmd)
                      ^

vim +/unlikely +3419 mm/memory.c

  3403		 */
  3404		if (unlikely(pmd_none(*pmd)) &&
  3405		    unlikely(__pte_alloc(mm, vma, pmd, address)))
  3406			return VM_FAULT_OOM;
  3407		/*
  3408		 * If an huge pmd materialized from under us just retry later.
  3409		 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
  3410		 * ensure the pmd didn't become pmd_trans_huge from under us
  3411		 * and then immediately back to pmd_none as result of
  3412		 * MADV_DONTNEED running immediately after a huge_pmd fault of
  3413		 * a different thread of this mm, in turn leading to a false
  3414		 * negative pmd_trans_huge() retval. All we have to ensure is
  3415		 * that it is a regular pmd that we can walk with
  3416		 * pte_offset_map() and we can do that through an atomic read
  3417		 * in C, which is what pmd_trans_unstable() is provided for.
  3418		 */
> 3419		if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))
  3420			return 0;
  3421		/*
  3422		 * A regular pmd is established and it can't morph into a huge pmd
  3423		 * from under us anymore at this point because we hold the mmap_sem
  3424		 * read mode and khugepaged takes it in write mode. So now it's
  3425		 * safe to run pte_offset_map().
  3426		 */
  3427		pte = pte_offset_map(pmd, address);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21966 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-23 18:06 ` Andrea Arcangeli
                     ` (2 preceding siblings ...)
  2016-02-23 18:27   ` kbuild test robot
@ 2016-02-23 18:38   ` Kirill A. Shutemov
  2016-02-23 19:28     ` Andrea Arcangeli
  3 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-23 18:38 UTC (permalink / raw)
  To: Andrea Arcangeli, Dan Williams; +Cc: linux-mm

On Tue, Feb 23, 2016 at 07:06:09PM +0100, Andrea Arcangeli wrote:
> On Tue, Feb 23, 2016 at 06:49:50PM +0300, Kirill A. Shutemov wrote:
> > Hi Andrea,
> > 
> > I suspect there's race with THP in __handle_mm_fault(). It's pure
> > theoretical and race window is small, but..
> > 
> > Consider following scenario:
> > 
> >   - THP got allocated by other thread just before "pmd_none() &&
> >     __pte_alloc()" check, so pmd_none() is false and we don't
> >     allocate the page table.
> > 
> >   - But before pmd_trans_huge() check the page got unmap by
> >     MADV_DONTNEED in other thread.
> > 
> >   - At this point we will call pte_offset_map() for pmd which is
> >     pmd_none().
> > 
> > Nothing pleasant would happen after this...
> > 
> > Do you see anything what would prevent this scenario?
> 
> No so I think we need s/pmd_trans_huge/pmd_trans_unstable/ and use the
> atomic read in C to sort this out lockless. The MADV_DONTNEED part
> that isn't holding the mmap_sem for writing unfortunately wasn't
> sorted out immediately, that was unexpected in
> fact. pmd_trans_unstable() was introduced precisely to handle this
> trouble caused by MADV_DONTNEED running with the mmap_sem only for
> reading which causes infinite possible transactions back and forth
> between none and transhuge while holding only the mmap_sem for
> reading.
> 
> ==
> From eae4f251604299082dd824dc8acade71268c8d88 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Tue, 23 Feb 2016 18:56:55 +0100
> Subject: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault
>  and MADV_DONTNEED
> 
> pmd_trans_unstable/pmd_none_or_trans_huge_or_clear_bad were introduced
> to locklessy (but atomically) detect when a pmd is a regular (stable)
> pmd or when the pmd is unstable and can infinitely transition from
> pmd_none and pmd_trans_huge from under us, while only holding the
> mmap_sem for reading (for writing not).
> 
> While holding the mmap_sem only for reading, MADV_DONTNEED can run
> from under us and so before we can threat the pmd as regular we need
> to compare it against pmd_none and pmd_trans_huge in an atomic way,
> with pmd_trans_unstable(). The old pmd_trans_huge check is correct but
> it leaves a tiny window for a race.
> 
> Useful applications are unlikely to notice the difference as doing
> MADV_DONTNEED concurrently with a page fault would lead to undefined
> behavior.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 635451a..d5912b0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3404,8 +3404,19 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (unlikely(pmd_none(*pmd)) &&
>  	    unlikely(__pte_alloc(mm, vma, pmd, address)))
>  		return VM_FAULT_OOM;
> -	/* if an huge pmd materialized from under us just retry later */
> -	if (unlikely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
> +	/*
> +	 * If an huge pmd materialized from under us just retry later.
> +	 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
> +	 * ensure the pmd didn't become pmd_trans_huge from under us
> +	 * and then immediately back to pmd_none as result of
> +	 * MADV_DONTNEED running immediately after a huge_pmd fault of
> +	 * a different thread of this mm, in turn leading to a false
> +	 * negative pmd_trans_huge() retval. All we have to ensure is
> +	 * that it is a regular pmd that we can walk with
> +	 * pte_offset_map() and we can do that through an atomic read
> +	 * in C, which is what pmd_trans_unstable() is provided for.
> +	 */
> +	if (unlikely(pmd_trans_unstable(*pmd) || pmd_devmap(*pmd)))

pmd_trans_unstable(pmd), otherwise looks good:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

BTW, I guess DAX would need to introduce the same infrastructure for
pmd_devmap(). Dan?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/1] Re: THP race?
  2016-02-23 15:49 THP race? Kirill A. Shutemov
  2016-02-23 18:06 ` Andrea Arcangeli
@ 2016-02-23 18:49 ` Andrea Arcangeli
  2016-02-23 18:49   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
  1 sibling, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:49 UTC (permalink / raw)
  To: Andrew Morton, \"Kirill A. Shutemov\"; +Cc: linux-mm

Hello Kirill and Andrew,

Resending as a more proper submit with slightly improved commentary
and that builds and boots..

Andrea Arcangeli (1):
  mm: thp: fix SMP race condition between THP page fault and
    MADV_DONTNEED

 mm/memory.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
  2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
@ 2016-02-23 18:49   ` Andrea Arcangeli
  2016-02-23 21:18     ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 18:49 UTC (permalink / raw)
  To: Andrew Morton, \"Kirill A. Shutemov\"; +Cc: linux-mm

pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a
regular (stable) pmd or when the pmd is unstable and can infinitely
transition from pmd_none() and pmd_trans_huge() from under us, while
only holding the mmap_sem for reading (for writing not).

While holding the mmap_sem only for reading, MADV_DONTNEED can run
from under us and so before we can assume the pmd to be a regular
stable pmd we need to compare it against pmd_none() and
pmd_trans_huge() in an atomic way, with pmd_trans_unstable(). The old
pmd_trans_huge() left a tiny window for a race.

Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 635451a..50347ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3404,8 +3404,19 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(pmd_none(*pmd)) &&
 	    unlikely(__pte_alloc(mm, vma, pmd, address)))
 		return VM_FAULT_OOM;
-	/* if an huge pmd materialized from under us just retry later */
-	if (unlikely(pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
+	/*
+	 * If an huge pmd materialized from under us just retry later.
+	 * Use pmd_trans_unstable() instead of pmd_trans_huge() to
+	 * ensure the pmd didn't become pmd_trans_huge from under us
+	 * and then back to pmd_none, as result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault of a different
+	 * thread of this mm, in turn leading to a misleading
+	 * pmd_trans_huge() retval. All we have to ensure is that it
+	 * is a regular pmd that we can walk with pte_offset_map() and
+	 * we can do that through an atomic read in C, which is what
+	 * pmd_trans_unstable() is provided for.
+	 */
+	if (unlikely(pmd_trans_unstable(pmd) || pmd_devmap(*pmd)))
 		return 0;
 	/*
 	 * A regular pmd is established and it can't morph into a huge pmd

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-23 18:38   ` THP race? Kirill A. Shutemov
@ 2016-02-23 19:28     ` Andrea Arcangeli
  2016-02-25 18:45       ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Andrea Arcangeli @ 2016-02-23 19:28 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Dan Williams, linux-mm

On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
> pmd_trans_unstable(pmd), otherwise looks good:

Yes sorry.

> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks for the quick ack, I just noticed or I would have added it to
the resubmit, but it can be still added to -mm.

> BTW, I guess DAX would need to introduce the same infrastructure for
> pmd_devmap(). Dan?

There is a i_mmap_lock_write in the truncate path that saves the day
for the pmd zapping in the truncate() case without mmap_sem (the only
case anon THP doesn't need to care about as truncate isn't possible in
the anon case), but not in the MADV_DONTNEED madvise case that runs
only with the mmap_sem for reading.

The only objective of this "infrastructure" is to add no pmd_lock()ing
overhead to the page fault, if the mapping is already established but
not huge, and we've just to walk through the pmd to reach the
pte. All because MADV_DONTNEED is running with the mmap_sem for
reading unlike munmap and other slower syscalls that are forced to
mangle the vmas and have to take the mmap_sem for writing regardless.

The question for DAX is if it should do a pmd_devmap check inside
pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
away with a one liner, or add its own infrastructure with
pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
in __handle_mm_fault. If it could share the same infrastructure it'd
be ideal.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
  2016-02-23 18:49   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
@ 2016-02-23 21:18     ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2016-02-23 21:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: \"Kirill A. Shutemov\", linux-mm, stable

On Tue, 23 Feb 2016 19:49:10 +0100 Andrea Arcangeli <aarcange@redhat.com> wrote:

> pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
> introduced to locklessy (but atomically) detect when a pmd is a
> regular (stable) pmd or when the pmd is unstable and can infinitely
> transition from pmd_none() and pmd_trans_huge() from under us, while
> only holding the mmap_sem for reading (for writing not).
> 
> While holding the mmap_sem only for reading, MADV_DONTNEED can run
> from under us and so before we can assume the pmd to be a regular
> stable pmd we need to compare it against pmd_none() and
> pmd_trans_huge() in an atomic way, with pmd_trans_unstable(). The old
> pmd_trans_huge() left a tiny window for a race.
> 
> Useful applications are unlikely to notice the difference as doing
> MADV_DONTNEED concurrently with a page fault would lead to undefined
> behavior.

Thanks.

I put a cc:stable on this as it appears to be applicable to 4.4 and
perhaps earlier.

It generates a reject against 4.4 because of the recently-added
pmd_devmap() test.  It's easily fixed but I don't have a process to
handle -stable rejects.  This means that when Greg hits the reject
he'll ask us for a fixed up version.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-23 19:28     ` Andrea Arcangeli
@ 2016-02-25 18:45       ` Dan Williams
  2016-02-26 10:37         ` Kirill A. Shutemov
  0 siblings, 1 reply; 13+ messages in thread
From: Dan Williams @ 2016-02-25 18:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Kirill A. Shutemov, linux-mm

On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
>> pmd_trans_unstable(pmd), otherwise looks good:
>
> Yes sorry.
>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> Thanks for the quick ack, I just noticed or I would have added it to
> the resubmit, but it can be still added to -mm.
>
>> BTW, I guess DAX would need to introduce the same infrastructure for
>> pmd_devmap(). Dan?
>
> There is a i_mmap_lock_write in the truncate path that saves the day
> for the pmd zapping in the truncate() case without mmap_sem (the only
> case anon THP doesn't need to care about as truncate isn't possible in
> the anon case), but not in the MADV_DONTNEED madvise case that runs
> only with the mmap_sem for reading.
>
> The only objective of this "infrastructure" is to add no pmd_lock()ing
> overhead to the page fault, if the mapping is already established but
> not huge, and we've just to walk through the pmd to reach the
> pte. All because MADV_DONTNEED is running with the mmap_sem for
> reading unlike munmap and other slower syscalls that are forced to
> mangle the vmas and have to take the mmap_sem for writing regardless.
>
> The question for DAX is if it should do a pmd_devmap check inside
> pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
> away with a one liner, or add its own infrastructure with
> pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
> in __handle_mm_fault. If it could share the same infrastructure it'd
> be ideal.
>

Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
inside pmd_none_or_trans_huge_or_clear_bad().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-25 18:45       ` Dan Williams
@ 2016-02-26 10:37         ` Kirill A. Shutemov
  2016-02-26 14:46           ` Dan Williams
  0 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2016-02-26 10:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: Andrea Arcangeli, linux-mm

On Thu, Feb 25, 2016 at 10:45:05AM -0800, Dan Williams wrote:
> On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
> >> pmd_trans_unstable(pmd), otherwise looks good:
> >
> > Yes sorry.
> >
> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
> > Thanks for the quick ack, I just noticed or I would have added it to
> > the resubmit, but it can be still added to -mm.
> >
> >> BTW, I guess DAX would need to introduce the same infrastructure for
> >> pmd_devmap(). Dan?
> >
> > There is a i_mmap_lock_write in the truncate path that saves the day
> > for the pmd zapping in the truncate() case without mmap_sem (the only
> > case anon THP doesn't need to care about as truncate isn't possible in
> > the anon case), but not in the MADV_DONTNEED madvise case that runs
> > only with the mmap_sem for reading.
> >
> > The only objective of this "infrastructure" is to add no pmd_lock()ing
> > overhead to the page fault, if the mapping is already established but
> > not huge, and we've just to walk through the pmd to reach the
> > pte. All because MADV_DONTNEED is running with the mmap_sem for
> > reading unlike munmap and other slower syscalls that are forced to
> > mangle the vmas and have to take the mmap_sem for writing regardless.
> >
> > The question for DAX is if it should do a pmd_devmap check inside
> > pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
> > away with a one liner, or add its own infrastructure with
> > pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
> > in __handle_mm_fault. If it could share the same infrastructure it'd
> > be ideal.
> >
> 
> Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
> inside pmd_none_or_trans_huge_or_clear_bad().

Are you going take care about this?

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: THP race?
  2016-02-26 10:37         ` Kirill A. Shutemov
@ 2016-02-26 14:46           ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2016-02-26 14:46 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrea Arcangeli, linux-mm

On Fri, Feb 26, 2016 at 2:37 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Thu, Feb 25, 2016 at 10:45:05AM -0800, Dan Williams wrote:
>> On Tue, Feb 23, 2016 at 11:28 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>> > On Tue, Feb 23, 2016 at 09:38:32PM +0300, Kirill A. Shutemov wrote:
>> >> pmd_trans_unstable(pmd), otherwise looks good:
>> >
>> > Yes sorry.
>> >
>> >> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >
>> > Thanks for the quick ack, I just noticed or I would have added it to
>> > the resubmit, but it can be still added to -mm.
>> >
>> >> BTW, I guess DAX would need to introduce the same infrastructure for
>> >> pmd_devmap(). Dan?
>> >
>> > There is a i_mmap_lock_write in the truncate path that saves the day
>> > for the pmd zapping in the truncate() case without mmap_sem (the only
>> > case anon THP doesn't need to care about as truncate isn't possible in
>> > the anon case), but not in the MADV_DONTNEED madvise case that runs
>> > only with the mmap_sem for reading.
>> >
>> > The only objective of this "infrastructure" is to add no pmd_lock()ing
>> > overhead to the page fault, if the mapping is already established but
>> > not huge, and we've just to walk through the pmd to reach the
>> > pte. All because MADV_DONTNEED is running with the mmap_sem for
>> > reading unlike munmap and other slower syscalls that are forced to
>> > mangle the vmas and have to take the mmap_sem for writing regardless.
>> >
>> > The question for DAX is if it should do a pmd_devmap check inside
>> > pmd_none_or_trans_huge_or_clear_bad() after pmd_trans_huge() and get
>> > away with a one liner, or add its own infrastructure with
>> > pmd_devmap_unstable(). In the pmd_devmap case the problem isn't just
>> > in __handle_mm_fault. If it could share the same infrastructure it'd
>> > be ideal.
>> >
>>
>> Yes, I see no reason why we can't/shoudn't move the pmd_devmap() check
>> inside pmd_none_or_trans_huge_or_clear_bad().
>
> Are you going take care about this?

Sure, I thought there was time to still fold this in to Andrea's
patch?  Otherwise yes, I'll send a patch.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-02-26 14:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-23 15:49 THP race? Kirill A. Shutemov
2016-02-23 18:06 ` Andrea Arcangeli
2016-02-23 18:18   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault kbuild test robot
2016-02-23 18:21   ` kbuild test robot
2016-02-23 18:27   ` kbuild test robot
2016-02-23 18:38   ` THP race? Kirill A. Shutemov
2016-02-23 19:28     ` Andrea Arcangeli
2016-02-25 18:45       ` Dan Williams
2016-02-26 10:37         ` Kirill A. Shutemov
2016-02-26 14:46           ` Dan Williams
2016-02-23 18:49 ` [PATCH 0/1] " Andrea Arcangeli
2016-02-23 18:49   ` [PATCH 1/1] mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED Andrea Arcangeli
2016-02-23 21:18     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).