linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* a racy access flag clearing warning when calling mmap system call
@ 2017-12-01  7:38 chenjiankang
  2017-12-01 13:18 ` Will Deacon
  0 siblings, 1 reply; 8+ messages in thread
From: chenjiankang @ 2017-12-01  7:38 UTC (permalink / raw)
  To: will.deacon, catalin.marinas, linux-kernel, xieyisheng1, wangkefeng.wang


Hi will
    
   I find a warning by a syzkaller test;

   When the mmap syscall is called to create a virtual memory,
firstly it delete a old huge page mapping area;   
   Before splitting the huge page, the pmd of a huge page is set up.
But The PTE_AF is zreo belonging to the current pmd of huge page.
   I suspect that when the child process is created, the PTE_AF is cleared in copy_one_pte();
So, the set_pte_at() can have a warning.
   
   There, whether the set_pte_at() should detect PTE_AF?

Thanks.

The log:

    set_pte_at: racy access flag clearing: 00e000009d200bd1 -> 01e000009d200bd1
------------[ cut here ]------------      
WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
Modules linked in:                         
CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
Hardware name: linux,dummy-virt (DT)       
task: ffffffc06a873fc0 ti: ffffffc05aefc000 task.ti: ffffffc05aefc000
PC is at pmdp_splitting_flush+0x194/0x1b0
LR is at pmdp_splitting_flush+0x194/0x1b0
pc : [<ffffffc0000a5244>] lr : [<ffffffc0000a5244>] pstate: 80000145
sp : ffffffc05aeff770                     
x29: ffffffc05aeff770 x28: ffffffc05ae45800 
x27: 0000000020200000 x26: ffffffc061fdf450 
x25: 0000000000020000 x24: 0000000000000001 
x23: ffffffc06333d9c8 x22: ffffffc0014ba000 
x21: 01e000009d200bd1 x20: 00e000009d200bd1 
x19: ffffffc05ae45800 x18: 0000000000000000 
x17: 00000000004b4000 x16: ffffffc00017fdc0 
x15: 00000000038ee280 x14: 3030653130203e2d 
x13: 2031646230303264 x12: 3930303030306530 
x11: 30203a676e697261 x10: 656c632067616c66 
x9 : 2073736563636120 x8 : 79636172203a7461 
x7 : ffffffc05aeff430 x6 : ffffffc00015f38c 
x5 : 0000000000000003 x4 : 0000000000000000 
x3 : 0000000000000003 x2 : 0000000000010000 
x1 : ffffff9005a03000 x0 : 000000000000004b 
Call trace:                                
[<ffffffc0000a5244>] pmdp_splitting_flush+0x194/0x1b0
[<ffffffc0002784c0>] split_huge_page_to_list+0x168/0xdb0
[<ffffffc00027a0d8>] __split_huge_page_pmd+0x1b0/0x510
[<ffffffc00027b5ac>] split_huge_page_pmd_mm+0x84/0x88
[<ffffffc00027b674>] split_huge_page_address+0xc4/0xe8
[<ffffffc00027b7f4>] __vma_adjust_trans_huge+0x15c/0x190
[<ffffffc000238d5c>] vma_adjust+0x884/0x9f0
[<ffffffc0002390c8>] __split_vma.isra.5+0x200/0x310
[<ffffffc00023b7b8>] do_munmap+0x5e0/0x608 
[<ffffffc00023c0d4>] mmap_region+0x12c/0x900
[<ffffffc00023cd2c>] do_mmap_pgoff+0x484/0x540
[<ffffffc000218118>] vm_mmap_pgoff+0x128/0x158
[<ffffffc000239920>] SyS_mmap_pgoff+0x188/0x300
[<ffffffc00008cce0>] sys_mmap+0x58/0x80   

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-01  7:38 a racy access flag clearing warning when calling mmap system call chenjiankang
@ 2017-12-01 13:18 ` Will Deacon
  2017-12-05 12:44   ` chenjiankang
  2017-12-07  1:46   ` Yisheng Xie
  0 siblings, 2 replies; 8+ messages in thread
From: Will Deacon @ 2017-12-01 13:18 UTC (permalink / raw)
  To: chenjiankang; +Cc: catalin.marinas, linux-kernel, xieyisheng1, wangkefeng.wang

On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>    I find a warning by a syzkaller test;
> 
>    When the mmap syscall is called to create a virtual memory,
> firstly it delete a old huge page mapping area;   
>    Before splitting the huge page, the pmd of a huge page is set up.
> But The PTE_AF is zreo belonging to the current pmd of huge page.
>    I suspect that when the child process is created, the PTE_AF is cleared in copy_one_pte();
> So, the set_pte_at() can have a warning.
>    
>    There, whether the set_pte_at() should detect PTE_AF?
> 
> Thanks.
> 
> The log:
> 
>     set_pte_at: racy access flag clearing: 00e000009d200bd1 -> 01e000009d200bd1
> ------------[ cut here ]------------      
> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211

Given that this is a fairly old 4.1 kernel, could you try to reproduce the
failure with something more recent, please? We've fixed many bugs since
then, some of them involving huge pages.

Thanks,

Will

> Modules linked in:                         
> CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
> Hardware name: linux,dummy-virt (DT)       
> task: ffffffc06a873fc0 ti: ffffffc05aefc000 task.ti: ffffffc05aefc000
> PC is at pmdp_splitting_flush+0x194/0x1b0
> LR is at pmdp_splitting_flush+0x194/0x1b0
> pc : [<ffffffc0000a5244>] lr : [<ffffffc0000a5244>] pstate: 80000145
> sp : ffffffc05aeff770                     
> x29: ffffffc05aeff770 x28: ffffffc05ae45800 
> x27: 0000000020200000 x26: ffffffc061fdf450 
> x25: 0000000000020000 x24: 0000000000000001 
> x23: ffffffc06333d9c8 x22: ffffffc0014ba000 
> x21: 01e000009d200bd1 x20: 00e000009d200bd1 
> x19: ffffffc05ae45800 x18: 0000000000000000 
> x17: 00000000004b4000 x16: ffffffc00017fdc0 
> x15: 00000000038ee280 x14: 3030653130203e2d 
> x13: 2031646230303264 x12: 3930303030306530 
> x11: 30203a676e697261 x10: 656c632067616c66 
> x9 : 2073736563636120 x8 : 79636172203a7461 
> x7 : ffffffc05aeff430 x6 : ffffffc00015f38c 
> x5 : 0000000000000003 x4 : 0000000000000000 
> x3 : 0000000000000003 x2 : 0000000000010000 
> x1 : ffffff9005a03000 x0 : 000000000000004b 
> Call trace:                                
> [<ffffffc0000a5244>] pmdp_splitting_flush+0x194/0x1b0
> [<ffffffc0002784c0>] split_huge_page_to_list+0x168/0xdb0
> [<ffffffc00027a0d8>] __split_huge_page_pmd+0x1b0/0x510
> [<ffffffc00027b5ac>] split_huge_page_pmd_mm+0x84/0x88
> [<ffffffc00027b674>] split_huge_page_address+0xc4/0xe8
> [<ffffffc00027b7f4>] __vma_adjust_trans_huge+0x15c/0x190
> [<ffffffc000238d5c>] vma_adjust+0x884/0x9f0
> [<ffffffc0002390c8>] __split_vma.isra.5+0x200/0x310
> [<ffffffc00023b7b8>] do_munmap+0x5e0/0x608 
> [<ffffffc00023c0d4>] mmap_region+0x12c/0x900
> [<ffffffc00023cd2c>] do_mmap_pgoff+0x484/0x540
> [<ffffffc000218118>] vm_mmap_pgoff+0x128/0x158
> [<ffffffc000239920>] SyS_mmap_pgoff+0x188/0x300
> [<ffffffc00008cce0>] sys_mmap+0x58/0x80   
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-01 13:18 ` Will Deacon
@ 2017-12-05 12:44   ` chenjiankang
  2017-12-07  1:46   ` Yisheng Xie
  1 sibling, 0 replies; 8+ messages in thread
From: chenjiankang @ 2017-12-05 12:44 UTC (permalink / raw)
  To: Will Deacon; +Cc: catalin.marinas, linux-kernel, xieyisheng1, wangkefeng.wang



在 2017/12/1 21:18, Will Deacon 写道:
> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>>    I find a warning by a syzkaller test;
>>
>>    When the mmap syscall is called to create a virtual memory,
>> firstly it delete a old huge page mapping area;   
>>    Before splitting the huge page, the pmd of a huge page is set up.
>> But The PTE_AF is zreo belonging to the current pmd of huge page.
>>    I suspect that when the child process is created, the PTE_AF is cleared in copy_one_pte();
>> So, the set_pte_at() can have a warning.
>>    
>>    There, whether the set_pte_at() should detect PTE_AF?
>>
>> Thanks
>> The log:
>>
>>     set_pte_at: racy access flag clearing: 00e000009d200bd1 -> 01e000009d200bd1
>> ------------[ cut here ]------------      
>> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
> 
> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
> failure with something more recent, please? We've fixed many bugs since
> then, some of them involving huge pages.
> 
> Thanks,
> 
> Will

 Hi Will:
   
    It is more difficult to reproduce the failure with the old 4.1 kernel.
Does the warning have an impact? Or this is just to lose a PTE_AF flag?

 Thanks 
  
 Jiankang 

> 
>> Modules linked in:                         
>> CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
>> Hardware name: linux,dummy-virt (DT)       
>> task: ffffffc06a873fc0 ti: ffffffc05aefc000 task.ti: ffffffc05aefc000
>> PC is at pmdp_splitting_flush+0x194/0x1b0
>> LR is at pmdp_splitting_flush+0x194/0x1b0
>> pc : [<ffffffc0000a5244>] lr : [<ffffffc0000a5244>] pstate: 80000145
>> sp : ffffffc05aeff770                     
>> x29: ffffffc05aeff770 x28: ffffffc05ae45800 
>> x27: 0000000020200000 x26: ffffffc061fdf450 
>> x25: 0000000000020000 x24: 0000000000000001 
>> x23: ffffffc06333d9c8 x22: ffffffc0014ba000 
>> x21: 01e000009d200bd1 x20: 00e000009d200bd1 
>> x19: ffffffc05ae45800 x18: 0000000000000000 
>> x17: 00000000004b4000 x16: ffffffc00017fdc0 
>> x15: 00000000038ee280 x14: 3030653130203e2d 
>> x13: 2031646230303264 x12: 3930303030306530 
>> x11: 30203a676e697261 x10: 656c632067616c66 
>> x9 : 2073736563636120 x8 : 79636172203a7461 
>> x7 : ffffffc05aeff430 x6 : ffffffc00015f38c 
>> x5 : 0000000000000003 x4 : 0000000000000000 
>> x3 : 0000000000000003 x2 : 0000000000010000 
>> x1 : ffffff9005a03000 x0 : 000000000000004b 
>> Call trace:                                
>> [<ffffffc0000a5244>] pmdp_splitting_flush+0x194/0x1b0
>> [<ffffffc0002784c0>] split_huge_page_to_list+0x168/0xdb0
>> [<ffffffc00027a0d8>] __split_huge_page_pmd+0x1b0/0x510
>> [<ffffffc00027b5ac>] split_huge_page_pmd_mm+0x84/0x88
>> [<ffffffc00027b674>] split_huge_page_address+0xc4/0xe8
>> [<ffffffc00027b7f4>] __vma_adjust_trans_huge+0x15c/0x190
>> [<ffffffc000238d5c>] vma_adjust+0x884/0x9f0
>> [<ffffffc0002390c8>] __split_vma.isra.5+0x200/0x310
>> [<ffffffc00023b7b8>] do_munmap+0x5e0/0x608 
>> [<ffffffc00023c0d4>] mmap_region+0x12c/0x900
>> [<ffffffc00023cd2c>] do_mmap_pgoff+0x484/0x540
>> [<ffffffc000218118>] vm_mmap_pgoff+0x128/0x158
>> [<ffffffc000239920>] SyS_mmap_pgoff+0x188/0x300
>> [<ffffffc00008cce0>] sys_mmap+0x58/0x80   
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-01 13:18 ` Will Deacon
  2017-12-05 12:44   ` chenjiankang
@ 2017-12-07  1:46   ` Yisheng Xie
  2017-12-07 13:23     ` Will Deacon
  1 sibling, 1 reply; 8+ messages in thread
From: Yisheng Xie @ 2017-12-07  1:46 UTC (permalink / raw)
  To: Will Deacon, chenjiankang; +Cc: catalin.marinas, linux-kernel, wangkefeng.wang

Hi Will,

On 2017/12/1 21:18, Will Deacon wrote:
> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>> ------------[ cut here ]------------      
>> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
> 
> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
> failure with something more recent, please? We've fixed many bugs since
> then, some of them involving huge pages.

Yeah, this is and old kernel, but I find a scene that will cause this warn_on:
When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access Flag.
  dup_mmap
    -> copy_page_range
         -> copy_pud_range
              -> copy_pmd_range
                   -> copy_huge_pmd
                        -> pmd_mkold

If we do not have any access after dup_mmap, and start to split this thp,
it will cause this call trace in the old kernel, right?

It seems this is normal scene but will cause call trace for this old kernel,
therefore, for this old kernel, we should just remove this WARN_ON_ONCE, right?

Thanks
Yisheng Xie

> 
> Thanks,
> 
> Will
> 
>> Modules linked in:                         
>> CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
>> Hardware name: linux,dummy-virt (DT)       
>> task: ffffffc06a873fc0 ti: ffffffc05aefc000 task.ti: ffffffc05aefc000
>> PC is at pmdp_splitting_flush+0x194/0x1b0
>> LR is at pmdp_splitting_flush+0x194/0x1b0
>> pc : [<ffffffc0000a5244>] lr : [<ffffffc0000a5244>] pstate: 80000145
>> sp : ffffffc05aeff770                     
>> x29: ffffffc05aeff770 x28: ffffffc05ae45800 
>> x27: 0000000020200000 x26: ffffffc061fdf450 
>> x25: 0000000000020000 x24: 0000000000000001 
>> x23: ffffffc06333d9c8 x22: ffffffc0014ba000 
>> x21: 01e000009d200bd1 x20: 00e000009d200bd1 
>> x19: ffffffc05ae45800 x18: 0000000000000000 
>> x17: 00000000004b4000 x16: ffffffc00017fdc0 
>> x15: 00000000038ee280 x14: 3030653130203e2d 
>> x13: 2031646230303264 x12: 3930303030306530 
>> x11: 30203a676e697261 x10: 656c632067616c66 
>> x9 : 2073736563636120 x8 : 79636172203a7461 
>> x7 : ffffffc05aeff430 x6 : ffffffc00015f38c 
>> x5 : 0000000000000003 x4 : 0000000000000000 
>> x3 : 0000000000000003 x2 : 0000000000010000 
>> x1 : ffffff9005a03000 x0 : 000000000000004b 
>> Call trace:                                
>> [<ffffffc0000a5244>] pmdp_splitting_flush+0x194/0x1b0
>> [<ffffffc0002784c0>] split_huge_page_to_list+0x168/0xdb0
>> [<ffffffc00027a0d8>] __split_huge_page_pmd+0x1b0/0x510
>> [<ffffffc00027b5ac>] split_huge_page_pmd_mm+0x84/0x88
>> [<ffffffc00027b674>] split_huge_page_address+0xc4/0xe8
>> [<ffffffc00027b7f4>] __vma_adjust_trans_huge+0x15c/0x190
>> [<ffffffc000238d5c>] vma_adjust+0x884/0x9f0
>> [<ffffffc0002390c8>] __split_vma.isra.5+0x200/0x310
>> [<ffffffc00023b7b8>] do_munmap+0x5e0/0x608 
>> [<ffffffc00023c0d4>] mmap_region+0x12c/0x900
>> [<ffffffc00023cd2c>] do_mmap_pgoff+0x484/0x540
>> [<ffffffc000218118>] vm_mmap_pgoff+0x128/0x158
>> [<ffffffc000239920>] SyS_mmap_pgoff+0x188/0x300
>> [<ffffffc00008cce0>] sys_mmap+0x58/0x80   
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-07  1:46   ` Yisheng Xie
@ 2017-12-07 13:23     ` Will Deacon
  2017-12-08  3:19       ` chenjiankang
  0 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2017-12-07 13:23 UTC (permalink / raw)
  To: Yisheng Xie; +Cc: chenjiankang, catalin.marinas, linux-kernel, wangkefeng.wang

On Thu, Dec 07, 2017 at 09:46:59AM +0800, Yisheng Xie wrote:
> On 2017/12/1 21:18, Will Deacon wrote:
> > On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
> >> ------------[ cut here ]------------      
> >> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
> > 
> > Given that this is a fairly old 4.1 kernel, could you try to reproduce the
> > failure with something more recent, please? We've fixed many bugs since
> > then, some of them involving huge pages.
> 
> Yeah, this is and old kernel, but I find a scene that will cause this warn_on:
> When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access Flag.
>   dup_mmap
>     -> copy_page_range
>          -> copy_pud_range
>               -> copy_pmd_range
>                    -> copy_huge_pmd
>                         -> pmd_mkold
> 
> If we do not have any access after dup_mmap, and start to split this thp,
> it will cause this call trace in the old kernel, right?
> 
> It seems this is normal scene but will cause call trace for this old kernel,
> therefore, for this old kernel, we should just remove this WARN_ON_ONCE, right?

Whilst racy clearing of the access flag should be safe in practice, I like
having the warning around because it does indicate that we're setting
something to old which could immediately be made young again by the CPU.

In this case, it looks like the mm isn't even live, so a better approach
would probably be to predicate that conditional on mm == current->active_mm
or something like that. That also avoids us getting false positive for
the dirty bit case, which would be harmful if the table was installed.

diff below. It's still racy with concurrent fork, but I don't want this
check to become a generic "does my caller hold all the locks to protect
against a concurrent walk" predicate and it just means we won't catch all
possible races.

Will

--->8

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 149d05fb9421..8fe103b1e101 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -42,6 +42,8 @@
 #include <asm/cmpxchg.h>
 #include <asm/fixmap.h>
 #include <linux/mmdebug.h>
+#include <linux/mm_types.h>
+#include <linux/sched.h>
 
 extern void __pte_error(const char *file, int line, unsigned long val);
 extern void __pmd_error(const char *file, int line, unsigned long val);
@@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	}
 }
 
-struct mm_struct;
-struct vm_area_struct;
-
 extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
 
 /*
@@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	 * hardware updates of the pte (ptep_set_access_flags safely changes
 	 * valid ptes without going through an invalid entry).
 	 */
-	if (pte_valid(*ptep) && pte_valid(pte)) {
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
+	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
 		VM_WARN_ONCE(!pte_young(pte),
 			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
 			     __func__, pte_val(*ptep), pte_val(pte));

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-07 13:23     ` Will Deacon
@ 2017-12-08  3:19       ` chenjiankang
  2017-12-08 11:01         ` Catalin Marinas
  0 siblings, 1 reply; 8+ messages in thread
From: chenjiankang @ 2017-12-08  3:19 UTC (permalink / raw)
  To: Will Deacon, Yisheng Xie, qiuxishi
  Cc: catalin.marinas, linux-kernel, wangkefeng.wang



在 2017/12/7 21:23, Will Deacon 写道:
> On Thu, Dec 07, 2017 at 09:46:59AM +0800, Yisheng Xie wrote:
>> On 2017/12/1 21:18, Will Deacon wrote:
>>> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>>>> ------------[ cut here ]------------      
>>>> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
>>>
>>> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
>>> failure with something more recent, please? We've fixed many bugs since
>>> then, some of them involving huge pages.
>>
>> Yeah, this is and old kernel, but I find a scene that will cause this warn_on:
>> When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access Flag.
>>   dup_mmap
>>     -> copy_page_range
>>          -> copy_pud_range
>>               -> copy_pmd_range
>>                    -> copy_huge_pmd
>>                         -> pmd_mkold
>>
>> If we do not have any access after dup_mmap, and start to split this thp,
>> it will cause this call trace in the old kernel, right?
>>
>> It seems this is normal scene but will cause call trace for this old kernel,
>> therefore, for this old kernel, we should just remove this WARN_ON_ONCE, right?
> 
> Whilst racy clearing of the access flag should be safe in practice, I like
> having the warning around because it does indicate that we're setting
> something to old which could immediately be made young again by the CPU.
> 
> In this case, it looks like the mm isn't even live, so a better approach
> would probably be to predicate that conditional on mm == current->active_mm
> or something like that. That also avoids us getting false positive for
> the dirty bit case, which would be harmful if the table was installed.
> 
> diff below. It's still racy with concurrent fork, but I don't want this
> check to become a generic "does my caller hold all the locks to protect
> against a concurrent walk" predicate and it just means we won't catch all
> possible races.
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 149d05fb9421..8fe103b1e101 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -42,6 +42,8 @@
>  #include <asm/cmpxchg.h>
>  #include <asm/fixmap.h>
>  #include <linux/mmdebug.h>
> +#include <linux/mm_types.h>
> +#include <linux/sched.h>
>  
>  extern void __pte_error(const char *file, int line, unsigned long val);
>  extern void __pmd_error(const char *file, int line, unsigned long val);
> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  	}
>  }
>  
> -struct mm_struct;
> -struct vm_area_struct;
> -
>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>  
>  /*
> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	 * hardware updates of the pte (ptep_set_access_flags safely changes
>  	 * valid ptes without going through an invalid entry).
>  	 */
> -	if (pte_valid(*ptep) && pte_valid(pte)) {
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>  		VM_WARN_ONCE(!pte_young(pte),
>  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
>  			     __func__, pte_val(*ptep), pte_val(pte));
> 
> 
> .
> 

Hi will
  
    From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
And the the PTE access bit is all zero.
    diff below. Whether the access bit of the new ptep should be judged to eliminate the 
false positive?

Thanks 

Jiankang

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2987d5a..3c1b0c6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
         * valid ptes without going through an invalid entry).
         */
        if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
-               VM_WARN_ONCE(!pte_young(pte),
+               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
                             "%s: racy access flag clearing: %016llx -> %016llx",
                             __func__, pte_val(*ptep), pte_val(pte));
                VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-08  3:19       ` chenjiankang
@ 2017-12-08 11:01         ` Catalin Marinas
  2017-12-12  3:13           ` chenjiankang
  0 siblings, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2017-12-08 11:01 UTC (permalink / raw)
  To: chenjiankang
  Cc: Will Deacon, Yisheng Xie, qiuxishi, linux-kernel, wangkefeng.wang

On Fri, Dec 08, 2017 at 11:19:52AM +0800, chenjiankang wrote:
> 在 2017/12/7 21:23, Will Deacon 写道:
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 149d05fb9421..8fe103b1e101 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -42,6 +42,8 @@
> >  #include <asm/cmpxchg.h>
> >  #include <asm/fixmap.h>
> >  #include <linux/mmdebug.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/sched.h>
> >  
> >  extern void __pte_error(const char *file, int line, unsigned long val);
> >  extern void __pmd_error(const char *file, int line, unsigned long val);
> > @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> >  	}
> >  }
> >  
> > -struct mm_struct;
> > -struct vm_area_struct;
> > -
> >  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
> >  
> >  /*
> > @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >  	 * hardware updates of the pte (ptep_set_access_flags safely changes
> >  	 * valid ptes without going through an invalid entry).
> >  	 */
> > -	if (pte_valid(*ptep) && pte_valid(pte)) {
> > +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> > +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
> >  		VM_WARN_ONCE(!pte_young(pte),
> >  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
> >  			     __func__, pte_val(*ptep), pte_val(pte));
[...]
>     From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
> And the the PTE access bit is all zero.
>     diff below. Whether the access bit of the new ptep should be judged to eliminate the 
> false positive?
[...]
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 2987d5a..3c1b0c6 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>          * valid ptes without going through an invalid entry).
>          */
>         if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
> -               VM_WARN_ONCE(!pte_young(pte),
> +               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
>                              "%s: racy access flag clearing: %016llx -> %016llx",
>                              __func__, pte_val(*ptep), pte_val(pte));

It's actually the other way around: *ptep being "old" (AF = 0) could at
any point be made "young" by the hardware (AF = 1). This is racing with
the software update which keeps the AF bit 0.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: a racy access flag clearing warning when calling mmap system call
  2017-12-08 11:01         ` Catalin Marinas
@ 2017-12-12  3:13           ` chenjiankang
  0 siblings, 0 replies; 8+ messages in thread
From: chenjiankang @ 2017-12-12  3:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Yisheng Xie, qiuxishi, linux-kernel, wangkefeng.wang


> On Fri, Dec 08, 2017 at 11:19:52AM +0800, chenjiankang wrote:
>> 在 2017/12/7 21:23, Will Deacon 写道:
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 149d05fb9421..8fe103b1e101 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -42,6 +42,8 @@
>>>  #include <asm/cmpxchg.h>
>>>  #include <asm/fixmap.h>
>>>  #include <linux/mmdebug.h>
>>> +#include <linux/mm_types.h>
>>> +#include <linux/sched.h>
>>>  
>>>  extern void __pte_error(const char *file, int line, unsigned long val);
>>>  extern void __pmd_error(const char *file, int line, unsigned long val);
>>> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>>>  	}
>>>  }
>>>  
>>> -struct mm_struct;
>>> -struct vm_area_struct;
>>> -
>>>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>>>  
>>>  /*
>>> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>  	 * hardware updates of the pte (ptep_set_access_flags safely changes
>>>  	 * valid ptes without going through an invalid entry).
>>>  	 */
>>> -	if (pte_valid(*ptep) && pte_valid(pte)) {
>>> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
>>> +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>>>  		VM_WARN_ONCE(!pte_young(pte),
>>>  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
>>>  			     __func__, pte_val(*ptep), pte_val(pte));
> [...]

Hi Will:
     I contruct a simple use case that can reproduce this fail;

like this: 
   #define LEN 1024*1024*100
                       
int main(void){        
        int* addr = NULL; 
        int pid = -1; 
        addr = (int*)mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); 
        madvise(addr, LEN, MADV_HUGEPAGE);
        memset(addr, 1, LEN);
        pid = fork();                                                                                                                
                       
        if(pid==0){ 
                printf("wow! I am a child!\n");
        }              
        else {         
                printf("I am a father!\n");
                mmap(addr, 1024*1024*10, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); 
        }              
                       
        return 0;   
} 

And then, I use the will's modification,which can solve this problem;
Will, this patch should send a upstream?

>>     From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
>> And the the PTE access bit is all zero.
>>     diff below. Whether the access bit of the new ptep should be judged to eliminate the 
>> false positive?
> [...]
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 2987d5a..3c1b0c6 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>          * valid ptes without going through an invalid entry).
>>          */
>>         if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
>> -               VM_WARN_ONCE(!pte_young(pte),
>> +               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
>>                              "%s: racy access flag clearing: %016llx -> %016llx",
>>                              __func__, pte_val(*ptep), pte_val(pte));
> 
> It's actually the other way around: *ptep being "old" (AF = 0) could at
> any point be made "young" by the hardware (AF = 1). This is racing with
> the software update which keeps the AF bit 0.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-12-12  3:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-01  7:38 a racy access flag clearing warning when calling mmap system call chenjiankang
2017-12-01 13:18 ` Will Deacon
2017-12-05 12:44   ` chenjiankang
2017-12-07  1:46   ` Yisheng Xie
2017-12-07 13:23     ` Will Deacon
2017-12-08  3:19       ` chenjiankang
2017-12-08 11:01         ` Catalin Marinas
2017-12-12  3:13           ` chenjiankang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).