linux-arm-kernel.lists.infradead.org archive mirror
* about the ptep_set_access_flags() for hardware AF/DBM
@ 2019-10-27  9:56 FF
  2019-10-28 18:43 ` Catalin Marinas
  0 siblings, 1 reply; 5+ messages in thread
From: FF @ 2019-10-27  9:56 UTC (permalink / raw)
  To: linux-arm-kernel, catalin.marinas, julien.grall, will.deacon,
	mark.rutland, steve.capper


Hi all,

I came across commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM").
In this patch, the author describes an interesting race involving hardware AF/DBM.

Here is the scenario:
A more complex situation is possible when all CPUs support hardware
   AF/DBM:

   a) Initial state: shareable + writable vma and pte_none(pte)
   b) Read fault taken by two threads of the same process on different
      CPUs
   c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
      eventually reaches do_set_pte() which sets a writable + clean pte.
      CPU0 releases the mmap_sem
   d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
      pte entry it reads is present, writable and clean and it continues
      to pte_mkyoung()
   e) CPU1 calls ptep_set_access_flags()

   If between (d) and (e) the hardware (another CPU) updates the dirty
   state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
   marking the entry clean again.

My questions are:
1. In step (a), it says the initial state is a shareable + writable vma
with pte_none(pte). Let's suppose this is an anonymous mapping created
via the mmap() API.

In that case vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
vm_get_page_prot() converts these flags into pte attributes via the
kernel's protection_map[] array, so here the result should be __S011
(PAGE_SHARED). For PAGE_SHARED, PTE_WRITE is set, so PTE_DBM is set,
but PTE_RDONLY should be zero, right?
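
For reference, my understanding is that the lookup is just an index into
protection_map[] (a minimal sketch of the generic mm/mmap.c helper from
around that era, written from memory, so treat the details as an
assumption):

	/* mm/mmap.c (sketch): translate vm_flags into pte attributes */
	pgprot_t vm_get_page_prot(unsigned long vm_flags)
	{
		return __pgprot(pgprot_val(protection_map[vm_flags &
				(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
				pgprot_val(arch_vm_get_page_prot(vm_flags)));
	}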

In step (c), CPU0 takes the read fault and handles it; for an anonymous
mapping it would call do_anonymous_page() and map the zero page. I am
not sure what a 'clean' pte means here, but at this point PTE_RDONLY is
zero, which means this pte is writable.

When CPU2 writes to this memory, the hardware updates the dirty state
by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero
throughout steps (a)-(d), so why would CPU1 override the PTE_RDONLY bit
and mark the entry clean again?

In that case, why would it trigger the "racy dirty state clearing" in
set_pte_at()? I see that pte_dirty() checks both the software and the
hardware dirty state, so in this case, is it that software has not set
the PTE_DIRTY bit? And as for the hardware dirty check, PTE_RDONLY is
always zero.

#define pte_dirty(pte)		(pte_sw_dirty(pte) || pte_hw_dirty(pte))
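
For context, my reading of the arm64 helpers behind this check (a sketch
from arch/arm64/include/asm/pgtable.h as I remember it; please correct
me if the definitions differ):

	/* hardware-dirty: DBM (PTE_WRITE) set and PTE_RDONLY cleared by hw */
	#define pte_hw_dirty(pte)	(pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
	/* software-dirty: the kernel set PTE_DIRTY explicitly */
	#define pte_sw_dirty(pte)	(!!(pte_val(pte) & PTE_DIRTY))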

Could you point out what I am missing?

Best
Ben



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: about the ptep_set_access_flags() for hardware AF/DBM
  2019-10-27  9:56 about the ptep_set_access_flags() for hardware AF/DBM FF
@ 2019-10-28 18:43 ` Catalin Marinas
  2019-10-29  0:54   ` FF
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2019-10-28 18:43 UTC (permalink / raw)
  To: FF
  Cc: mark.rutland, julien.grall, will.deacon, linux-arm-kernel, steve.capper

On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> I came across commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM").
> In this patch, the author describes an interesting race involving hardware AF/DBM.
> 
> Here is the scenario:
> A more complex situation is possible when all CPUs support hardware
>    AF/DBM:
> 
>    a) Initial state: shareable + writable vma and pte_none(pte)
>    b) Read fault taken by two threads of the same process on different
>       CPUs
>    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>       eventually reaches do_set_pte() which sets a writable + clean pte.
>       CPU0 releases the mmap_sem
>    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>       pte entry it reads is present, writable and clean and it continues
>       to pte_mkyoung()
>    e) CPU1 calls ptep_set_access_flags()
> 
>    If between (d) and (e) the hardware (another CPU) updates the dirty
>    state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
>    marking the entry clean again.
> 
> My questions are:
> 1. In step (a), it says the initial state is a shareable + writable vma
> with pte_none(pte). Let's suppose this is an anonymous mapping created
> via the mmap() API.

What I had in mind at the time was a file mapping rather than an
anonymous one (vma_is_anonymous() is false for shared mappings).

> In that case vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
> vm_get_page_prot() converts these flags into pte attributes via the
> kernel's protection_map[] array, so here the result should be __S011
> (PAGE_SHARED). For PAGE_SHARED, PTE_WRITE is set, so PTE_DBM is set,
> but PTE_RDONLY should be zero, right?

PAGE_SHARED is indeed writable but how it ends up in the pte depends on
the mapping. For a shared memory mapping, I think you do get a writable
entry on a read fault.

For file mappings, the writable attribute is cleared from vm_page_prot
via the vma_set_page_prot() function because vma_wants_writenotify() is
true. Filesystems normally want to track which pages have been dirtied
so they can write them back.
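
Roughly (a sketch of the mm/mmap.c helper from memory; treat the exact
shape as an assumption):

	/* mm/mmap.c (sketch): drop VM_SHARED from the prot lookup when the
	 * kernel wants write notifications for this mapping */
	void vma_set_page_prot(struct vm_area_struct *vma)
	{
		unsigned long vm_flags = vma->vm_flags;

		vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot, vm_flags);
		if (vma_wants_writenotify(vma)) {
			vm_flags &= ~VM_SHARED;
			vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
							     vm_flags);
		}
	}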

> In step (c), CPU0 takes the read fault and handles it; for an
> anonymous mapping it would call do_anonymous_page() and map the zero
> page. I am not sure what a 'clean' pte means here, but at this point
> PTE_RDONLY is zero, which means this pte is writable.

Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
This is only for private mappings. If you look at mmap_region(), the vma
is not set up as anonymous if MAP_SHARED is given but as a shmem mapping.

> When CPU2 writes to this memory, the hardware updates the dirty state
> by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero
> throughout steps (a)-(d), so why would CPU1 override the PTE_RDONLY
> bit and mark the entry clean again?

As I said above, this scenario is for shared file mappings where you do
get a PTE_RDONLY set for clean mappings.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re:Re: about the ptep_set_access_flags() for hardware AF/DBM
  2019-10-28 18:43 ` Catalin Marinas
@ 2019-10-29  0:54   ` FF
  2019-10-29 12:11     ` Catalin Marinas
  0 siblings, 1 reply; 5+ messages in thread
From: FF @ 2019-10-29  0:54 UTC (permalink / raw)
  To: Catalin Marinas, runninglinuxkernel
  Cc: mark.rutland, steve.capper, will.deacon, runninglinuxkernel,
	julien.grall, linux-arm-kernel



At 2019-10-29 02:43:03, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> I came across commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM").
>> In this patch, the author describes an interesting race involving hardware AF/DBM.
>> 
>> Here is the scenario:
>> A more complex situation is possible when all CPUs support hardware
>>    AF/DBM:
>> 
>>    a) Initial state: shareable + writable vma and pte_none(pte)
>>    b) Read fault taken by two threads of the same process on different
>>       CPUs
>>    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>>       eventually reaches do_set_pte() which sets a writable + clean pte.
>>       CPU0 releases the mmap_sem
>>    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>>       pte entry it reads is present, writable and clean and it continues
>>       to pte_mkyoung()
>>    e) CPU1 calls ptep_set_access_flags()
>> 
>>    If between (d) and (e) the hardware (another CPU) updates the dirty
>>    state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
>>    marking the entry clean again.
>> 
>> My questions are:
>> 1. In step (a), it says the initial state is a shareable + writable vma
>> with pte_none(pte). Let's suppose this is an anonymous mapping created
>> via the mmap() API.
>
>What I had in mind at the time was a file mapping rather than an
>anonymous one (vma_is_anonymous() is false for shared mappings).
>
>> In that case vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
>> vm_get_page_prot() converts these flags into pte attributes via the
>> kernel's protection_map[] array, so here the result should be __S011
>> (PAGE_SHARED). For PAGE_SHARED, PTE_WRITE is set, so PTE_DBM is set,
>> but PTE_RDONLY should be zero, right?
>
>PAGE_SHARED is indeed writable but how it ends up in the pte depends on
>the mapping. For a shared memory mapping, I think you do get a writable
>entry on a read fault.
>
>For file mappings, the writable attribute is cleared from vm_page_prot
>via the vma_set_page_prot() function because vma_wants_writenotify() is
>true. Filesystems normally want to track which pages have been dirtied
>so they can write them back.
>
>> In step (c), CPU0 takes the read fault and handles it; for an
>> anonymous mapping it would call do_anonymous_page() and map the zero
>> page. I am not sure what a 'clean' pte means here, but at this point
>> PTE_RDONLY is zero, which means this pte is writable.
>
>Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
>This is only for private mappings. If you look at mmap_region(), the vma
>is not set up as anonymous if MAP_SHARED is given but as a shmem mapping.
>
>> When CPU2 writes to this memory, the hardware updates the dirty state
>> by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero
>> throughout steps (a)-(d), so why would CPU1 override the PTE_RDONLY
>> bit and mark the entry clean again?
>
>As I said above, this scenario is for shared file mappings where you do
>get a PTE_RDONLY set for clean mappings.
>
>-- 
>Catalin


Hi Catalin,

Thanks for pointing that out.
I want to elaborate on the scenario. I see that the first fix for
ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1:
commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM").

I think the issue exists in Linux 4.6, so let's assume we are looking at the Linux 4.6 source code.

1. Initial phase: we create a shareable + writable file mapping via the mmap() API; the filesystem is ext4.
   
   In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
   In mmap_region()->vma_set_page_prot(), some shared mappings want their pages marked read-only to track write events,
   so VM_SHARED is cleared for the page-protection lookup, and the pte attributes come from protection_map[] as __P011.
   
   In Linux 4.6, __P011 is PAGE_COPY:
   #define PAGE_COPY		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
   
   For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero.
   So the flags used for the lookup are VM_READ | VM_WRITE.
   
2. Thread 1 on CPU0 wants to write to this page, so a page fault is triggered.
   In handle_pte_fault()->do_fault()->do_shared_fault(), a new page-cache page is allocated, and do_set_pte()
   calls "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry.
   So the pte attributes should include PTE_DIRTY | PTE_WRITE.
   
3. Thread 2 on CPU1 also wants to read this page; the fault was taken before Thread 1 had created the pte, so a page fault happens.
   At pte_offset_map(), it finds that the pte has already been created by Thread 1, so it goes straight to:
   
   entry = pte_mkyoung(entry);
   ptep_set_access_flags()
   
   ptep_set_access_flags() calls set_pte_at() to install the pte. But in the set_pte_at() function:
   
   	if (pte_present(pte)) {
		if (pte_sw_dirty(pte) && pte_write(pte))
			pte_val(pte) &= ~PTE_RDONLY;
		else
			pte_val(pte) |= PTE_RDONLY;
		if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
			__sync_icache_dcache(pte, addr);
	}
	
	it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE are set in our scenario.
	Conversely, who would ever clear the PTE_DIRTY bit?

So I am quite confused by the scenario in the commit log of "arm64: Implement ptep_set_access_flags() for hardware AF/DBM".
Could you point out what I am missing?

Best
Ben


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
  2019-10-29  0:54   ` FF
@ 2019-10-29 12:11     ` Catalin Marinas
  2019-10-29 14:04       ` FF
  0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2019-10-29 12:11 UTC (permalink / raw)
  To: FF
  Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon,
	julien.grall, linux-arm-kernel

Hi Ben,

On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> >> Here is the scenario:
> >> A more complex situation is possible when all CPUs support hardware
> >>    AF/DBM:
> >> 
> >>    a) Initial state: shareable + writable vma and pte_none(pte)
> >>    b) Read fault taken by two threads of the same process on different
> >>       CPUs
> >>    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
> >>       eventually reaches do_set_pte() which sets a writable + clean pte.
> >>       CPU0 releases the mmap_sem
> >>    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
> >>       pte entry it reads is present, writable and clean and it continues
> >>       to pte_mkyoung()
> >>    e) CPU1 calls ptep_set_access_flags()
> >> 
> >>    If between (d) and (e) the hardware (another CPU) updates the dirty
> >>    state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
> >>    marking the entry clean again.
[...]
> I want to elaborate on the scenario. I see that the first fix for
> ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1:
> commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
> hardware AF/DBM").

What are you trying to solve? ptep_set_access_flags() being atomic is
not any worse. Do you think we wouldn't need this patch?

> I think the issue exists in Linux 4.6, so let's assume we are looking
> at the Linux 4.6 source code.
> 
> 1. Initial phase: we create a shareable + writable file mapping via
>    the mmap() API; the filesystem is ext4.

See more below but I think we may need shm instead of a file mapping to
trigger the race (which, BTW, is rather theoretical; I haven't seen it
in practice).

>    In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
>    In mmap_region()->vma_set_page_prot(), some shared mappings want
>    their pages marked read-only to track write events, so VM_SHARED is
>    cleared for the page-protection lookup, and the pte attributes come
>    from protection_map[] as __P011.
>    
>    In Linux 4.6, __P011 is PAGE_COPY:
>    #define PAGE_COPY		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>    
>    For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero.
>    So the flags used for the lookup are VM_READ | VM_WRITE.

While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
mapping in the page table is read-only.

Anyway I think with vma_wants_writenotify() we can't trigger this race
since it's a purely read-only fault (requires kernel notification). What
we need is to end up with a writable+clean entry which means VM_SHARED
set leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
pte_sw_dirty() is false.

> 2. Thread 1 on CPU0 wants to write to this page, so a page fault is
>    triggered. In handle_pte_fault()->do_fault()->do_shared_fault(), a
>    new page-cache page is allocated, and do_set_pte() calls
>    "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry.
>    So the pte attributes should include PTE_DIRTY | PTE_WRITE.

Yes but the scenario I had in mind was a read fault here rather than
write which would set a PAGE_SHARED attributes ending up with
PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).

> 3. Thread 2 on CPU1 also wants to read this page; the fault was taken
>    before Thread 1 had created the pte, so a page fault happens. At
>    pte_offset_map(), it finds that the pte has already been created by
>    Thread 1, so it goes straight to:
>    
>    entry = pte_mkyoung(entry);
>    ptep_set_access_flags()
>    
>    ptep_set_access_flags() calls set_pte_at() to install the pte. But
>    in the set_pte_at() function:
>    
>    	if (pte_present(pte)) {
> 		if (pte_sw_dirty(pte) && pte_write(pte))
> 			pte_val(pte) &= ~PTE_RDONLY;
> 		else
> 			pte_val(pte) |= PTE_RDONLY;
> 		if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
> 			__sync_icache_dcache(pte, addr);
> 	}
> 	
> 	it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE
> 	are set in our scenario. Conversely, who would ever clear the
> 	PTE_DIRTY bit?

Correct for your scenario but not if point 2 is a read.

> So I am quite confused by the scenario in the commit log of "arm64:
> Implement ptep_set_access_flags() for hardware AF/DBM". Could you
> point out what I am missing?

If point 2 is a read fault, that goes via do_read_fault() and the pte
ends up as clean with PTE_WRITE|PTE_RDONLY set since it's not
pte_sw_dirty() (checked by set_pte_at()).

Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
read-only pte with DBM set because it took a read fault (same as Thread
1).

The problem appears if a Thread 3 on CPU2 performs a write access in
parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.

Now you need to find a vm_operations_struct that allows shared, writable
and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
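
For reference, before that commit arm64 fell back to the generic
implementation, which is a plain compare-then-write with no atomicity
against the hardware DBM update (a sketch of mm/pgtable-generic.c from
memory; treat the exact shape as an assumption):

	int ptep_set_access_flags(struct vm_area_struct *vma,
				  unsigned long address, pte_t *ptep,
				  pte_t entry, int dirty)
	{
		int changed = !pte_same(*ptep, entry);
		if (changed) {
			/* a hardware PTE_RDONLY clear between the read of
			 * *ptep and this write is silently overwritten */
			set_pte_at(vma->vm_mm, address, ptep, entry);
			flush_tlb_fix_spurious_fault(vma, address);
		}
		return changed;
	}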

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re:Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
  2019-10-29 12:11     ` Catalin Marinas
@ 2019-10-29 14:04       ` FF
  0 siblings, 0 replies; 5+ messages in thread
From: FF @ 2019-10-29 14:04 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon,
	julien.grall, linux-arm-kernel



At 2019-10-29 20:11:54, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>Hi Ben,
>
>On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
>> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> >> Here is the scenario:
>> >> A more complex situation is possible when all CPUs support hardware
>> >>    AF/DBM:
>> >> 
>> >>    a) Initial state: shareable + writable vma and pte_none(pte)
>> >>    b) Read fault taken by two threads of the same process on different
>> >>       CPUs
>> >>    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>> >>       eventually reaches do_set_pte() which sets a writable + clean pte.
>> >>       CPU0 releases the mmap_sem
>> >>    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>> >>       pte entry it reads is present, writable and clean and it continues
>> >>       to pte_mkyoung()
>> >>    e) CPU1 calls ptep_set_access_flags()
>> >> 
>> >>    If between (d) and (e) the hardware (another CPU) updates the dirty
>> >>    state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
>> >>    marking the entry clean again.
>[...]
>> I want to elaborate on the scenario. I see that the first fix for
>> ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1:
>> commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
>> hardware AF/DBM").
>
>What are you trying to solve? ptep_set_access_flags() being atomic is
>not any worse. Do you think we wouldn't need this patch?
>
>> I think the issue exists in Linux 4.6, so let's assume we are looking
>> at the Linux 4.6 source code.
>> 
>> 1. Initial phase: we create a shareable + writable file mapping via
>>    the mmap() API; the filesystem is ext4.
>
>See more below but I think we may need shm instead of a file mapping to
>trigger the race (which, BTW, is rather theoretical; I haven't seen it
>in practice).
>
>>    In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
>>    In mmap_region()->vma_set_page_prot(), some shared mappings want
>>    their pages marked read-only to track write events, so VM_SHARED is
>>    cleared for the page-protection lookup, and the pte attributes come
>>    from protection_map[] as __P011.
>>    
>>    In Linux 4.6, __P011 is PAGE_COPY:
>>    #define PAGE_COPY		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>>    
>>    For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero.
>>    So the flags used for the lookup are VM_READ | VM_WRITE.
>
>While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
>set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
>mapping in the page table is read-only.
>
>Anyway I think with vma_wants_writenotify() we can't trigger this race
>since it's a purely read-only fault (requires kernel notification). What
>we need is to end up with a writable+clean entry which means VM_SHARED
>set leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
>that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
>pte_sw_dirty() is false.
>
>> 2. Thread 1 on CPU0 wants to write to this page, so a page fault is
>>    triggered. In handle_pte_fault()->do_fault()->do_shared_fault(), a
>>    new page-cache page is allocated, and do_set_pte() calls
>>    "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry.
>>    So the pte attributes should include PTE_DIRTY | PTE_WRITE.
>
>Yes but the scenario I had in mind was a read fault here rather than
>write which would set a PAGE_SHARED attributes ending up with
>PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).
>
>> 3. Thread 2 on CPU1 also wants to read this page; the fault was taken
>>    before Thread 1 had created the pte, so a page fault happens. At
>>    pte_offset_map(), it finds that the pte has already been created by
>>    Thread 1, so it goes straight to:
>>    
>>    entry = pte_mkyoung(entry);
>>    ptep_set_access_flags()
>>    
>>    ptep_set_access_flags() calls set_pte_at() to install the pte. But
>>    in the set_pte_at() function:
>>    
>>    	if (pte_present(pte)) {
>> 		if (pte_sw_dirty(pte) && pte_write(pte))
>> 			pte_val(pte) &= ~PTE_RDONLY;
>> 		else
>> 			pte_val(pte) |= PTE_RDONLY;
>> 		if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
>> 			__sync_icache_dcache(pte, addr);
>> 	}
>> 	
>> 	it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE
>> 	are set in our scenario. Conversely, who would ever clear the
>> 	PTE_DIRTY bit?
>
>Correct for your scenario but not if point 2 is a read.
>
>> So I am quite confused by the scenario in the commit log of "arm64:
>> Implement ptep_set_access_flags() for hardware AF/DBM". Could you
>> point out what I am missing?
>
>If point 2 is a read fault, that goes via do_read_fault() and the pte
>ends up as clean with PTE_WRITE|PTE_RDONLY set since it's not
>pte_sw_dirty() (checked by set_pte_at()).
>
>Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
>read-only pte with DBM set because it took a read fault (same as Thread
>1).
>
>The problem appears if a Thread 3 on CPU2 performs a write access in
>parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
>set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
>overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.
>
>Now you need to find a vm_operations_struct that allows shared, writable
>and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
>
>-- 
>Catalin

Hi Catalin,

Cool! Thanks for pointing that out. You are right, we should use shmem to reproduce this issue.
Let me walk through the scenario on Linux 4.6.

1. Initial phase: we create a shareable + writable shared anonymous (shmem) mapping via the mmap() API.
   
   In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
   Since shmem does not want write notification, vm_get_page_prot() takes the pte attributes from protection_map[] as __S011.
   
   In Linux 4.6, __S011 is PAGE_SHARED:

  #define __S011  PAGE_SHARED
  #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
   
   For PAGE_SHARED, PTE_RDONLY is zero, but PTE_WRITE (DBM) is 1.
   So the flags used for the lookup are VM_READ | VM_WRITE | VM_SHARED.
   
2. Thread 1 on CPU0 wants to read this page, so a page fault is triggered.
   In handle_pte_fault()->do_fault()->do_read_fault(), a new page-cache page is allocated, and do_set_pte()
   installs the entry, which goes through set_pte_at() in Linux 4.6:
   
   		if (pte_sw_dirty(pte) && pte_write(pte))
			pte_val(pte) &= ~PTE_RDONLY;
		else
			pte_val(pte) |= PTE_RDONLY;

	In this scenario PTE_DIRTY is not set, so set_pte_at() sets PTE_RDONLY.
   
   So the pte attributes should be: PTE_WRITE | PTE_RDONLY.
   
3. Thread 2 on CPU1 also wants to read this page; the fault was taken before Thread 1 had created the pte, so a page fault happens.
   At pte_offset_map(), it finds that the pte has already been created by Thread 1, so it goes straight to:
   
   entry = pte_mkyoung(entry);
   ptep_set_access_flags()
  
   If, in the interval between pte_mkyoung() and ptep_set_access_flags(), another thread T3 writes to this page, the hardware DBM
   mechanism clears PTE_RDONLY automatically.
   
   Then T2 executes ptep_set_access_flags(), which sets PTE_RDONLY back, so we lose the dirty status.
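
For completeness, my reading of commit 66dbd6e61a52 is that the fix
closes this window by doing the update with a load-exclusive/
store-exclusive loop, so a concurrent hardware PTE_RDONLY clear is never
overwritten. A condensed sketch (from memory, not the verbatim commit):

	int ptep_set_access_flags(struct vm_area_struct *vma,
				  unsigned long address, pte_t *ptep,
				  pte_t entry, int dirty)
	{
		pteval_t old_pteval;
		unsigned int tmp;

		if (pte_same(*ptep, entry))
			return 0;

		/* only preserve the access flags and write permission */
		pte_val(entry) &= PTE_AF | PTE_WRITE | PTE_DIRTY;

		/* the asm below clears PTE_RDONLY, so set it back for a
		 * clean entry */
		if (!pte_write(entry) || !pte_sw_dirty(entry))
			pte_val(entry) |= PTE_RDONLY;

		/* atomically clear PTE_RDONLY and OR in the new flags */
		asm volatile("1:	ldxr	%0, %2\n"
		"	and	%0, %0, %3		// clear PTE_RDONLY\n"
		"	orr	%0, %0, %4		// set flags\n"
		"	stxr	%w1, %0, %2\n"
		"	cbnz	%w1, 1b\n"
		: "=&r" (old_pteval), "=&r" (tmp), "+Q" (pte_val(*ptep))
		: "L" (~PTE_RDONLY), "r" (pte_val(entry)));

		flush_tlb_fix_spurious_fault(vma, address);
		return 1;
	}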

Thanks, Catalin, amazing!
   

Best
Ben shushu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

