* about the ptep_set_access_flags() for hardware AF/DBM
@ 2019-10-27 9:56 FF
2019-10-28 18:43 ` Catalin Marinas
0 siblings, 1 reply; 5+ messages in thread
From: FF @ 2019-10-27 9:56 UTC (permalink / raw)
To: linux-arm-kernel, catalin.marinas, julien.grall, will.deacon,
mark.rutland, steve.capper
hi all:
i came across commit 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM".
in this patch, the author describes an interesting race scenario involving hardware AF/DBM.
Here is the scenario:
A more complex situation is possible when all CPUs support hardware
AF/DBM:
a) Initial state: shareable + writable vma and pte_none(pte)
b) Read fault taken by two threads of the same process on different
CPUs
c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
eventually reaches do_set_pte() which sets a writable + clean pte.
CPU0 releases the mmap_sem
d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
pte entry it reads is present, writable and clean and it continues
to pte_mkyoung()
e) CPU1 calls ptep_set_access_flags()
If between (d) and (e) the hardware (another CPU) updates the dirty
state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
marking the entry clean again.
my question is:
1. step (a) says the initial state is a shareable + writable vma with pte_none(pte).
let's suppose this is an anonymous mapping created via the mmap() API.
so vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
vm_get_page_prot() translates these flags into pte attributes via the protection_map[] array.
in that case the result should be __S011 (PAGE_SHARED). for PAGE_SHARED, the pte attributes include PTE_WRITE, so PTE_DBM is set,
but PTE_RDONLY should be zero, right?
in step (c), CPU0 triggers a read fault and handles it; it will call do_anonymous_page() and map the zero page.
i am not sure what a "clean" pte means, but at this point PTE_RDONLY is zero, which means this pte is writable by hardware.
when CPU2 writes this memory, the hardware updates the dirty state by clearing PTE_RDONLY; but my question is that PTE_RDONLY is already zero throughout steps (a)-(d),
so why would CPU1 override the PTE_RDONLY bit and mark the entry clean again?
in that case, why would the "racy dirty state clearing" be triggered in set_pte_at()?
i see that pte_dirty() checks both the software and hardware dirty state; so in this case, has our software simply not set the PTE_DIRTY bit? and as for the hardware dirty check, PTE_RDONLY is always zero here.
#define pte_dirty(pte) (pte_sw_dirty(pte) || pte_hw_dirty(pte))
could you point out what i am missing?
Best
Ben
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: about the ptep_set_access_flags() for hardware AF/DBM
2019-10-27 9:56 about the ptep_set_access_flags() for hardware AF/DBM FF
@ 2019-10-28 18:43 ` Catalin Marinas
2019-10-29 0:54 ` FF
0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2019-10-28 18:43 UTC (permalink / raw)
To: FF
Cc: mark.rutland, julien.grall, will.deacon, linux-arm-kernel, steve.capper
On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> i see a patch, commit id: 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
> in this patch, the author show a insteresting case of the racy of hardware AF/DBM.
>
> Here is the scenario:
> A more complex situation is possible when all CPUs support hardware
> AF/DBM:
>
> a) Initial state: shareable + writable vma and pte_none(pte)
> b) Read fault taken by two threads of the same process on different
> CPUs
> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
> eventually reaches do_set_pte() which sets a writable + clean pte.
> CPU0 releases the mmap_sem
> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
> pte entry it reads is present, writable and clean and it continues
> to pte_mkyoung()
> e) CPU1 calls ptep_set_access_flags()
>
> If between (d) and (e) the hardware (another CPU) updates the dirty
> state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
> marking the entry clean again.
>
> my question is:
> 1. in step a, it say, the initial state vma is : sharable + writable +
> pte_none. let suppose this is a anon mapping by mmap() API.
What I had in mind at the time was a file mapping rather than anonymous
one (vma_is_anonymous() is false for shared mappings).
> so the vma->vm_page_prot should be : VM_READ | VM_WRITE | VM_SHARED
> in vm_get_page_prot(), it will change to pte attribute,in linux
> kernel it has a protection_map[] array. in that case, it should be
> __S011 (PAGE_SHARED). for PAGE_SHARED, the pte attribute will set
> PTE_WRITE,so PTE_DBM is set, but the PTE_RDONLY should be zero,
> right?
PAGE_SHARED is indeed writable but how it ends up in the pte depends on
the mapping. For a shared memory mapping, I think you do get a writable
entry on a read fault.
For file mappings, the writable attribute is cleared from vm_page_prot
via the vma_set_page_prot() function because vma_wants_writenotify() is
true. Filesystems normally want to track which pages have been dirtied
so they can write them back.
> in step c, CPU0 trigger read fault and handle the page fault, it will
> call do_anonymous_page(), and using system_zero_page. i don't what is
> a clean pte? but currently, the PTE_RDONLY is zero, it means this
> pte is writable.
Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
This is only for private mappings. If you look at mmap_region(), the vma
is not set up as anonymous if MAP_SHARED is given but as a shmem.
> when the CPU2 write this memory, it will update the dirty state like
> clear PTE_RDONLY, but my questions, the PTE_RDONLY is still zero, in
> step a~d, so why CPU1 will override RT_RDONLY bit and marking the
> entry clean again.
As I said above, this scenario is for shared file mappings where you do
get a PTE_RDONLY set for clean mappings.
--
Catalin
* Re:Re: about the ptep_set_access_flags() for hardware AF/DBM
2019-10-28 18:43 ` Catalin Marinas
@ 2019-10-29 0:54 ` FF
2019-10-29 12:11 ` Catalin Marinas
0 siblings, 1 reply; 5+ messages in thread
From: FF @ 2019-10-29 0:54 UTC (permalink / raw)
To: Catalin Marinas, runninglinuxkernel
Cc: mark.rutland, steve.capper, will.deacon, runninglinuxkernel,
julien.grall, linux-arm-kernel
At 2019-10-29 02:43:03, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> i see a patch, commit id: 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
>> in this patch, the author show a insteresting case of the racy of hardware AF/DBM.
>>
>> Here is the scenario:
>> A more complex situation is possible when all CPUs support hardware
>> AF/DBM:
>>
>> a) Initial state: shareable + writable vma and pte_none(pte)
>> b) Read fault taken by two threads of the same process on different
>> CPUs
>> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>> eventually reaches do_set_pte() which sets a writable + clean pte.
>> CPU0 releases the mmap_sem
>> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>> pte entry it reads is present, writable and clean and it continues
>> to pte_mkyoung()
>> e) CPU1 calls ptep_set_access_flags()
>>
>> If between (d) and (e) the hardware (another CPU) updates the dirty
>> state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
>> marking the entry clean again.
>>
>> my question is:
>> 1. in step a, it say, the initial state vma is : sharable + writable +
>> pte_none. let suppose this is a anon mapping by mmap() API.
>
>What I had in mind at the time was a file mapping rather than anonymous
>one (vma_is_anonymous() is false for shared mappings).
>
>> so the vma->vm_page_prot should be : VM_READ | VM_WRITE | VM_SHARED
>> in vm_get_page_prot(), it will change to pte attribute,in linux
>> kernel it has a protection_map[] array. in that case, it should be
>> __S011 (PAGE_SHARED). for PAGE_SHARED, the pte attribute will set
>> PTE_WRITE,so PTE_DBM is set, but the PTE_RDONLY should be zero,
>> right?
>
>PAGE_SHARED is indeed writable but how it ends up in the pte depends on
>the mapping. For a shared memory mapping, I think you do get a writable
>entry on a read fault.
>
>For file mappings, the writable attribute is cleared from vm_page_prot
>via the vma_set_page_prot() function because vma_wants_writenotify() is
>true. Filesystem normally want to track which pages have been dirtied to
>write back.
>
>> in step c, CPU0 trigger read fault and handle the page fault, it will
>> call do_anonymous_page(), and using system_zero_page. i don't what is
>> a clean pte? but currently, the PTE_RDONLY is zero, it means this
>> pte is writable.
>
>Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
>This is only for private mappings. If you look at mmap_region(), the vma
>is not set up as anonymous if MAP_SHARED is given but as a shmem.
>
>> when the CPU2 write this memory, it will update the dirty state like
>> clear PTE_RDONLY, but my questions, the PTE_RDONLY is still zero, in
>> step a~d, so why CPU1 will override RT_RDONLY bit and marking the
>> entry clean again.
>
>As I said above, this scenario is for shared file mappings where you do
>get a PTE_RDONLY set for clean mappings.
>
>--
>Catalin
hi Catalin:
Thanks for pointing that out.
i want to elaborate on the scenario. i see the first patch to fix ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1:
commit 66dbd6e6, "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
so i think the issue exists in Linux 4.6; let's assume we are looking at the Linux 4.6 source code.
1. initial phase: we create a shareable+writable file mapping via the mmap() API; the filesystem is ext4.
in do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
in mmap_region()->vma_set_page_prot(), some shared mappings want their pages marked read-only to track write events,
so the shared bit is dropped from the protection lookup and the pte attributes come from protection_map[] as __P011.
In Linux 4.6, __P011 is PAGE_COPY:
#define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
for PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero.
so the effective vm_flags are: VM_READ | VM_WRITE
2. Thread 1 on CPU0 wants to write this page, so a page fault is triggered.
in handle_pte_fault->do_fault->do_shared_fault(), it allocates a new page-cache page, and in do_set_pte()
it calls "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry.
so the pte attributes should be: PTE_DIRTY | PTE_WRITE.
3. Thread 2 on CPU1 also wants to read this page, but at the time of its access the pte had not yet been created by Thread 1, so a page fault happens.
by the time it reaches pte_offset_map(), it finds the pte has already been created by Thread 1, so it directly calls:
entry = pte_mkyoung(entry);
ptep_set_access_flags()
in ptep_set_access_flags(), set_pte_at() is called to install the pte,
but in the set_pte_at() function:
if (pte_present(pte)) {
if (pte_sw_dirty(pte) && pte_write(pte))
pte_val(pte) &= ~PTE_RDONLY;
else
pte_val(pte) |= PTE_RDONLY;
if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
__sync_icache_dcache(pte, addr);
}
it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE are both set in our scenario.
otherwise, who would ever clear the PTE_DIRTY bit?
so i am quite confused by the scenario in the commit log of "arm64: Implement ptep_set_access_flags() for hardware AF/DBM".
could you point out what i am missing?
Best
Ben
* Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
2019-10-29 0:54 ` FF
@ 2019-10-29 12:11 ` Catalin Marinas
2019-10-29 14:04 ` FF
0 siblings, 1 reply; 5+ messages in thread
From: Catalin Marinas @ 2019-10-29 12:11 UTC (permalink / raw)
To: FF
Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon,
julien.grall, linux-arm-kernel
Hi Ben,
On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> >> Here is the scenario:
> >> A more complex situation is possible when all CPUs support hardware
> >> AF/DBM:
> >>
> >> a) Initial state: shareable + writable vma and pte_none(pte)
> >> b) Read fault taken by two threads of the same process on different
> >> CPUs
> >> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
> >> eventually reaches do_set_pte() which sets a writable + clean pte.
> >> CPU0 releases the mmap_sem
> >> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
> >> pte entry it reads is present, writable and clean and it continues
> >> to pte_mkyoung()
> >> e) CPU1 calls ptep_set_access_flags()
> >>
> >> If between (d) and (e) the hardware (another CPU) updates the dirty
> >> state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
> >> marking the entry clean again.
[...]
> i want to elaborate the scenario, i saw the first patch to fix the
> ptep_set_access_flags() for hardware AF/DBM is on Linux 4.7-rc1.
> commit id "66dbd6e6" ,arm64: Implement ptep_set_access_flags() for
> hardware AF/DBM
What are you trying to solve? ptep_set_access_flags() being atomic is
not any worse. Do you think we wouldn't need this patch?
> i think you have issue on Linux 4.6, let's assume that we are look at
> Linux 4.6 source code.
>
> 1. initial phase: we want to create a sharable+writable file mapping
> by mmap() API, the filesyste is:ext4
See more below but I think we may need shm instead of a file mapping to
trigger the race (which, BTW, is rather theoretical; I haven't seen it
in practice).
> in do_mmap(), the vm_flags should be set VM_READ | VM_WRITE | VM_SHARED.
> in mmap_region()->vma_set_page_prot(), it will let the some shared
> mappigns will want the pages marked read-only to track write
> events, so it will clear the VM_SHARED. so it will get the pte
> attribute from protection_map[] is __P011.
>
> In Linux 4.6, __P011 is PAGE_COPY:
> #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>
> for PAGE_COPY, the PTE_RDONLY and PTE_WRITE(DMB) are zero.
> so the vm_flags is: VM_READ | VM_WRITE
While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
mapping in the page table is read-only.
Anyway I think with vma_wants_writenotify() we can't trigger this race
since it's a purely read-only fault (requires kernel notification). What
we need is to end up with a writable+clean entry which means VM_SHARED
set leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
pte_sw_dirty() is false.
> 2. Thread 1 on CPU0 want to write this page, page_fault will be trigger.
> in handle_pte_fault->do_fault->do_shared_fault(), it will allocate
> a new page cache, and in do_set_pte(), it will call:
> "maybe_mkwrite(pte_mkdirty(entry), vma)" to set the pte entry. so
> the pte attribute should be: PTE_DIRTY | PTE_WRITE.
Yes but the scenario I had in mind was a read fault here rather than
write which would set a PAGE_SHARED attributes ending up with
PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).
> 3. Thread 2 on CPU1 also want to read this page but this pte has not
> create by Thread 1, so page_fault happen. in pte_offset_map(), it
> found that the pte is created by Thread 1, so it will directly
> call:
>
> entry = pte_mkyoung(entry);
> ptep_set_access_flags()
>
> in ptep_set_access_flags, it will call set_pte_at() to set pte.
> but in set_pte_at() function:
>
> if (pte_present(pte)) {
> if (pte_sw_dirty(pte) && pte_write(pte))
> pte_val(pte) &= ~PTE_RDONLY;
> else
> pte_val(pte) |= PTE_RDONLY;
> if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
> __sync_icache_dcache(pte, addr);
> }
>
> it will clean the PTE_RDONLY bit, because the PTE_DIRTY |
> PTE_WRITE is set in our scenario. otherwise, anyone clean the
> PTE_DIRTY bit, who will clean this PTE_DIRTY bit?
Correct for your scenario but not if point 2 is a read.
> so i am very confusing the patch "arm64: Implement
> ptep_set_access_flags() for hardware AF/DBM" commit log's scenrio.
> would you like point out what i am missing?
If point 2 is a read fault, that goes via do_read_fault() and the pte
ends up clean with PTE_WRITE|PTE_RDONLY set, since it's not
pte_sw_dirty() (checked by set_pte_at()).
Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
read-only pte with DBM set because it took a read fault (same as Thread
1).
The problem appears if a Thread 3 on CPU2 performs a write access in
parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.
Now you need to find a vm_operations_struct that allows shared, writable
and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
--
Catalin
* Re:Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
2019-10-29 12:11 ` Catalin Marinas
@ 2019-10-29 14:04 ` FF
0 siblings, 0 replies; 5+ messages in thread
From: FF @ 2019-10-29 14:04 UTC (permalink / raw)
To: Catalin Marinas
Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon,
julien.grall, linux-arm-kernel
At 2019-10-29 20:11:54, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>Hi Ben,
>
>On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
>> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> >> Here is the scenario:
>> >> A more complex situation is possible when all CPUs support hardware
>> >> AF/DBM:
>> >>
>> >> a) Initial state: shareable + writable vma and pte_none(pte)
>> >> b) Read fault taken by two threads of the same process on different
>> >> CPUs
>> >> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>> >> eventually reaches do_set_pte() which sets a writable + clean pte.
>> >> CPU0 releases the mmap_sem
>> >> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>> >> pte entry it reads is present, writable and clean and it continues
>> >> to pte_mkyoung()
>> >> e) CPU1 calls ptep_set_access_flags()
>> >>
>> >> If between (d) and (e) the hardware (another CPU) updates the dirty
>> >> state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
>> >> marking the entry clean again.
>[...]
>> i want to elaborate the scenario, i saw the first patch to fix the
>> ptep_set_access_flags() for hardware AF/DBM is on Linux 4.7-rc1.
>> commit id "66dbd6e6" ,arm64: Implement ptep_set_access_flags() for
>> hardware AF/DBM
>
>What are you trying to solve? ptep_set_access_flags() being atomic is
>not any worse. Do you think we wouldn't need this patch?
>
>> i think you have issue on Linux 4.6, let's assume that we are look at
>> Linux 4.6 source code.
>>
>> 1. initial phase: we want to create a sharable+writable file mapping
>> by mmap() API, the filesyste is:ext4
>
>See more below but I think we may need shm instead of a file mapping to
>trigger the race (which, BTW, is rather theoretical; I haven't seen it
>in practice).
>
>> in do_mmap(), the vm_flags should be set VM_READ | VM_WRITE | VM_SHARED.
>> in mmap_region()->vma_set_page_prot(), it will let the some shared
>> mappigns will want the pages marked read-only to track write
>> events, so it will clear the VM_SHARED. so it will get the pte
>> attribute from protection_map[] is __P011.
>>
>> In Linux 4.6, __P011 is PAGE_COPY:
>> #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>>
>> for PAGE_COPY, the PTE_RDONLY and PTE_WRITE(DMB) are zero.
>> so the vm_flags is: VM_READ | VM_WRITE
>
>While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
>set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
>mapping in the page table is read-only.
>
>Anyway I think with vma_wants_writenotify() we can't trigger this race
>since it's a purely read-only fault (requires kernel notification). What
>we need is to end up with a writable+clean entry which means VM_SHARED
>set leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
>that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
>pte_sw_dirty() is false.
>
>> 2. Thread 1 on CPU0 want to write this page, page_fault will be trigger.
>> in handle_pte_fault->do_fault->do_shared_fault(), it will allocate
>> a new page cache, and in do_set_pte(), it will call:
>> "maybe_mkwrite(pte_mkdirty(entry), vma)" to set the pte entry. so
>> the pte attribute should be: PTE_DIRTY | PTE_WRITE.
>
>Yes but the scenario I had in mind was a read fault here rather than
>write which would set a PAGE_SHARED attributes ending up with
>PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).
>
>> 3. Thread 2 on CPU1 also want to read this page but this pte has not
>> create by Thread 1, so page_fault happen. in pte_offset_map(), it
>> found that the pte is created by Thread 1, so it will directly
>> call:
>>
>> entry = pte_mkyoung(entry);
>> ptep_set_access_flags()
>>
>> in ptep_set_access_flags, it will call set_pte_at() to set pte.
>> but in set_pte_at() function:
>>
>> if (pte_present(pte)) {
>> if (pte_sw_dirty(pte) && pte_write(pte))
>> pte_val(pte) &= ~PTE_RDONLY;
>> else
>> pte_val(pte) |= PTE_RDONLY;
>> if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
>> __sync_icache_dcache(pte, addr);
>> }
>>
>> it will clean the PTE_RDONLY bit, because the PTE_DIRTY |
>> PTE_WRITE is set in our scenario. otherwise, anyone clean the
>> PTE_DIRTY bit, who will clean this PTE_DIRTY bit?
>
>Correct for your scenario but not if point 2 is a read.
>
>> so i am very confusing the patch "arm64: Implement
>> ptep_set_access_flags() for hardware AF/DBM" commit log's scenrio.
>> would you like point out what i am missing?
>
>If point 2 is a read fault, that goes via do_read_fault() and the pte
>ends up as clean with PTE_WRITE|PTE_RDONLY is set since it's not
>pte_sw_dirty() (checked by set_pte_at()).
>
>Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
>read-only pte with DBM set because it took a read fault (same as Thread
>1).
>
>The problem appears if a Thread 3 on CPU2 performs a write access in
>parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
>set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
>overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.
>
>Now you need to find a vm_operations_struct that allows shared, writable
>and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
>
>--
>Catalin
hi Catalin:
Cool, thanks for pointing that out. You are right, we should use shmem to reproduce this issue.
Let me elaborate the scenario on Linux 4.6.
1. initial phase: we create a shareable+writable shared anonymous mapping via the mmap() API.
in do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
so vm_get_page_prot() fetches the pte attributes from protection_map[] as __S011.
In Linux 4.6, __S011 is PAGE_SHARED:
#define __S011 PAGE_SHARED
#define PAGE_SHARED __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
for PAGE_SHARED, PTE_RDONLY is zero but PTE_WRITE (DBM) is set.
so vm_flags remain: VM_READ | VM_WRITE | VM_SHARED
2. Thread 1 on CPU0 wants to read this page, so a page fault is triggered.
in handle_pte_fault->do_fault->do_read_fault(), it allocates a new page-cache page, and in do_set_pte()
the entry is installed via set_pte_at(), which in Linux 4.6 does:
if (pte_sw_dirty(pte) && pte_write(pte))
pte_val(pte) &= ~PTE_RDONLY;
else
pte_val(pte) |= PTE_RDONLY;
in this scenario PTE_DIRTY is not set, so PTE_RDONLY is set.
the pte attributes are therefore: PTE_WRITE | PTE_RDONLY.
3. Thread 2 on CPU1 also wants to read this page, but at the time of its access the pte had not yet been created by Thread 1, so a page fault happens.
by the time it reaches pte_offset_map(), it finds the pte has already been created by Thread 1, so it directly calls:
entry = pte_mkyoung(entry);
ptep_set_access_flags()
if, in the interval between pte_mkyoung() and ptep_set_access_flags(), another thread T3 writes this page, the hardware DBM mechanism will
clear PTE_RDONLY automatically.
then Thread 2 executes ptep_set_access_flags(), which sets PTE_RDONLY back, so we lose the dirty state.
Thanks, Catalin, amazing!
Best
Ben shushu