* about the ptep_set_access_flags() for hardware AF/DBM
From: FF @ 2019-10-27  9:56 UTC
To: linux-arm-kernel, catalin.marinas, julien.grall, will.deacon, mark.rutland, steve.capper

hi all:

I came across commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM"). In this patch, the author describes an interesting race with hardware AF/DBM. Here is the scenario:

    A more complex situation is possible when all CPUs support hardware
    AF/DBM:

    a) Initial state: shareable + writable vma and pte_none(pte)
    b) Read fault taken by two threads of the same process on different
       CPUs
    c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
       eventually reaches do_set_pte() which sets a writable + clean pte.
       CPU0 releases the mmap_sem
    d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
       pte entry it reads is present, writable and clean and it continues
       to pte_mkyoung()
    e) CPU1 calls ptep_set_access_flags()

    If between (d) and (e) the hardware (another CPU) updates the dirty
    state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
    marking the entry clean again.

My questions are:

1. Step (a) says the initial state is a shareable + writable vma with pte_none(pte). Suppose this is an anonymous mapping created via the mmap() API, so vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED. vm_get_page_prot() translates this into pte attributes via the protection_map[] array, which in this case gives __S011 (PAGE_SHARED). PAGE_SHARED sets PTE_WRITE, so PTE_DBM is set, but PTE_RDONLY should be zero, right?

2. In step (c), CPU0 takes a read fault and handles it; this calls do_anonymous_page(), which maps the system zero page. I don't understand what a "clean" pte means here — at this point PTE_RDONLY is zero, which means this pte is writable.
3. When CPU2 writes this memory, the hardware will update the dirty state by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero throughout steps (a)~(d), so why would CPU1 override the PTE_RDONLY bit and mark the entry clean again? And why would this trigger the "racy dirty state clearing" in set_pte_at()? I see that pte_dirty() checks both the software and the hardware dirty state:

    #define pte_dirty(pte)	(pte_sw_dirty(pte) || pte_hw_dirty(pte))

So in this case, is the problem that software has not set the PTE_DIRTY bit? And how can the hardware dirty check mean anything if PTE_RDONLY is always zero?

Could you point out what I am missing?

Best
Ben

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: about the ptep_set_access_flags() for hardware AF/DBM
From: Catalin Marinas @ 2019-10-28 18:43 UTC
To: FF
Cc: mark.rutland, julien.grall, will.deacon, linux-arm-kernel, steve.capper

On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> i see a patch, commit id: 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
> in this patch, the author shows an interesting case of a race with hardware AF/DBM.
>
> Here is the scenario:
> A more complex situation is possible when all CPUs support hardware
> AF/DBM:
>
> a) Initial state: shareable + writable vma and pte_none(pte)
> b) Read fault taken by two threads of the same process on different
>    CPUs
> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>    eventually reaches do_set_pte() which sets a writable + clean pte.
>    CPU0 releases the mmap_sem
> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>    pte entry it reads is present, writable and clean and it continues
>    to pte_mkyoung()
> e) CPU1 calls ptep_set_access_flags()
>
> If between (d) and (e) the hardware (another CPU) updates the dirty
> state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
> marking the entry clean again.
>
> my question is:
> 1. in step a, it says the initial state vma is shareable + writable +
> pte_none. Let's suppose this is an anon mapping created by the mmap() API.

What I had in mind at the time was a file mapping rather than an anonymous
one (vma_is_anonymous() is false for shared mappings).

> so vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
> vm_get_page_prot() will translate this into pte attributes via the
> protection_map[] array; in that case, it should be __S011 (PAGE_SHARED).
> For PAGE_SHARED, the pte attributes will have PTE_WRITE set, so
> PTE_DBM is set, but PTE_RDONLY should be zero, right?

PAGE_SHARED is indeed writable but how it ends up in the pte depends on
the mapping. For a shared memory mapping, I think you do get a writable
entry on a read fault.

For file mappings, the writable attribute is cleared from vm_page_prot
via the vma_set_page_prot() function because vma_wants_writenotify() is
true. Filesystems normally want to track which pages have been dirtied
for write-back.

> In step c, CPU0 triggers a read fault and handles the page fault; it
> will call do_anonymous_page() and map the system zero page. I don't
> understand what a clean pte is, but at this point PTE_RDONLY is zero,
> which means this pte is writable.

Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
This is only for private mappings. If you look at mmap_region(), the vma
is not set up as anonymous if MAP_SHARED is given, but as shmem.

> When CPU2 writes this memory, the hardware will update the dirty state
> by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero
> in steps a~d, so why would CPU1 override the PTE_RDONLY bit and mark
> the entry clean again?

As I said above, this scenario is for shared file mappings where you do
get PTE_RDONLY set for clean mappings.

--
Catalin
* Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
From: FF @ 2019-10-29  0:54 UTC
To: Catalin Marinas, runninglinuxkernel
Cc: mark.rutland, steve.capper, will.deacon, runninglinuxkernel, julien.grall, linux-arm-kernel

At 2019-10-29 02:43:03, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> i see a patch, commit id: 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
>> in this patch, the author shows an interesting case of a race with hardware AF/DBM.
>>
>> Here is the scenario:
>> A more complex situation is possible when all CPUs support hardware
>> AF/DBM:
>>
>> a) Initial state: shareable + writable vma and pte_none(pte)
>> b) Read fault taken by two threads of the same process on different
>>    CPUs
>> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>>    eventually reaches do_set_pte() which sets a writable + clean pte.
>>    CPU0 releases the mmap_sem
>> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>>    pte entry it reads is present, writable and clean and it continues
>>    to pte_mkyoung()
>> e) CPU1 calls ptep_set_access_flags()
>>
>> If between (d) and (e) the hardware (another CPU) updates the dirty
>> state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
>> marking the entry clean again.
>>
>> my question is:
>> 1. in step a, it says the initial state vma is shareable + writable +
>> pte_none. Let's suppose this is an anon mapping created by the mmap() API.
>
>What I had in mind at the time was a file mapping rather than an anonymous
>one (vma_is_anonymous() is false for shared mappings).
>
>> so vma->vm_flags should be VM_READ | VM_WRITE | VM_SHARED.
>> vm_get_page_prot() will translate this into pte attributes via the
>> protection_map[] array;
>> in that case, it should be __S011 (PAGE_SHARED). For PAGE_SHARED, the
>> pte attributes will have PTE_WRITE set, so PTE_DBM is set, but
>> PTE_RDONLY should be zero, right?
>
>PAGE_SHARED is indeed writable but how it ends up in the pte depends on
>the mapping. For a shared memory mapping, I think you do get a writable
>entry on a read fault.
>
>For file mappings, the writable attribute is cleared from vm_page_prot
>via the vma_set_page_prot() function because vma_wants_writenotify() is
>true. Filesystems normally want to track which pages have been dirtied
>for write-back.
>
>> In step c, CPU0 triggers a read fault and handles the page fault; it
>> will call do_anonymous_page() and map the system zero page. I don't
>> understand what a clean pte is, but at this point PTE_RDONLY is zero,
>> which means this pte is writable.
>
>Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
>This is only for private mappings. If you look at mmap_region(), the vma
>is not set up as anonymous if MAP_SHARED is given, but as shmem.
>
>> When CPU2 writes this memory, the hardware will update the dirty state
>> by clearing PTE_RDONLY. But my question is: PTE_RDONLY is already zero
>> in steps a~d, so why would CPU1 override the PTE_RDONLY bit and mark
>> the entry clean again?
>
>As I said above, this scenario is for shared file mappings where you do
>get PTE_RDONLY set for clean mappings.
>
>--
>Catalin

hi Catalin:

Thanks for pointing that out. I want to elaborate on the scenario. I see that the first patch to fix ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1 (commit 66dbd6e6, "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"), so I think the issue exists in Linux 4.6. Let's assume we are looking at the Linux 4.6 source code.

1. Initial phase: we want to create a shareable + writable file mapping with the mmap() API; the filesystem is ext4. In do_mmap(), vm_flags should be set to VM_READ | VM_WRITE | VM_SHARED.
   In mmap_region()->vma_set_page_prot(), some shared mappings will want the pages marked read-only to track write events, so the page protection is computed with VM_SHARED cleared (vm_flags itself still has VM_READ | VM_WRITE | VM_SHARED; only vm_page_prot is affected). The pte attributes are therefore taken from protection_map[] as __P011. In Linux 4.6, __P011 is PAGE_COPY:

       #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)

   For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are both zero.

2. Thread 1 on CPU0 wants to write this page, so a page fault is triggered. In handle_pte_fault->do_fault->do_shared_fault(), a new page cache page is allocated, and do_set_pte() calls "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry, so the pte attributes should be PTE_DIRTY | PTE_WRITE.

3. Thread 2 on CPU1 also wants to read this page. The pte had not been created yet when it faulted, so a page fault happens; but by the time it looks up the entry via pte_offset_map(), it finds the pte already created by Thread 1, so it directly calls:

       entry = pte_mkyoung(entry);
       ptep_set_access_flags()

   ptep_set_access_flags() calls set_pte_at() to install the pte, and in set_pte_at():

       if (pte_present(pte)) {
               if (pte_sw_dirty(pte) && pte_write(pte))
                       pte_val(pte) &= ~PTE_RDONLY;
               else
                       pte_val(pte) |= PTE_RDONLY;
               if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
                       __sync_icache_dcache(pte, addr);
       }

   it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE is set in our scenario. Otherwise, who would ever clear the PTE_DIRTY bit?

So I am very confused by the scenario in the commit log of "arm64: Implement ptep_set_access_flags() for hardware AF/DBM". Could you point out what I am missing?

Best
Ben
* Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
From: Catalin Marinas @ 2019-10-29 12:11 UTC
To: FF
Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon, julien.grall, linux-arm-kernel

Hi Ben,

On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
> >> Here is the scenario:
> >> A more complex situation is possible when all CPUs support hardware
> >> AF/DBM:
> >>
> >> a) Initial state: shareable + writable vma and pte_none(pte)
> >> b) Read fault taken by two threads of the same process on different
> >>    CPUs
> >> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
> >>    eventually reaches do_set_pte() which sets a writable + clean pte.
> >>    CPU0 releases the mmap_sem
> >> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
> >>    pte entry it reads is present, writable and clean and it continues
> >>    to pte_mkyoung()
> >> e) CPU1 calls ptep_set_access_flags()
> >>
> >> If between (d) and (e) the hardware (another CPU) updates the dirty
> >> state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
> >> marking the entry clean again.
[...]
> i want to elaborate on the scenario. I see that the first patch to fix
> ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1
> (commit 66dbd6e6, "arm64: Implement ptep_set_access_flags() for
> hardware AF/DBM").

What are you trying to solve? Making ptep_set_access_flags() atomic is
not any worse. Do you think we wouldn't need this patch?

> i think the issue exists in Linux 4.6; let's assume we are looking at
> the Linux 4.6 source code.
>
> 1.
> Initial phase: we want to create a shareable + writable file mapping
> with the mmap() API; the filesystem is ext4.

See more below but I think we may need shm instead of a file mapping to
trigger the race (which, BTW, is rather theoretical; I haven't seen it
in practice).

> in do_mmap(), the vm_flags should be set to VM_READ | VM_WRITE | VM_SHARED.
> In mmap_region()->vma_set_page_prot(), some shared mappings will want
> the pages marked read-only to track write events, so the page
> protection is computed with VM_SHARED cleared; the pte attributes from
> protection_map[] are then __P011.
>
> In Linux 4.6, __P011 is PAGE_COPY:
> #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>
> For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are zero.

While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
mapping in the page table is read-only.

Anyway, I think with vma_wants_writenotify() we can't trigger this race
since it's a purely read-only fault (requires kernel notification). What
we need is to end up with a writable + clean entry, which means VM_SHARED
set, leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
pte_sw_dirty() is false.

> 2. Thread 1 on CPU0 wants to write this page, so a page fault is
> triggered. In handle_pte_fault->do_fault->do_shared_fault(), a new
> page cache page is allocated, and do_set_pte() calls
> "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry, so
> the pte attributes should be PTE_DIRTY | PTE_WRITE.

Yes, but the scenario I had in mind was a read fault here rather than a
write, which would set PAGE_SHARED attributes, ending up with
PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).

> 3. Thread 2 on CPU1 also wants to read this page. The pte had not been
> created yet when it faulted, so a page fault happens.
> By the time it looks up the entry via pte_offset_map(), it finds the
> pte already created by Thread 1, so it directly calls:
>
>     entry = pte_mkyoung(entry);
>     ptep_set_access_flags()
>
> ptep_set_access_flags() calls set_pte_at() to install the pte, and in
> set_pte_at():
>
>     if (pte_present(pte)) {
>             if (pte_sw_dirty(pte) && pte_write(pte))
>                     pte_val(pte) &= ~PTE_RDONLY;
>             else
>                     pte_val(pte) |= PTE_RDONLY;
>             if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
>                     __sync_icache_dcache(pte, addr);
>     }
>
> it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE is set
> in our scenario. Otherwise, who would ever clear the PTE_DIRTY bit?

Correct for your scenario, but not if point 2 is a read.

> So I am very confused by the scenario in the commit log of "arm64:
> Implement ptep_set_access_flags() for hardware AF/DBM". Could you
> point out what I am missing?

If point 2 is a read fault, it goes via do_read_fault() and the pte ends
up clean with PTE_WRITE|PTE_RDONLY set, since it's not pte_sw_dirty()
(checked by set_pte_at()).

Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
read-only pte with DBM set, because it took a read fault (same as Thread
1).

The problem appears if a Thread 3 on CPU2 performs a write access in
parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.

Now you need to find a vm_operations_struct that allows shared, writable
and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).

--
Catalin
* Re: Re: Re: about the ptep_set_access_flags() for hardware AF/DBM
From: FF @ 2019-10-29 14:04 UTC
To: Catalin Marinas
Cc: mark.rutland, steve.capper, runninglinuxkernel, will.deacon, julien.grall, linux-arm-kernel

At 2019-10-29 20:11:54, "Catalin Marinas" <catalin.marinas@arm.com> wrote:
>Hi Ben,
>
>On Tue, Oct 29, 2019 at 08:54:38AM +0800, FF wrote:
>> >On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> >> Here is the scenario:
>> >> A more complex situation is possible when all CPUs support hardware
>> >> AF/DBM:
>> >>
>> >> a) Initial state: shareable + writable vma and pte_none(pte)
>> >> b) Read fault taken by two threads of the same process on different
>> >>    CPUs
>> >> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>> >>    eventually reaches do_set_pte() which sets a writable + clean pte.
>> >>    CPU0 releases the mmap_sem
>> >> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>> >>    pte entry it reads is present, writable and clean and it continues
>> >>    to pte_mkyoung()
>> >> e) CPU1 calls ptep_set_access_flags()
>> >>
>> >> If between (d) and (e) the hardware (another CPU) updates the dirty
>> >> state (clears PTE_RDONLY), CPU1 will override the PTE_RDONLY bit
>> >> marking the entry clean again.
>[...]
>> i want to elaborate on the scenario. I see that the first patch to fix
>> ptep_set_access_flags() for hardware AF/DBM landed in Linux 4.7-rc1
>> (commit 66dbd6e6, "arm64: Implement ptep_set_access_flags() for
>> hardware AF/DBM").
>
>What are you trying to solve? Making ptep_set_access_flags() atomic is
>not any worse. Do you think we wouldn't need this patch?
>
>> i think the issue exists in Linux 4.6; let's assume we are looking at
>> the Linux 4.6 source code.
>>
>> 1.
>> Initial phase: we want to create a shareable + writable file mapping
>> with the mmap() API; the filesystem is ext4.
>
>See more below but I think we may need shm instead of a file mapping to
>trigger the race (which, BTW, is rather theoretical; I haven't seen it
>in practice).
>
>> in do_mmap(), the vm_flags should be set to VM_READ | VM_WRITE | VM_SHARED.
>> In mmap_region()->vma_set_page_prot(), some shared mappings will want
>> the pages marked read-only to track write events, so the page
>> protection is computed with VM_SHARED cleared; the pte attributes from
>> protection_map[] are then __P011.
>>
>> In Linux 4.6, __P011 is PAGE_COPY:
>> #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
>>
>> For PAGE_COPY, PTE_RDONLY and PTE_WRITE (DBM) are zero.
>
>While you are right that PAGE_COPY has PTE_RDONLY and PTE_WRITE zero,
>set_pte_at() in 4.6 sets PTE_RDONLY if !PTE_WRITE. So the resulting
>mapping in the page table is read-only.
>
>Anyway, I think with vma_wants_writenotify() we can't trigger this race
>since it's a purely read-only fault (requires kernel notification). What
>we need is to end up with a writable + clean entry, which means VM_SHARED
>set, leading to PAGE_SHARED attributes which have PTE_WRITE/DBM set. Note
>that set_pte_at() in 4.6 would mark the page as PTE_RDONLY since
>pte_sw_dirty() is false.
>
>> 2. Thread 1 on CPU0 wants to write this page, so a page fault is
>> triggered. In handle_pte_fault->do_fault->do_shared_fault(), a new
>> page cache page is allocated, and do_set_pte() calls
>> "maybe_mkwrite(pte_mkdirty(entry), vma)" to build the pte entry, so
>> the pte attributes should be PTE_DIRTY | PTE_WRITE.
>
>Yes, but the scenario I had in mind was a read fault here rather than a
>write, which would set PAGE_SHARED attributes, ending up with
>PTE_WRITE|PTE_RDONLY (PTE_WRITE is the PTE_DBM bit).
>
>> 3.
>> Thread 2 on CPU1 also wants to read this page. The pte had not been
>> created yet when it faulted, so a page fault happens. By the time it
>> looks up the entry via pte_offset_map(), it finds the pte already
>> created by Thread 1, so it directly calls:
>>
>>     entry = pte_mkyoung(entry);
>>     ptep_set_access_flags()
>>
>> ptep_set_access_flags() calls set_pte_at() to install the pte, and in
>> set_pte_at():
>>
>>     if (pte_present(pte)) {
>>             if (pte_sw_dirty(pte) && pte_write(pte))
>>                     pte_val(pte) &= ~PTE_RDONLY;
>>             else
>>                     pte_val(pte) |= PTE_RDONLY;
>>             if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
>>                     __sync_icache_dcache(pte, addr);
>>     }
>>
>> it will clear the PTE_RDONLY bit, because PTE_DIRTY | PTE_WRITE is set
>> in our scenario. Otherwise, who would ever clear the PTE_DIRTY bit?
>
>Correct for your scenario, but not if point 2 is a read.
>
>> So I am very confused by the scenario in the commit log of "arm64:
>> Implement ptep_set_access_flags() for hardware AF/DBM". Could you
>> point out what I am missing?
>
>If point 2 is a read fault, it goes via do_read_fault() and the pte ends
>up clean with PTE_WRITE|PTE_RDONLY set, since it's not pte_sw_dirty()
>(checked by set_pte_at()).
>
>Thread 2 on CPU1 would end up calling ptep_set_access_flags() on a
>read-only pte with DBM set, because it took a read fault (same as Thread
>1).
>
>The problem appears if a Thread 3 on CPU2 performs a write access in
>parallel with point 3 above. CPU2 sees the pte as valid, RDONLY and DBM
>set, and proceeds to clearing the RDONLY bit in hardware. CPU1 then
>overrides the PTE_RDONLY bit if ptep_set_access_flags() is not atomic.
>
>Now you need to find a vm_operations_struct that allows shared, writable
>and clean mappings and does not set .page_mkwrite (shm_vm_ops is one).
>
>--
>Catalin

hi Catalin:

Cool! Thanks for pointing that out. You are right, we should use shmem to reproduce this issue. Let me elaborate the corrected scenario on Linux 4.6:

1.
   Initial phase: we want to create a shareable + writable anonymous mapping (MAP_SHARED | MAP_ANONYMOUS, backed by shmem) with the mmap() API. In do_mmap(), the vm_flags should be set to VM_READ | VM_WRITE | VM_SHARED, so vm_get_page_prot() takes the pte attributes from protection_map[] as __S011. In Linux 4.6, __S011 is PAGE_SHARED:

       #define __S011 PAGE_SHARED
       #define PAGE_SHARED __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)

   For PAGE_SHARED, PTE_RDONLY is zero but PTE_WRITE (DBM) is 1, and vm_flags is VM_READ | VM_WRITE | VM_SHARED.

2. Thread 1 on CPU0 wants to read this page, so a page fault is triggered. In handle_pte_fault->do_fault->do_read_fault(), a new page cache page is allocated, and do_set_pte() installs the entry via set_pte_at(). In Linux 4.6, set_pte_at() does:

       if (pte_sw_dirty(pte) && pte_write(pte))
               pte_val(pte) &= ~PTE_RDONLY;
       else
               pte_val(pte) |= PTE_RDONLY;

   In this scenario PTE_DIRTY is not set, so it sets PTE_RDONLY; the resulting pte attributes are PTE_WRITE | PTE_RDONLY (writable + clean).

3. Thread 2 on CPU1 also wants to read this page. The pte had not been created yet when it faulted, so a page fault happens; by the time it looks up the entry via pte_offset_map(), it finds the pte already created by Thread 1, so it directly calls:

       entry = pte_mkyoung(entry);
       ptep_set_access_flags()

   If, in the interval between pte_mkyoung() and ptep_set_access_flags(), another thread T3 writes to this page, hardware DBM will clear PTE_RDONLY automatically. When T2 then executes the non-atomic ptep_set_access_flags(), it sets PTE_RDONLY back, and we lose the dirty state.

Thanks, Catalin, amazing!

Best
Ben shushu