* [PATCH V2] mm: Recheck page table entry with page table lock held
@ 2018-09-26 3:18 Aneesh Kumar K.V
2018-09-26 12:40 ` Kirill A. Shutemov
[not found] ` <CAF7GXvqmf_pqrYCoG+9Kna184Yi0JNvGpwN7JcGvnBL3SFrcnQ@mail.gmail.com>
0 siblings, 2 replies; 3+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-26 3:18 UTC
To: akpm, Kirill A . Shutemov; +Cc: linux-mm, linux-kernel, Aneesh Kumar K.V
We clear the pte temporarily during a read/modify/write update of the pte. If
we take a page fault while the pte is cleared, the application can get SIGBUS.
One such case is a remap_pfn_range() mapping without a backing vm_ops->fault
callback: do_fault() returns SIGBUS in that case.
    cpu 0                                   cpu1
    mprotect()
    ptep_modify_prot_start()/pte cleared.
    .
    .                                       page fault.
    .
    .
    ptep_modify_prot_commit()
Fix this by taking the page table lock and rechecking for pte_none().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
V1:
* update commit message.
mm/memory.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..c2f933184303 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
-	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
-	if (!vma->vm_ops->fault)
-		ret = VM_FAULT_SIGBUS;
-	else if (!(vmf->flags & FAULT_FLAG_WRITE))
+	/*
+	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
+	 */
+	if (!vma->vm_ops->fault) {
+
+		/*
+		 * pmd entries won't be marked none during a R/M/W cycle.
+		 */
+		if (unlikely(pmd_none(*vmf->pmd)))
+			ret = VM_FAULT_SIGBUS;
+		else {
+			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+			/*
+			 * Make sure this is not a temporary clearing of pte
+			 * by holding ptl and checking again. A R/M/W update
+			 * of pte involves: take ptl, clearing the pte so that
+			 * we don't have concurrent modification by hardware
+			 * followed by an update.
+			 */
+			spin_lock(vmf->ptl);
+			if (unlikely(pte_none(*vmf->pte)))
+				ret = VM_FAULT_SIGBUS;
+			else
+				ret = VM_FAULT_NOPAGE;
+			spin_unlock(vmf->ptl);
+		}
+	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
 		ret = do_read_fault(vmf);
 	else if (!(vma->vm_flags & VM_SHARED))
 		ret = do_cow_fault(vmf);
--
2.17.1
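A user-space model may help make the race and the fix concrete. The sketch below is an illustration only, not kernel code: a 64-bit word stands in for the pte, a pthread mutex stands in for the page table lock, and every name in it (modify_prot(), fault_no_fault_cb(), race_sigbus_count(), PTE_PRESENT, PROT_MASK) is hypothetical. modify_prot() clears and rewrites the entry with the lock held, as the mprotect() R/M/W cycle does; the fault side reads locklessly and, on seeing an empty entry, rechecks with the lock held, as the patch does.

```c
/*
 * Hypothetical user-space model of the race in the patch: a word stands in
 * for the pte and a mutex for the page table lock (ptl). Not kernel code.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

enum fault_ret { FAULT_PRESENT, FAULT_NOPAGE, FAULT_SIGBUS };

#define PTE_PRESENT 0x1ULL   /* hypothetical "present" flag */
#define PROT_MASK   0xfULL   /* hypothetical protection bits */

static _Atomic uint64_t pte = 0x1000 | PTE_PRESENT;   /* hypothetical pfn | flags */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;

/* mprotect() side: read/modify/write of the entry with ptl held. */
static void modify_prot(uint64_t prot)
{
	pthread_mutex_lock(&ptl);
	uint64_t old = atomic_exchange(&pte, 0);   /* like ptep_modify_prot_start() */
	uint64_t new_entry = (old & ~PROT_MASK) | (prot & PROT_MASK) | PTE_PRESENT;
	assert(new_entry != 0);                    /* commit never leaves it none */
	atomic_store(&pte, new_entry);             /* like ptep_modify_prot_commit() */
	pthread_mutex_unlock(&ptl);
}

/* Fault side for a VMA without ->fault, with the locked recheck. */
static enum fault_ret fault_no_fault_cb(void)
{
	if (atomic_load(&pte) != 0)
		return FAULT_PRESENT;      /* lockless read found an entry */

	pthread_mutex_lock(&ptl);      /* waits out any in-flight R/M/W */
	uint64_t entry = atomic_load(&pte);
	pthread_mutex_unlock(&ptl);
	return entry == 0 ? FAULT_SIGBUS : FAULT_NOPAGE;
}

static void *mprotect_thread(void *arg)
{
	long iters = *(const long *)arg;
	for (long i = 0; i < iters; i++)
		modify_prot((uint64_t)i);
	return NULL;
}

/* Races `iters` mprotect cycles against `iters` faults; returns the SIGBUS count. */
static long race_sigbus_count(long iters)
{
	pthread_t writer;
	long sigbus = 0;

	if (pthread_create(&writer, NULL, mprotect_thread, &iters) != 0)
		return -1;
	for (long i = 0; i < iters; i++)
		if (fault_no_fault_cb() == FAULT_SIGBUS)
			sigbus++;
	pthread_join(writer, NULL);
	return sigbus;
}
```

Called from a main() built with -pthread, race_sigbus_count(100000) should return 0: the empty entry is only ever visible while the writer holds the lock, so the locked recheck always sees the committed value and the fault resolves as the NOPAGE case rather than SIGBUS. Dropping the locked recheck and returning SIGBUS on the lockless read alone would reintroduce the bug, though how often it fires depends on scheduling.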
* Re: [PATCH V2] mm: Recheck page table entry with page table lock held
From: Kirill A. Shutemov @ 2018-09-26 12:40 UTC
To: Aneesh Kumar K.V; +Cc: akpm, Kirill A . Shutemov, linux-mm, linux-kernel
On Wed, Sep 26, 2018 at 08:48:58AM +0530, Aneesh Kumar K.V wrote:
> We clear the pte temporarily during read/modify/write update of the pte. If we
> take a page fault while the pte is cleared, the application can get SIGBUS. One
> such case is with remap_pfn_range without a backing vm_ops->fault callback.
> do_fault will return SIGBUS in that case.
>
>     cpu 0                                   cpu1
>     mprotect()
>     ptep_modify_prot_start()/pte cleared.
>     .
>     .                                       page fault.
>     .
>     .
>     ptep_modify_prot_commit()
>
> Fix this by taking page table lock and rechecking for pte_none.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> V1:
> * update commit message.
You chose to stick with VM_FAULT_NOPAGE; that's fine.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Should it be in stable?
--
Kirill A. Shutemov
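On the VM_FAULT_NOPAGE choice: roughly, that return code tells the fault path that no page needs to be installed, so the fault simply returns and the faulting access is retried, which is the desired outcome when the locked recheck finds a valid pte. The single-threaded sketch below models that retry under stated assumptions: the struct, enum and functions are hypothetical, and "taking the lock" is modelled by completing the in-flight R/M/W, since in the kernel acquiring ptl waits for the mprotect() side to finish.

```c
/*
 * Hypothetical single-threaded model of fault-and-retry. An access that finds
 * the entry none calls the fault handler; a NOPAGE-style result sends it back
 * to retry the access, a SIGBUS-style result aborts it.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum fault_ret { RET_NOPAGE, RET_SIGBUS };

struct model {
	uint64_t pte;       /* current entry; 0 means "none" */
	uint64_t pending;   /* value an in-flight R/M/W will commit, 0 if none */
	bool recheck;       /* true: patched do_fault(); false: original */
};

/* Taking ptl waits for an in-flight R/M/W; model that by committing it. */
static void take_ptl(struct model *m)
{
	if (m->pending != 0) {
		m->pte = m->pending;
		m->pending = 0;
	}
}

/* do_fault() for a VMA with no ->fault callback. */
static enum fault_ret do_fault_model(struct model *m)
{
	if (!m->recheck)
		return RET_SIGBUS;      /* original code: SIGBUS unconditionally */
	take_ptl(m);
	return m->pte == 0 ? RET_SIGBUS : RET_NOPAGE;
}

/* Returns the entry the access finally used, or 0 if it got SIGBUS. */
static uint64_t user_access(struct model *m, int *faults)
{
	/* An in-flight R/M/W implies the entry is currently cleared. */
	assert(m->pending == 0 || m->pte == 0);
	*faults = 0;
	for (;;) {
		if (m->pte != 0)
			return m->pte;      /* entry present: access succeeds */
		++*faults;
		if (do_fault_model(m) == RET_SIGBUS)
			return 0;
		/* RET_NOPAGE: back to "user space"; retry the access. */
	}
}
```

With pte = 0, pending = 0x1005 and recheck = true, user_access() returns 0x1005 after one fault; with recheck = false the same interleaving returns 0, which is the SIGBUS the patch addresses. A genuinely empty entry (pending = 0) still gets SIGBUS in both versions.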
* Re: [PATCH V2] mm: Recheck page table entry with page table lock held
From: Kirill A. Shutemov @ 2019-10-28 12:08 UTC
To: Figo.zhang
Cc: Aneesh Kumar K.V, Andrew Morton, Kirill A . Shutemov, Linux MM, LKML
On Fri, Oct 25, 2019 at 11:13:58AM +0800, Figo.zhang wrote:
> On Wed, Sep 26, 2018 at 11:19 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
>
> > We clear the pte temporarily during read/modify/write update of the pte.
> > If we take a page fault while the pte is cleared, the application can get
> > SIGBUS. One such case is with remap_pfn_range without a backing
> > vm_ops->fault callback. do_fault will return SIGBUS in that case.
> >
> What is "remap_pfn_range without a backing vm_ops->fault callback"? Could
> you elaborate on the scenario?
> Is it the case of using remap_pfn_range() in a driver's mmap() file
> operation? If so, why would it trap into do_fault()?
Because there's no page mapped there during the race.
> >
> >     cpu 0                                   cpu1
> >     mprotect()
> >     ptep_modify_prot_start()/pte cleared.
> >     .
> >     .                                       page fault.
> >     .
> >     .
> >     ptep_modify_prot_commit()
>
>
> I am confused by this scenario. When CPU0 calls
> change_pte_range()->ptep_modify_prot_start() to clear the pte contents,
> the other thread, in handle_pte_fault(), can still get the pte via
> pte_offset_map(), and the pte is not invalid: the pte itself is valid,
> just its contents are all zero. So why would it call into do_fault()?
>
> In handle_pte_fault():
>         vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
>         if (!vmf->pte) {
>                 return do_fault(vmf);
>         }
This case handles the situation when the pte is none (clear) or the page
table is not allocated at all.
--
Kirill A. Shutemov
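To make that answer concrete: in kernels of that era (around v4.19), handle_pte_fault() roughly reads the entry into vmf->orig_pte and resets vmf->pte to NULL when the pmd is none (no page table) or when pte_none() holds, so a transiently cleared pte takes the same !vmf->pte branch as a never-populated one and, for a non-anonymous VMA, lands in do_fault(). The sketch below models only that dispatch; the struct, enum and function names are hypothetical stand-ins, not kernel API.

```c
/*
 * Hypothetical model of the dispatch at the top of handle_pte_fault():
 * vmf->pte ends up NULL both when there is no page table and when the
 * entry is none, and either way a non-anonymous VMA goes to do_fault().
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fault_path { PATH_DO_FAULT, PATH_DO_ANONYMOUS, PATH_EXISTING_PTE };

struct fault_model {
	uint64_t *page_table;   /* NULL models a none pmd: no table allocated */
	size_t nentries;        /* number of entries in page_table */
	size_t index;           /* entry the faulting address maps to */
	int anonymous;          /* nonzero for an anonymous VMA */
	uint64_t *pte;          /* mirrors vmf->pte */
};

static enum fault_path handle_pte_fault_model(struct fault_model *vmf)
{
	if (vmf->page_table == NULL) {
		vmf->pte = NULL;                /* page table not allocated */
	} else {
		assert(vmf->index < vmf->nentries);
		vmf->pte = &vmf->page_table[vmf->index];
		if (*vmf->pte == 0)             /* pte_none(): entry is clear */
			vmf->pte = NULL;
	}

	if (vmf->pte == NULL)
		return vmf->anonymous ? PATH_DO_ANONYMOUS : PATH_DO_FAULT;
	return PATH_EXISTING_PTE;               /* handled further down */
}
```

An entry that is 0 because ptep_modify_prot_start() has just cleared it is indistinguishable here from one that was never populated, which is why a remap_pfn_range() mapping without ->fault can reach do_fault() during the race.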