linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Recheck page table entry with page table lock held
@ 2018-09-20  9:24 Aneesh Kumar K.V
  2018-09-20 11:05 ` Kirill A. Shutemov
  0 siblings, 1 reply; 4+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-20  9:24 UTC (permalink / raw)
  To: akpm, Kirill A . Shutemov; +Cc: linux-mm, linux-kernel, Aneesh Kumar K.V

We clear the pte temporarily during read/modify/write update of the pte. If we
take a page fault while the pte is cleared, the application can get SIGBUS. One
such case is with remap_pfn_range without a backing vm_ops->fault callback.
do_fault will return SIGBUS in that case.

Fix this by taking page table lock and rechecking for pte_none.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c467102a5cbc..c2f933184303 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret;
 
-	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
-	if (!vma->vm_ops->fault)
-		ret = VM_FAULT_SIGBUS;
-	else if (!(vmf->flags & FAULT_FLAG_WRITE))
+	/*
+	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
+	 */
+	if (!vma->vm_ops->fault) {
+
+		/*
+		 * pmd entries won't be marked none during a R/M/W cycle.
+		 */
+		if (unlikely(pmd_none(*vmf->pmd)))
+			ret = VM_FAULT_SIGBUS;
+		else {
+			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+			/*
+			 * Make sure this is not a temporary clearing of pte
+			 * by holding ptl and checking again. A R/M/W update
+			 * of pte involves: take ptl, clearing the pte so that
+			 * we don't have concurrent modification by hardware
+			 * followed by an update.
+			 */
+			spin_lock(vmf->ptl);
+			if (unlikely(pte_none(*vmf->pte)))
+				ret = VM_FAULT_SIGBUS;
+			else
+				ret = VM_FAULT_NOPAGE;
+			spin_unlock(vmf->ptl);
+		}
+	} else if (!(vmf->flags & FAULT_FLAG_WRITE))
 		ret = do_read_fault(vmf);
 	else if (!(vma->vm_flags & VM_SHARED))
 		ret = do_cow_fault(vmf);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Recheck page table entry with page table lock held
  2018-09-20  9:24 [PATCH] mm: Recheck page table entry with page table lock held Aneesh Kumar K.V
@ 2018-09-20 11:05 ` Kirill A. Shutemov
  2018-09-20 11:11   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 4+ messages in thread
From: Kirill A. Shutemov @ 2018-09-20 11:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: akpm, Kirill A . Shutemov, linux-mm, linux-kernel

On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
> We clear the pte temporarily during read/modify/write update of the pte. If we
> take a page fault while the pte is cleared, the application can get SIGBUS. One
> such case is with remap_pfn_range without a backing vm_ops->fault callback.
> do_fault will return SIGBUS in that case.

It would be nice to show the path that clears pte temporarily.

> Fix this by taking page table lock and rechecking for pte_none.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  mm/memory.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index c467102a5cbc..c2f933184303 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>  	struct vm_area_struct *vma = vmf->vma;
>  	vm_fault_t ret;
>  
> -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
> -	if (!vma->vm_ops->fault)
> -		ret = VM_FAULT_SIGBUS;
> -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
> +	/*
> +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
> +	 */
> +	if (!vma->vm_ops->fault) {
> +
> +		/*
> +		 * pmd entries won't be marked none during a R/M/W cycle.
> +		 */
> +		if (unlikely(pmd_none(*vmf->pmd)))
> +			ret = VM_FAULT_SIGBUS;
> +		else {
> +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> +			/*
> +			 * Make sure this is not a temporary clearing of pte
> +			 * by holding ptl and checking again. A R/M/W update
> +			 * of pte involves: take ptl, clearing the pte so that
> +			 * we don't have concurrent modification by hardware
> +			 * followed by an update.
> +			 */
> +			spin_lock(vmf->ptl);
> +			if (unlikely(pte_none(*vmf->pte)))
> +				ret = VM_FAULT_SIGBUS;
> +			else
> +				ret = VM_FAULT_NOPAGE;

We return 0 if we did nothing in fault path.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Recheck page table entry with page table lock held
  2018-09-20 11:05 ` Kirill A. Shutemov
@ 2018-09-20 11:11   ` Aneesh Kumar K.V
  2018-09-20 11:25     ` Kirill A. Shutemov
  0 siblings, 1 reply; 4+ messages in thread
From: Aneesh Kumar K.V @ 2018-09-20 11:11 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: akpm, Kirill A . Shutemov, linux-mm, linux-kernel

On 9/20/18 4:35 PM, Kirill A. Shutemov wrote:
> On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
>> We clear the pte temporarily during read/modify/write update of the pte. If we
>> take a page fault while the pte is cleared, the application can get SIGBUS. One
>> such case is with remap_pfn_range without a backing vm_ops->fault callback.
>> do_fault will return SIGBUS in that case.
> 
> It would be nice to show the path that clears pte temporarily.
> 
>> Fix this by taking page table lock and rechecking for pte_none.


we do that in the ptep_modify_prot_start/ptep_modify_prot_commit. Also 
in hugetlb_change_protection. The hugetlb case many not be relevant 
because that cannot be backed by a vma without vma->vm_ops.

What will hit this will be mprotect of a remap_pfn_range address?

>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   mm/memory.c | 31 +++++++++++++++++++++++++++----
>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index c467102a5cbc..c2f933184303 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>>   	struct vm_area_struct *vma = vmf->vma;
>>   	vm_fault_t ret;
>>   
>> -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
>> -	if (!vma->vm_ops->fault)
>> -		ret = VM_FAULT_SIGBUS;
>> -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
>> +	/*
>> +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
>> +	 */
>> +	if (!vma->vm_ops->fault) {
>> +
>> +		/*
>> +		 * pmd entries won't be marked none during a R/M/W cycle.
>> +		 */
>> +		if (unlikely(pmd_none(*vmf->pmd)))
>> +			ret = VM_FAULT_SIGBUS;
>> +		else {
>> +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +			/*
>> +			 * Make sure this is not a temporary clearing of pte
>> +			 * by holding ptl and checking again. A R/M/W update
>> +			 * of pte involves: take ptl, clearing the pte so that
>> +			 * we don't have concurrent modification by hardware
>> +			 * followed by an update.
>> +			 */
>> +			spin_lock(vmf->ptl);
>> +			if (unlikely(pte_none(*vmf->pte)))
>> +				ret = VM_FAULT_SIGBUS;
>> +			else
>> +				ret = VM_FAULT_NOPAGE;
> 
> We return 0 if we did nothing in fault path.
> 

I didn't get that. If we find the pte not none, we return so that we 
retry the access. Are you suggesting VM_FAULT_NOPAGE is not the right 
return for that?

-aneesh


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Recheck page table entry with page table lock held
  2018-09-20 11:11   ` Aneesh Kumar K.V
@ 2018-09-20 11:25     ` Kirill A. Shutemov
  0 siblings, 0 replies; 4+ messages in thread
From: Kirill A. Shutemov @ 2018-09-20 11:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: akpm, Kirill A . Shutemov, linux-mm, linux-kernel

On Thu, Sep 20, 2018 at 04:41:59PM +0530, Aneesh Kumar K.V wrote:
> On 9/20/18 4:35 PM, Kirill A. Shutemov wrote:
> > On Thu, Sep 20, 2018 at 02:54:08PM +0530, Aneesh Kumar K.V wrote:
> > > We clear the pte temporarily during read/modify/write update of the pte. If we
> > > take a page fault while the pte is cleared, the application can get SIGBUS. One
> > > such case is with remap_pfn_range without a backing vm_ops->fault callback.
> > > do_fault will return SIGBUS in that case.
> > 
> > It would be nice to show the path that clears pte temporarily.
> > 
> > > Fix this by taking page table lock and rechecking for pte_none.
> 
> 
> we do that in the ptep_modify_prot_start/ptep_modify_prot_commit. Also in
> hugetlb_change_protection. The hugetlb case many not be relevant because
> that cannot be backed by a vma without vma->vm_ops.
> 
> What will hit this will be mprotect of a remap_pfn_range address?

Sounds right. Please update commit message.
> 
> > > 
> > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > ---
> > >   mm/memory.c | 31 +++++++++++++++++++++++++++----
> > >   1 file changed, 27 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index c467102a5cbc..c2f933184303 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -3745,10 +3745,33 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
> > >   	struct vm_area_struct *vma = vmf->vma;
> > >   	vm_fault_t ret;
> > > -	/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */
> > > -	if (!vma->vm_ops->fault)
> > > -		ret = VM_FAULT_SIGBUS;
> > > -	else if (!(vmf->flags & FAULT_FLAG_WRITE))
> > > +	/*
> > > +	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
> > > +	 */
> > > +	if (!vma->vm_ops->fault) {
> > > +
> > > +		/*
> > > +		 * pmd entries won't be marked none during a R/M/W cycle.
> > > +		 */
> > > +		if (unlikely(pmd_none(*vmf->pmd)))
> > > +			ret = VM_FAULT_SIGBUS;
> > > +		else {
> > > +			vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> > > +			/*
> > > +			 * Make sure this is not a temporary clearing of pte
> > > +			 * by holding ptl and checking again. A R/M/W update
> > > +			 * of pte involves: take ptl, clearing the pte so that
> > > +			 * we don't have concurrent modification by hardware
> > > +			 * followed by an update.
> > > +			 */
> > > +			spin_lock(vmf->ptl);
> > > +			if (unlikely(pte_none(*vmf->pte)))
> > > +				ret = VM_FAULT_SIGBUS;
> > > +			else
> > > +				ret = VM_FAULT_NOPAGE;
> > 
> > We return 0 if we did nothing in fault path.
> > 
> 
> I didn't get that. If we find the pte not none, we return so that we retry
> the access. Are you suggesting VM_FAULT_NOPAGE is not the right return for
> that?

We usually use VM_FAULT_NOPAGE to indicate that ->fault() installed the
pte and we don't need to do anything. We don't touch pte in this page
fault.

It doesn't make difference in this particular case, nobody cares upper by
stack. Just a nitpick.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-09-20 11:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-20  9:24 [PATCH] mm: Recheck page table entry with page table lock held Aneesh Kumar K.V
2018-09-20 11:05 ` Kirill A. Shutemov
2018-09-20 11:11   ` Aneesh Kumar K.V
2018-09-20 11:25     ` Kirill A. Shutemov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).