All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Allow gmap fault to retry
@ 2016-01-04 11:19 ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

Hello,

sorry for the delay since the last version.

During Jasons work with postcopy migration support for s390 a problem regarding
gmap faults was discovered.

The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault. Till now we never cared about retries, but as the userfaultfd
code kind of relies on it. this needs some fix.

This patchset does not take care of the futex code. I will now look closer at
this.

Thanks,
    Dominik

v2 -> v3:
- In case of retrying check vma again
- Do the accounting of major/minor faults once

v1 -> v2:
- Instread of passing the VM_FAULT_RETRY from fixup_user_fault we do retries
  within fixup_user_fault, like get_user_pages_locked do.
- gmap code will now take retry if fixup_user_fault drops the lock.

Dominik Dingel (2):
  mm: bring in additional flag for fixup_user_fault to signal unlock
  s390/mm: enable fixup_user_fault retrying

 arch/s390/mm/pgtable.c | 31 ++++++++++++++++++++++++++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 30 +++++++++++++++++++++++++-----
 4 files changed, 57 insertions(+), 11 deletions(-)

-- 
2.3.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 0/2] Allow gmap fault to retry
@ 2016-01-04 11:19 ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

Hello,

sorry for the delay since the last version.

During Jasons work with postcopy migration support for s390 a problem regarding
gmap faults was discovered.

The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault. Till now we never cared about retries, but as the userfaultfd
code kind of relies on it. this needs some fix.

This patchset does not take care of the futex code. I will now look closer at
this.

Thanks,
    Dominik

v2 -> v3:
- In case of retrying check vma again
- Do the accounting of major/minor faults once

v1 -> v2:
- Instread of passing the VM_FAULT_RETRY from fixup_user_fault we do retries
  within fixup_user_fault, like get_user_pages_locked do.
- gmap code will now take retry if fixup_user_fault drops the lock.

Dominik Dingel (2):
  mm: bring in additional flag for fixup_user_fault to signal unlock
  s390/mm: enable fixup_user_fault retrying

 arch/s390/mm/pgtable.c | 31 ++++++++++++++++++++++++++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 30 +++++++++++++++++++++++++-----
 4 files changed, 57 insertions(+), 11 deletions(-)

-- 
2.3.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
  2016-01-04 11:19 ` Dominik Dingel
@ 2016-01-04 11:19   ` Dominik Dingel
  -1 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to
pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we
ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and so
a caller wasn't able to handle that case.  So we now changed the behaviour
to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c |  8 +++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 30 +++++++++++++++++++++++++-----
 4 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..b15759c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -585,7 +585,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
@@ -730,7 +730,8 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 			break;
 		}
 		/* Get the page mapped */
-		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			rc = -EFAULT;
 			break;
 		}
@@ -805,7 +806,8 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
-		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..7783073 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1163,7 +1163,8 @@ int invalidate_inode_page(struct page *page);
 extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-			    unsigned long address, unsigned int fault_flags);
+			    unsigned long address, unsigned int fault_flags,
+			    bool *unlocked);
 #else
 static inline int handle_mm_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
@@ -1175,7 +1176,7 @@ static inline int handle_mm_fault(struct mm_struct *mm,
 }
 static inline int fixup_user_fault(struct task_struct *tsk,
 		struct mm_struct *mm, unsigned long address,
-		unsigned int fault_flags)
+		unsigned int fault_flags, bool *unlocked)
 {
 	/* should never happen if there's no MMU */
 	BUG();
diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..fb640c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -639,7 +639,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
 
 	down_read(&mm->mmap_sem);
 	ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
-			       FAULT_FLAG_WRITE);
+			       FAULT_FLAG_WRITE, NULL);
 	up_read(&mm->mmap_sem);
 
 	return ret < 0 ? ret : 0;
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..493d543 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,6 +564,8 @@ EXPORT_SYMBOL(__get_user_pages);
  * @mm:		mm_struct of target mm
  * @address:	user address
  * @fault_flags:flags to pass down to handle_mm_fault()
+ * @unlocked:	did we unlock the mmap_sem while retrying, maybe NULL if caller
+ *		does not allow retry
  *
  * This is meant to be called in the specific scenario where for locking reasons
  * we try to access user memory in atomic context (within a pagefault_disable()
@@ -575,22 +577,28 @@ EXPORT_SYMBOL(__get_user_pages);
  * The main difference with get_user_pages() is that this function will
  * unconditionally call handle_mm_fault() which will in turn perform all the
  * necessary SW fixup of the dirty and young bits in the PTE, while
- * handle_mm_fault() only guarantees to update these in the struct page.
+ * get_user_pages() only guarantees to update these in the struct page.
  *
  * This is important for some architectures where those bits also gate the
  * access permission to the page because they are maintained in software.  On
  * such architectures, gup() will not be enough to make a subsequent access
  * succeed.
  *
- * This has the same semantics wrt the @mm->mmap_sem as does filemap_fault().
+ * This function will not return with an unlocked mmap_sem. So it has not the
+ * same semantics wrt the @mm->mmap_sem as does filemap_fault().
  */
 int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-		     unsigned long address, unsigned int fault_flags)
+		     unsigned long address, unsigned int fault_flags,
+		     bool *unlocked)
 {
 	struct vm_area_struct *vma;
 	vm_flags_t vm_flags;
-	int ret;
+	int ret, major = 0;
+
+	if (unlocked)
+		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 
+retry:
 	vma = find_extend_vma(mm, address);
 	if (!vma || address < vma->vm_start)
 		return -EFAULT;
@@ -600,6 +608,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 		return -EFAULT;
 
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
+	major |= ret & VM_FAULT_MAJOR;
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
 			return -ENOMEM;
@@ -609,8 +618,19 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			return -EFAULT;
 		BUG();
 	}
+
+	if (ret & VM_FAULT_RETRY) {
+		down_read(&mm->mmap_sem);
+		if (!(fault_flags & FAULT_FLAG_TRIED)) {
+			*unlocked = true;
+			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			fault_flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+	}
+
 	if (tsk) {
-		if (ret & VM_FAULT_MAJOR)
+		if (major)
 			tsk->maj_flt++;
 		else
 			tsk->min_flt++;
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
@ 2016-01-04 11:19   ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to
pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we
ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and so
a caller wasn't able to handle that case.  So we now changed the behaviour
to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c |  8 +++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 30 +++++++++++++++++++++++++-----
 4 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..b15759c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -585,7 +585,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
@@ -730,7 +730,8 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 			break;
 		}
 		/* Get the page mapped */
-		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			rc = -EFAULT;
 			break;
 		}
@@ -805,7 +806,8 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
-		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..7783073 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1163,7 +1163,8 @@ int invalidate_inode_page(struct page *page);
 extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-			    unsigned long address, unsigned int fault_flags);
+			    unsigned long address, unsigned int fault_flags,
+			    bool *unlocked);
 #else
 static inline int handle_mm_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
@@ -1175,7 +1176,7 @@ static inline int handle_mm_fault(struct mm_struct *mm,
 }
 static inline int fixup_user_fault(struct task_struct *tsk,
 		struct mm_struct *mm, unsigned long address,
-		unsigned int fault_flags)
+		unsigned int fault_flags, bool *unlocked)
 {
 	/* should never happen if there's no MMU */
 	BUG();
diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..fb640c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -639,7 +639,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
 
 	down_read(&mm->mmap_sem);
 	ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
-			       FAULT_FLAG_WRITE);
+			       FAULT_FLAG_WRITE, NULL);
 	up_read(&mm->mmap_sem);
 
 	return ret < 0 ? ret : 0;
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..493d543 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,6 +564,8 @@ EXPORT_SYMBOL(__get_user_pages);
  * @mm:		mm_struct of target mm
  * @address:	user address
  * @fault_flags:flags to pass down to handle_mm_fault()
+ * @unlocked:	did we unlock the mmap_sem while retrying, maybe NULL if caller
+ *		does not allow retry
  *
  * This is meant to be called in the specific scenario where for locking reasons
  * we try to access user memory in atomic context (within a pagefault_disable()
@@ -575,22 +577,28 @@ EXPORT_SYMBOL(__get_user_pages);
  * The main difference with get_user_pages() is that this function will
  * unconditionally call handle_mm_fault() which will in turn perform all the
  * necessary SW fixup of the dirty and young bits in the PTE, while
- * handle_mm_fault() only guarantees to update these in the struct page.
+ * get_user_pages() only guarantees to update these in the struct page.
  *
  * This is important for some architectures where those bits also gate the
  * access permission to the page because they are maintained in software.  On
  * such architectures, gup() will not be enough to make a subsequent access
  * succeed.
  *
- * This has the same semantics wrt the @mm->mmap_sem as does filemap_fault().
+ * This function will not return with an unlocked mmap_sem. So it has not the
+ * same semantics wrt the @mm->mmap_sem as does filemap_fault().
  */
 int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-		     unsigned long address, unsigned int fault_flags)
+		     unsigned long address, unsigned int fault_flags,
+		     bool *unlocked)
 {
 	struct vm_area_struct *vma;
 	vm_flags_t vm_flags;
-	int ret;
+	int ret, major = 0;
+
+	if (unlocked)
+		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
 
+retry:
 	vma = find_extend_vma(mm, address);
 	if (!vma || address < vma->vm_start)
 		return -EFAULT;
@@ -600,6 +608,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 		return -EFAULT;
 
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
+	major |= ret & VM_FAULT_MAJOR;
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
 			return -ENOMEM;
@@ -609,8 +618,19 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			return -EFAULT;
 		BUG();
 	}
+
+	if (ret & VM_FAULT_RETRY) {
+		down_read(&mm->mmap_sem);
+		if (!(fault_flags & FAULT_FLAG_TRIED)) {
+			*unlocked = true;
+			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			fault_flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+	}
+
 	if (tsk) {
-		if (ret & VM_FAULT_MAJOR)
+		if (major)
 			tsk->maj_flt++;
 		else
 			tsk->min_flt++;
-- 
2.3.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] s390/mm: enable fixup_user_fault retrying
  2016-01-04 11:19 ` Dominik Dingel
@ 2016-01-04 11:19   ` Dominik Dingel
  -1 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd.  As during these retries we might drop the mmap_sem we
need to check if that happened and redo the complete chain of actions.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b15759c..3c5456d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -578,17 +578,29 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 {
 	unsigned long vmaddr;
 	int rc;
+	bool unlocked;
 
 	down_read(&gmap->mm->mmap_sem);
+
+retry:
+	unlocked = false;
 	vmaddr = __gmap_translate(gmap, gaddr);
 	if (IS_ERR_VALUE(vmaddr)) {
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
+			     &unlocked)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
+	/*
+	 * In the case that fixup_user_fault unlocked the mmap_sem during
+	 * faultin redo __gmap_translate to not race with a map/unmap_segment.
+	 */
+	if (unlocked)
+		goto retry;
+
 	rc = __gmap_link(gmap, gaddr, vmaddr);
 out_up:
 	up_read(&gmap->mm->mmap_sem);
@@ -717,12 +729,14 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 	spinlock_t *ptl;
 	pte_t *ptep, entry;
 	pgste_t pgste;
+	bool unlocked;
 	int rc = 0;
 
 	if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK))
 		return -EINVAL;
 	down_read(&gmap->mm->mmap_sem);
 	while (len) {
+		unlocked = false;
 		/* Convert gmap address and connect the page tables */
 		addr = __gmap_translate(gmap, gaddr);
 		if (IS_ERR_VALUE(addr)) {
@@ -731,10 +745,13 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 		}
 		/* Get the page mapped */
 		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
-				     NULL)) {
+				     &unlocked)) {
 			rc = -EFAULT;
 			break;
 		}
+		/* While trying to map mmap_sem got unlocked. Let us retry */
+		if (unlocked)
+			continue;
 		rc = __gmap_link(gmap, gaddr, addr);
 		if (rc)
 			break;
@@ -795,9 +812,11 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	spinlock_t *ptl;
 	pgste_t old, new;
 	pte_t *ptep;
+	bool unlocked;
 
 	down_read(&mm->mmap_sem);
 retry:
+	unlocked = false;
 	ptep = get_locked_pte(mm, addr, &ptl);
 	if (unlikely(!ptep)) {
 		up_read(&mm->mmap_sem);
@@ -806,8 +825,12 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
+		/*
+		 * We do not really care about unlocked. We will retry either
+		 * way. But this allows fixup_user_fault to enable userfaultfd.
+		 */
 		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
-				     NULL)) {
+				     &unlocked)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
-- 
2.3.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] s390/mm: enable fixup_user_fault retrying
@ 2016-01-04 11:19   ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2016-01-04 11:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd.  As during these retries we might drop the mmap_sem we
need to check if that happened and redo the complete chain of actions.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b15759c..3c5456d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -578,17 +578,29 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 {
 	unsigned long vmaddr;
 	int rc;
+	bool unlocked;
 
 	down_read(&gmap->mm->mmap_sem);
+
+retry:
+	unlocked = false;
 	vmaddr = __gmap_translate(gmap, gaddr);
 	if (IS_ERR_VALUE(vmaddr)) {
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags,
+			     &unlocked)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
+	/*
+	 * In the case that fixup_user_fault unlocked the mmap_sem during
+	 * faultin redo __gmap_translate to not race with a map/unmap_segment.
+	 */
+	if (unlocked)
+		goto retry;
+
 	rc = __gmap_link(gmap, gaddr, vmaddr);
 out_up:
 	up_read(&gmap->mm->mmap_sem);
@@ -717,12 +729,14 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 	spinlock_t *ptl;
 	pte_t *ptep, entry;
 	pgste_t pgste;
+	bool unlocked;
 	int rc = 0;
 
 	if ((gaddr & ~PAGE_MASK) || (len & ~PAGE_MASK))
 		return -EINVAL;
 	down_read(&gmap->mm->mmap_sem);
 	while (len) {
+		unlocked = false;
 		/* Convert gmap address and connect the page tables */
 		addr = __gmap_translate(gmap, gaddr);
 		if (IS_ERR_VALUE(addr)) {
@@ -731,10 +745,13 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 		}
 		/* Get the page mapped */
 		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
-				     NULL)) {
+				     &unlocked)) {
 			rc = -EFAULT;
 			break;
 		}
+		/* While trying to map mmap_sem got unlocked. Let us retry */
+		if (unlocked)
+			continue;
 		rc = __gmap_link(gmap, gaddr, addr);
 		if (rc)
 			break;
@@ -795,9 +812,11 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 	spinlock_t *ptl;
 	pgste_t old, new;
 	pte_t *ptep;
+	bool unlocked;
 
 	down_read(&mm->mmap_sem);
 retry:
+	unlocked = false;
 	ptep = get_locked_pte(mm, addr, &ptl);
 	if (unlikely(!ptep)) {
 		up_read(&mm->mmap_sem);
@@ -806,8 +825,12 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
+		/*
+		 * We do not really care about unlocked. We will retry either
+		 * way. But this allows fixup_user_fault to enable userfaultfd.
+		 */
 		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
-				     NULL)) {
+				     &unlocked)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
-- 
2.3.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] Allow gmap fault to retry
  2016-01-04 11:19 ` Dominik Dingel
@ 2016-01-04 17:08   ` Andrea Arcangeli
  -1 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2016-01-04 17:08 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: Kirill A. Shutemov, Martin Schwidefsky, Christian Borntraeger,
	Jason J. Herne, linux-s390, linux-mm, Andrew Morton,
	David Rientjes, Eric B Munson, Naoya Horiguchi, Mel Gorman,
	Heiko Carstens, Paolo Bonzini, linux-kernel

On Mon, Jan 04, 2016 at 12:19:53PM +0100, Dominik Dingel wrote:
> Hello,
> 
> sorry for the delay since the last version.
> 
> During Jasons work with postcopy migration support for s390 a problem regarding
> gmap faults was discovered.
> 
> The gmap code will call fixup_user_fault which will end up always in
> handle_mm_fault. Till now we never cared about retries, but as the userfaultfd
> code kind of relies on it. this needs some fix.
> 
> This patchset does not take care of the futex code. I will now look closer at
> this.
> 
> Thanks,
>     Dominik
> 
> v2 -> v3:
> - In case of retrying check vma again
> - Do the accounting of major/minor faults once

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>


> 
> v1 -> v2:
> - Instread of passing the VM_FAULT_RETRY from fixup_user_fault we do retries
>   within fixup_user_fault, like get_user_pages_locked do.
> - gmap code will now take retry if fixup_user_fault drops the lock.
> 
> Dominik Dingel (2):
>   mm: bring in additional flag for fixup_user_fault to signal unlock
>   s390/mm: enable fixup_user_fault retrying
> 
>  arch/s390/mm/pgtable.c | 31 ++++++++++++++++++++++++++++---
>  include/linux/mm.h     |  5 +++--
>  kernel/futex.c         |  2 +-
>  mm/gup.c               | 30 +++++++++++++++++++++++++-----
>  4 files changed, 57 insertions(+), 11 deletions(-)
> 
> -- 
> 2.3.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/2] Allow gmap fault to retry
@ 2016-01-04 17:08   ` Andrea Arcangeli
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2016-01-04 17:08 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: Kirill A. Shutemov, Martin Schwidefsky, Christian Borntraeger,
	Jason J. Herne, linux-s390, linux-mm, Andrew Morton,
	David Rientjes, Eric B Munson, Naoya Horiguchi, Mel Gorman,
	Heiko Carstens, Paolo Bonzini, linux-kernel

On Mon, Jan 04, 2016 at 12:19:53PM +0100, Dominik Dingel wrote:
> Hello,
> 
> sorry for the delay since the last version.
> 
> During Jasons work with postcopy migration support for s390 a problem regarding
> gmap faults was discovered.
> 
> The gmap code will call fixup_user_fault which will end up always in
> handle_mm_fault. Till now we never cared about retries, but as the userfaultfd
> code kind of relies on it. this needs some fix.
> 
> This patchset does not take care of the futex code. I will now look closer at
> this.
> 
> Thanks,
>     Dominik
> 
> v2 -> v3:
> - In case of retrying check vma again
> - Do the accounting of major/minor faults once

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>


> 
> v1 -> v2:
> - Instread of passing the VM_FAULT_RETRY from fixup_user_fault we do retries
>   within fixup_user_fault, like get_user_pages_locked do.
> - gmap code will now take retry if fixup_user_fault drops the lock.
> 
> Dominik Dingel (2):
>   mm: bring in additional flag for fixup_user_fault to signal unlock
>   s390/mm: enable fixup_user_fault retrying
> 
>  arch/s390/mm/pgtable.c | 31 ++++++++++++++++++++++++++++---
>  include/linux/mm.h     |  5 +++--
>  kernel/futex.c         |  2 +-
>  mm/gup.c               | 30 +++++++++++++++++++++++++-----
>  4 files changed, 57 insertions(+), 11 deletions(-)
> 
> -- 
> 2.3.0
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
  2015-11-26 17:27   ` Dominik Dingel
@ 2015-12-04 21:49     ` Andrea Arcangeli
  -1 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2015-12-04 21:49 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: Kirill A. Shutemov, Martin Schwidefsky, Christian Borntraeger,
	Jason J. Herne, linux-s390, linux-mm, Andrew Morton,
	David Rientjes, Eric B Munson, Naoya Horiguchi, Mel Gorman,
	Heiko Carstens, Paolo Bonzini, linux-kernel

On Thu, Nov 26, 2015 at 06:27:01PM +0100, Dominik Dingel wrote:
> @@ -599,6 +603,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  	if (!(vm_flags & vma->vm_flags))
>  		return -EFAULT;
>  
> +	if (unlocked)
> +		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
> +
> +retry:

This should move up before find_extend_vma, otherwise the vma used
below could be a dangling pointer after the "goto retry".

>  	ret = handle_mm_fault(mm, vma, address, fault_flags);
>  	if (ret & VM_FAULT_ERROR) {
>  		if (ret & VM_FAULT_OOM)
> @@ -609,12 +617,21 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  			return -EFAULT;
>  		BUG();
>  	}
> -	if (tsk) {
> +	if (tsk && !(fault_flags & FAULT_FLAG_TRIED)) {
>  		if (ret & VM_FAULT_MAJOR)
>  			tsk->maj_flt++;
>  		else
>  			tsk->min_flt++;
>  	}

It'd look cleaner if we'd move the tsk update after the retry check in
case the FAULT_FLAG_TRIED second attempt actually fails, to avoid
recording a fault for a non-really-faulting VM_FAULT_RETRY
attempt. This is what the real page fault does at least so it sounds
cleaner do the same here, but then in practice it makes very little
difference.

> +	if (ret & VM_FAULT_RETRY) {
> +		down_read(&mm->mmap_sem);
> +		if (!(fault_flags & FAULT_FLAG_TRIED)) {
> +			*unlocked = true;
> +			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
> +			fault_flags |= FAULT_FLAG_TRIED;
> +			goto retry;
> +		}
> +	}
>  	return 0;
>  }

Rest looks great.

The futex.c should be patched to pass the unlocked pointer in a later
patch but we can also postpone it to a different patchset.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
@ 2015-12-04 21:49     ` Andrea Arcangeli
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2015-12-04 21:49 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: Kirill A. Shutemov, Martin Schwidefsky, Christian Borntraeger,
	Jason J. Herne, linux-s390, linux-mm, Andrew Morton,
	David Rientjes, Eric B Munson, Naoya Horiguchi, Mel Gorman,
	Heiko Carstens, Paolo Bonzini, linux-kernel

On Thu, Nov 26, 2015 at 06:27:01PM +0100, Dominik Dingel wrote:
> @@ -599,6 +603,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  	if (!(vm_flags & vma->vm_flags))
>  		return -EFAULT;
>  
> +	if (unlocked)
> +		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
> +
> +retry:

This should move up before find_extend_vma, otherwise the vma used
below could be a dangling pointer after the "goto retry".

>  	ret = handle_mm_fault(mm, vma, address, fault_flags);
>  	if (ret & VM_FAULT_ERROR) {
>  		if (ret & VM_FAULT_OOM)
> @@ -609,12 +617,21 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  			return -EFAULT;
>  		BUG();
>  	}
> -	if (tsk) {
> +	if (tsk && !(fault_flags & FAULT_FLAG_TRIED)) {
>  		if (ret & VM_FAULT_MAJOR)
>  			tsk->maj_flt++;
>  		else
>  			tsk->min_flt++;
>  	}

It'd look cleaner if we'd move the tsk update after the retry check in
case the FAULT_FLAG_TRIED second attempt actually fails, to avoid
recording a fault for a non-really-faulting VM_FAULT_RETRY
attempt. This is what the real page fault does at least so it sounds
cleaner do the same here, but then in practice it makes very little
difference.

> +	if (ret & VM_FAULT_RETRY) {
> +		down_read(&mm->mmap_sem);
> +		if (!(fault_flags & FAULT_FLAG_TRIED)) {
> +			*unlocked = true;
> +			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
> +			fault_flags |= FAULT_FLAG_TRIED;
> +			goto retry;
> +		}
> +	}
>  	return 0;
>  }

Rest looks great.

The futex.c should be patched to pass the unlocked pointer in a later
patch but we can also postpone it to a different patchset.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
  2015-11-26 17:27 [PATCH v2 " Dominik Dingel
@ 2015-11-26 17:27   ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2015-11-26 17:27 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to
pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we
ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and so
a caller wasn't able to handle that case.  So we now changed the behaviour
to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c |  8 +++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 25 +++++++++++++++++++++----
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..b15759c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -585,7 +585,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
@@ -730,7 +730,8 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 			break;
 		}
 		/* Get the page mapped */
-		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			rc = -EFAULT;
 			break;
 		}
@@ -805,7 +806,8 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
-		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..7783073 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1163,7 +1163,8 @@ int invalidate_inode_page(struct page *page);
 extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-			    unsigned long address, unsigned int fault_flags);
+			    unsigned long address, unsigned int fault_flags,
+			    bool *unlocked);
 #else
 static inline int handle_mm_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
@@ -1175,7 +1176,7 @@ static inline int handle_mm_fault(struct mm_struct *mm,
 }
 static inline int fixup_user_fault(struct task_struct *tsk,
 		struct mm_struct *mm, unsigned long address,
-		unsigned int fault_flags)
+		unsigned int fault_flags, bool *unlocked)
 {
 	/* should never happen if there's no MMU */
 	BUG();
diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..fb640c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -639,7 +639,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
 
 	down_read(&mm->mmap_sem);
 	ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
-			       FAULT_FLAG_WRITE);
+			       FAULT_FLAG_WRITE, NULL);
 	up_read(&mm->mmap_sem);
 
 	return ret < 0 ? ret : 0;
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..4ed35a3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,6 +564,8 @@ EXPORT_SYMBOL(__get_user_pages);
  * @mm:		mm_struct of target mm
  * @address:	user address
  * @fault_flags:flags to pass down to handle_mm_fault()
+ * @unlocked:	did we unlock the mmap_sem while retrying, maybe NULL if caller
+ *		does not allow retry
  *
  * This is meant to be called in the specific scenario where for locking reasons
  * we try to access user memory in atomic context (within a pagefault_disable()
@@ -575,17 +577,19 @@ EXPORT_SYMBOL(__get_user_pages);
  * The main difference with get_user_pages() is that this function will
  * unconditionally call handle_mm_fault() which will in turn perform all the
  * necessary SW fixup of the dirty and young bits in the PTE, while
- * handle_mm_fault() only guarantees to update these in the struct page.
+ * get_user_pages() only guarantees to update these in the struct page.
  *
  * This is important for some architectures where those bits also gate the
  * access permission to the page because they are maintained in software.  On
  * such architectures, gup() will not be enough to make a subsequent access
  * succeed.
  *
- * This has the same semantics wrt the @mm->mmap_sem as does filemap_fault().
+ * This function will not return with an unlocked mmap_sem. So it has not the
+ * same semantics wrt the @mm->mmap_sem as does filemap_fault().
  */
 int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-		     unsigned long address, unsigned int fault_flags)
+		     unsigned long address, unsigned int fault_flags,
+		     bool *unlocked)
 {
 	struct vm_area_struct *vma;
 	vm_flags_t vm_flags;
@@ -599,6 +603,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	if (!(vm_flags & vma->vm_flags))
 		return -EFAULT;
 
+	if (unlocked)
+		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+
+retry:
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
@@ -609,12 +617,21 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			return -EFAULT;
 		BUG();
 	}
-	if (tsk) {
+	if (tsk && !(fault_flags & FAULT_FLAG_TRIED)) {
 		if (ret & VM_FAULT_MAJOR)
 			tsk->maj_flt++;
 		else
 			tsk->min_flt++;
 	}
+	if (ret & VM_FAULT_RETRY) {
+		down_read(&mm->mmap_sem);
+		if (!(fault_flags & FAULT_FLAG_TRIED)) {
+			*unlocked = true;
+			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			fault_flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+	}
 	return 0;
 }
 
-- 
2.3.9


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock
@ 2015-11-26 17:27   ` Dominik Dingel
  0 siblings, 0 replies; 12+ messages in thread
From: Dominik Dingel @ 2015-11-26 17:27 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrea Arcangeli, Martin Schwidefsky,
	Christian Borntraeger, Jason J. Herne, linux-s390, linux-mm
  Cc: Andrew Morton, David Rientjes, Eric B Munson, Naoya Horiguchi,
	Mel Gorman, Heiko Carstens, Dominik Dingel, Paolo Bonzini,
	linux-kernel

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault to
pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we
ever unlocked mmap_sem.

This patch brings in the logic to handle retries as well as it cleans up
the current documentation.  fixup_user_fault was not having the same
semantics as filemap_fault.  It never indicated if a retry happened and so
a caller wasn't able to handle that case.  So we now changed the behaviour
to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/mm/pgtable.c |  8 +++++---
 include/linux/mm.h     |  5 +++--
 kernel/futex.c         |  2 +-
 mm/gup.c               | 25 +++++++++++++++++++++----
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..b15759c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -585,7 +585,7 @@ int gmap_fault(struct gmap *gmap, unsigned long gaddr,
 		rc = vmaddr;
 		goto out_up;
 	}
-	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags)) {
+	if (fixup_user_fault(current, gmap->mm, vmaddr, fault_flags, NULL)) {
 		rc = -EFAULT;
 		goto out_up;
 	}
@@ -730,7 +730,8 @@ int gmap_ipte_notify(struct gmap *gmap, unsigned long gaddr, unsigned long len)
 			break;
 		}
 		/* Get the page mapped */
-		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, gmap->mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			rc = -EFAULT;
 			break;
 		}
@@ -805,7 +806,8 @@ retry:
 	if (!(pte_val(*ptep) & _PAGE_INVALID) &&
 	     (pte_val(*ptep) & _PAGE_PROTECT)) {
 		pte_unmap_unlock(ptep, ptl);
-		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE)) {
+		if (fixup_user_fault(current, mm, addr, FAULT_FLAG_WRITE,
+				     NULL)) {
 			up_read(&mm->mmap_sem);
 			return -EFAULT;
 		}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bad77..7783073 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1163,7 +1163,8 @@ int invalidate_inode_page(struct page *page);
 extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-			    unsigned long address, unsigned int fault_flags);
+			    unsigned long address, unsigned int fault_flags,
+			    bool *unlocked);
 #else
 static inline int handle_mm_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
@@ -1175,7 +1176,7 @@ static inline int handle_mm_fault(struct mm_struct *mm,
 }
 static inline int fixup_user_fault(struct task_struct *tsk,
 		struct mm_struct *mm, unsigned long address,
-		unsigned int fault_flags)
+		unsigned int fault_flags, bool *unlocked)
 {
 	/* should never happen if there's no MMU */
 	BUG();
diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..fb640c5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -639,7 +639,7 @@ static int fault_in_user_writeable(u32 __user *uaddr)
 
 	down_read(&mm->mmap_sem);
 	ret = fixup_user_fault(current, mm, (unsigned long)uaddr,
-			       FAULT_FLAG_WRITE);
+			       FAULT_FLAG_WRITE, NULL);
 	up_read(&mm->mmap_sem);
 
 	return ret < 0 ? ret : 0;
diff --git a/mm/gup.c b/mm/gup.c
index deafa2c..4ed35a3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -564,6 +564,8 @@ EXPORT_SYMBOL(__get_user_pages);
  * @mm:		mm_struct of target mm
  * @address:	user address
  * @fault_flags:flags to pass down to handle_mm_fault()
+ * @unlocked:	did we unlock the mmap_sem while retrying, maybe NULL if caller
+ *		does not allow retry
  *
  * This is meant to be called in the specific scenario where for locking reasons
  * we try to access user memory in atomic context (within a pagefault_disable()
@@ -575,17 +577,19 @@ EXPORT_SYMBOL(__get_user_pages);
  * The main difference with get_user_pages() is that this function will
  * unconditionally call handle_mm_fault() which will in turn perform all the
  * necessary SW fixup of the dirty and young bits in the PTE, while
- * handle_mm_fault() only guarantees to update these in the struct page.
+ * get_user_pages() only guarantees to update these in the struct page.
  *
  * This is important for some architectures where those bits also gate the
  * access permission to the page because they are maintained in software.  On
  * such architectures, gup() will not be enough to make a subsequent access
  * succeed.
  *
- * This has the same semantics wrt the @mm->mmap_sem as does filemap_fault().
+ * This function will not return with an unlocked mmap_sem. So it has not the
+ * same semantics wrt the @mm->mmap_sem as does filemap_fault().
  */
 int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
-		     unsigned long address, unsigned int fault_flags)
+		     unsigned long address, unsigned int fault_flags,
+		     bool *unlocked)
 {
 	struct vm_area_struct *vma;
 	vm_flags_t vm_flags;
@@ -599,6 +603,10 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 	if (!(vm_flags & vma->vm_flags))
 		return -EFAULT;
 
+	if (unlocked)
+		fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+
+retry:
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
@@ -609,12 +617,21 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			return -EFAULT;
 		BUG();
 	}
-	if (tsk) {
+	if (tsk && !(fault_flags & FAULT_FLAG_TRIED)) {
 		if (ret & VM_FAULT_MAJOR)
 			tsk->maj_flt++;
 		else
 			tsk->min_flt++;
 	}
+	if (ret & VM_FAULT_RETRY) {
+		down_read(&mm->mmap_sem);
+		if (!(fault_flags & FAULT_FLAG_TRIED)) {
+			*unlocked = true;
+			fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			fault_flags |= FAULT_FLAG_TRIED;
+			goto retry;
+		}
+	}
 	return 0;
 }
 
-- 
2.3.9

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-04 17:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-04 11:19 [PATCH v3 0/2] Allow gmap fault to retry Dominik Dingel
2016-01-04 11:19 ` Dominik Dingel
2016-01-04 11:19 ` [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock Dominik Dingel
2016-01-04 11:19   ` Dominik Dingel
2016-01-04 11:19 ` [PATCH 2/2] s390/mm: enable fixup_user_fault retrying Dominik Dingel
2016-01-04 11:19   ` Dominik Dingel
2016-01-04 17:08 ` [PATCH v3 0/2] Allow gmap fault to retry Andrea Arcangeli
2016-01-04 17:08   ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2015-11-26 17:27 [PATCH v2 " Dominik Dingel
2015-11-26 17:27 ` [PATCH 1/2] mm: bring in additional flag for fixup_user_fault to signal unlock Dominik Dingel
2015-11-26 17:27   ` Dominik Dingel
2015-12-04 21:49   ` Andrea Arcangeli
2015-12-04 21:49     ` Andrea Arcangeli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.