linux-kernel.vger.kernel.org archive mirror
* [PATCH 01/26] cris: Convert cryptocop to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 02/26] ia64: Use get_user_pages_fast() in err_inject.c Jan Kara
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, linux-cris-kernel, Mikael Starvik, Jesper Nilsson

CC: linux-cris-kernel@axis.com
CC: Mikael Starvik <starvik@axis.com>
CC: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 arch/cris/arch-v32/drivers/cryptocop.c | 35 ++++++++++------------------------
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/arch/cris/arch-v32/drivers/cryptocop.c b/arch/cris/arch-v32/drivers/cryptocop.c
index 877da1908234..df7ceeff1086 100644
--- a/arch/cris/arch-v32/drivers/cryptocop.c
+++ b/arch/cris/arch-v32/drivers/cryptocop.c
@@ -2716,43 +2716,28 @@ static int cryptocop_ioctl_process(struct inode *inode, struct file *filp, unsig
 		}
 	}
 
-	/* Acquire the mm page semaphore. */
-	down_read(&current->mm->mmap_sem);
-
-	err = get_user_pages(current,
-			     current->mm,
-			     (unsigned long int)(oper.indata + prev_ix),
-			     noinpages,
-			     0,  /* read access only for in data */
-			     0, /* no force */
-			     inpages,
-			     NULL);
+	err = get_user_pages_fast((unsigned long)(oper.indata + prev_ix),
+				  noinpages,
+				  0,  /* read access only for in data */
+				  inpages);
 
 	if (err < 0) {
-		up_read(&current->mm->mmap_sem);
 		nooutpages = noinpages = 0;
-		DEBUG_API(printk("cryptocop_ioctl_process: get_user_pages indata\n"));
+		DEBUG_API(printk("cryptocop_ioctl_process: get_user_pages_fast indata\n"));
 		goto error_cleanup;
 	}
 	noinpages = err;
 	if (oper.do_cipher){
-		err = get_user_pages(current,
-				     current->mm,
-				     (unsigned long int)oper.cipher_outdata,
-				     nooutpages,
-				     1, /* write access for out data */
-				     0, /* no force */
-				     outpages,
-				     NULL);
-		up_read(&current->mm->mmap_sem);
+		err = get_user_pages_fast((unsigned long)oper.cipher_outdata,
+					  nooutpages,
+					  1, /* write access for out data */
+					  outpages);
 		if (err < 0) {
 			nooutpages = 0;
-			DEBUG_API(printk("cryptocop_ioctl_process: get_user_pages outdata\n"));
+			DEBUG_API(printk("cryptocop_ioctl_process: get_user_pages_fast outdata\n"));
 			goto error_cleanup;
 		}
 		nooutpages = err;
-	} else {
-		up_read(&current->mm->mmap_sem);
 	}
 
 	/* Add 6 to nooutpages to make room for possibly inserted buffers for storing digest and
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 02/26] ia64: Use get_user_pages_fast() in err_inject.c
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
  2013-10-02 14:27 ` [PATCH 01/26] cris: Convert cryptocop to use get_user_pages_fast() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 03/26] dma: Use get_user_pages_fast() in dma_pin_iovec_pages() Jan Kara
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Tony Luck, linux-ia64

Convert the get_user_pages() call to get_user_pages_fast(). This also
fixes an apparent bug: get_user_pages() was being called without
mmap_sem held, on an arbitrary user-provided address.

CC: Tony Luck <tony.luck@intel.com>
CC: linux-ia64@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 arch/ia64/kernel/err_inject.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/err_inject.c b/arch/ia64/kernel/err_inject.c
index f59c0b844e88..75d35906a86b 100644
--- a/arch/ia64/kernel/err_inject.c
+++ b/arch/ia64/kernel/err_inject.c
@@ -142,8 +142,7 @@ store_virtual_to_phys(struct device *dev, struct device_attribute *attr,
 	u64 virt_addr=simple_strtoull(buf, NULL, 16);
 	int ret;
 
-        ret = get_user_pages(current, current->mm, virt_addr,
-                        1, VM_READ, 0, NULL, NULL);
+	ret = get_user_pages_fast(virt_addr, 1, VM_READ, NULL);
 	if (ret<=0) {
 #ifdef ERR_INJ_DEBUG
 		printk("Virtual address %lx is not existing.\n",virt_addr);
-- 
1.8.1.4




* [PATCH 03/26] dma: Use get_user_pages_fast() in dma_pin_iovec_pages()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
  2013-10-02 14:27 ` [PATCH 01/26] cris: Convert cryptocop to use get_user_pages_fast() Jan Kara
  2013-10-02 14:27 ` [PATCH 02/26] ia64: Use get_user_pages_fast() in err_inject.c Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 04/26] drm: Convert via driver to use get_user_pages_fast() Jan Kara
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Dan Williams

CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/dma/iovlock.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index bb48a57c2fc1..8b16332da8a9 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -95,17 +95,10 @@ struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
 		pages += page_list->nr_pages;
 
 		/* pin pages down */
-		down_read(&current->mm->mmap_sem);
-		ret = get_user_pages(
-			current,
-			current->mm,
-			(unsigned long) iov[i].iov_base,
-			page_list->nr_pages,
-			1,	/* write */
-			0,	/* force */
-			page_list->pages,
-			NULL);
-		up_read(&current->mm->mmap_sem);
+		ret = get_user_pages_fast((unsigned long) iov[i].iov_base,
+					  page_list->nr_pages,
+					  1,	/* write */
+					  page_list->pages);
 
 		if (ret != page_list->nr_pages)
 			goto unpin;
-- 
1.8.1.4



* [PATCH 04/26] drm: Convert via driver to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (2 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 03/26] dma: Use get_user_pages_fast() in dma_pin_iovec_pages() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() " Jan Kara
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, David Airlie, dri-devel

CC: David Airlie <airlied@linux.ie>
CC: dri-devel@lists.freedesktop.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/gpu/drm/via/via_dmablit.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 8b0f25904e6d..7e3766667d78 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -238,14 +238,10 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg,  drm_via_dmablit_t *xfer)
 	vsg->pages = vzalloc(sizeof(struct page *) * vsg->num_pages);
 	if (NULL == vsg->pages)
 		return -ENOMEM;
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm,
-			     (unsigned long)xfer->mem_addr,
-			     vsg->num_pages,
-			     (vsg->direction == DMA_FROM_DEVICE),
-			     0, vsg->pages, NULL);
-
-	up_read(&current->mm->mmap_sem);
+	ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
+				  vsg->num_pages,
+				  (vsg->direction == DMA_FROM_DEVICE),
+				  vsg->pages);
 	if (ret != vsg->num_pages) {
 		if (ret < 0)
 			return ret;
-- 
1.8.1.4



* [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (3 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 04/26] drm: Convert via driver to use get_user_pages_fast() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 19:41   ` Laurent Pinchart
  2013-10-02 14:27 ` [PATCH 06/26] vmw_vmci: Convert driver to " Jan Kara
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Laurent Pinchart, linux-media

CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
CC: linux-media@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/media/platform/omap3isp/ispqueue.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/media/platform/omap3isp/ispqueue.c b/drivers/media/platform/omap3isp/ispqueue.c
index e15f01342058..bed380395e6c 100644
--- a/drivers/media/platform/omap3isp/ispqueue.c
+++ b/drivers/media/platform/omap3isp/ispqueue.c
@@ -331,13 +331,9 @@ static int isp_video_buffer_prepare_user(struct isp_video_buffer *buf)
 	if (buf->pages == NULL)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, data & PAGE_MASK,
-			     buf->npages,
-			     buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE, 0,
-			     buf->pages, NULL);
-	up_read(&current->mm->mmap_sem);
-
+	ret = get_user_pages_fast(data & PAGE_MASK, buf->npages,
+				  buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE,
+				  buf->pages);
 	if (ret != buf->npages) {
 		buf->npages = ret < 0 ? 0 : ret;
 		isp_video_buffer_cleanup(buf);
-- 
1.8.1.4



* [PATCH 06/26] vmw_vmci: Convert driver to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (4 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 07/26] st: Convert sgl_map_user_pages() " Jan Kara
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Arnd Bergmann, Greg Kroah-Hartman

Convert vmci_host_setup_notify() and qp_host_get_user_memory() to use
get_user_pages_fast() instead of get_user_pages(). Note that
qp_host_get_user_memory() was taking mmap_sem for writing with no
apparent reason; get_user_pages() only needs it held for reading.

CC: Arnd Bergmann <arnd@arndb.de>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/misc/vmw_vmci/vmci_host.c       |  6 +-----
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 21 ++++++---------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
index d4722b3dc8ec..1723a6e4f2e8 100644
--- a/drivers/misc/vmw_vmci/vmci_host.c
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -243,11 +243,7 @@ static int vmci_host_setup_notify(struct vmci_ctx *context,
 	/*
 	 * Lock physical page backing a given user VA.
 	 */
-	down_read(&current->mm->mmap_sem);
-	retval = get_user_pages(current, current->mm,
-				PAGE_ALIGN(uva),
-				1, 1, 0, &page, NULL);
-	up_read(&current->mm->mmap_sem);
+	retval = get_user_pages_fast(PAGE_ALIGN(uva), 1, 1, &page);
 	if (retval != 1)
 		return VMCI_ERROR_GENERIC;
 
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index a0515a6d6ebd..1b7b303085d2 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -732,13 +732,9 @@ static int qp_host_get_user_memory(u64 produce_uva,
 	int retval;
 	int err = VMCI_SUCCESS;
 
-	down_write(&current->mm->mmap_sem);
-	retval = get_user_pages(current,
-				current->mm,
-				(uintptr_t) produce_uva,
-				produce_q->kernel_if->num_pages,
-				1, 0,
-				produce_q->kernel_if->u.h.header_page, NULL);
+	retval = get_user_pages_fast((uintptr_t) produce_uva,
+				     produce_q->kernel_if->num_pages, 1,
+				     produce_q->kernel_if->u.h.header_page);
 	if (retval < produce_q->kernel_if->num_pages) {
 		pr_warn("get_user_pages(produce) failed (retval=%d)", retval);
 		qp_release_pages(produce_q->kernel_if->u.h.header_page,
@@ -747,12 +743,9 @@ static int qp_host_get_user_memory(u64 produce_uva,
 		goto out;
 	}
 
-	retval = get_user_pages(current,
-				current->mm,
-				(uintptr_t) consume_uva,
-				consume_q->kernel_if->num_pages,
-				1, 0,
-				consume_q->kernel_if->u.h.header_page, NULL);
+	retval = get_user_pages_fast((uintptr_t) consume_uva,
+				     consume_q->kernel_if->num_pages, 1,
+				     consume_q->kernel_if->u.h.header_page);
 	if (retval < consume_q->kernel_if->num_pages) {
 		pr_warn("get_user_pages(consume) failed (retval=%d)", retval);
 		qp_release_pages(consume_q->kernel_if->u.h.header_page,
@@ -763,8 +756,6 @@ static int qp_host_get_user_memory(u64 produce_uva,
 	}
 
  out:
-	up_write(&current->mm->mmap_sem);
-
 	return err;
 }
 
-- 
1.8.1.4



* [PATCH 07/26] st: Convert sgl_map_user_pages() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (5 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 06/26] vmw_vmci: Convert driver to " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 08/26] ced1401: Convert driver " Jan Kara
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, linux-scsi, Kai Makisara

CC: linux-scsi@vger.kernel.org
CC: Kai Makisara <Kai.Makisara@kolumbus.fi>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/scsi/st.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index ff44b3c2cff2..ba11299c3740 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4514,19 +4514,11 @@ static int sgl_map_user_pages(struct st_buffer *STbp,
 	if ((pages = kmalloc(max_pages * sizeof(*pages), GFP_KERNEL)) == NULL)
 		return -ENOMEM;
 
-        /* Try to fault in all of the necessary pages */
-	down_read(&current->mm->mmap_sem);
-        /* rw==READ means read from drive, write into memory area */
-	res = get_user_pages(
-		current,
-		current->mm,
-		uaddr,
-		nr_pages,
-		rw == READ,
-		0, /* don't force */
-		pages,
-		NULL);
-	up_read(&current->mm->mmap_sem);
+        /*
+	 * Try to fault in all of the necessary pages. rw==READ means read
+	 * from drive, write into memory area.
+	 */
+	res = get_user_pages_fast(uaddr, nr_pages, rw == READ, pages);
 
 	/* Errors and no page mapped should return here */
 	if (res < nr_pages)
-- 
1.8.1.4



* [PATCH 08/26] ced1401: Convert driver to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (6 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 07/26] st: Convert sgl_map_user_pages() " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 09/26] crystalhd: Convert crystalhd_map_dio() " Jan Kara
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Greg Kroah-Hartman

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/ced1401/ced_ioc.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/staging/ced1401/ced_ioc.c b/drivers/staging/ced1401/ced_ioc.c
index 2dbaf39e2fc2..62efd74b8c04 100644
--- a/drivers/staging/ced1401/ced_ioc.c
+++ b/drivers/staging/ced1401/ced_ioc.c
@@ -692,10 +692,7 @@ static int SetArea(DEVICE_EXTENSION *pdx, int nArea, char __user *puBuf,
 		__func__, puBuf, dwLength, bCircular);
 
 	/*  To pin down user pages we must first acquire the mapping semaphore. */
-	down_read(&current->mm->mmap_sem);	/*  get memory map semaphore */
-	nPages = get_user_pages(current, current->mm, ulStart, len, 1, 0,
-				pPages, NULL);
-	up_read(&current->mm->mmap_sem);	/*  release the semaphore */
+	nPages = get_user_pages_fast(ulStart, len, 1, pPages);
 	dev_dbg(&pdx->interface->dev, "%s nPages = %d", __func__, nPages);
 
 	if (nPages > 0) {		/*  if we succeeded */
-- 
1.8.1.4



* [PATCH 09/26] crystalhd: Convert crystalhd_map_dio() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (7 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 08/26] ced1401: Convert driver " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 10/26] lustre: Convert ll_get_user_pages() " Jan Kara
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, Naren Sankar, Jarod Wilson, Scott Davilla,
	Manu Abraham

CC: Naren Sankar <nsankar@broadcom.com>
CC: Jarod Wilson <jarod@wilsonet.com>
CC: Scott Davilla <davilla@4pi.com>
CC: Manu Abraham <abraham.manu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/crystalhd/crystalhd_misc.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/staging/crystalhd/crystalhd_misc.c b/drivers/staging/crystalhd/crystalhd_misc.c
index 51f698052aff..f2d350985e46 100644
--- a/drivers/staging/crystalhd/crystalhd_misc.c
+++ b/drivers/staging/crystalhd/crystalhd_misc.c
@@ -751,10 +751,7 @@ enum BC_STATUS crystalhd_map_dio(struct crystalhd_adp *adp, void *ubuff,
 		}
 	}
 
-	down_read(&current->mm->mmap_sem);
-	res = get_user_pages(current, current->mm, uaddr, nr_pages, rw == READ,
-			     0, dio->pages, NULL);
-	up_read(&current->mm->mmap_sem);
+	res = get_user_pages_fast(uaddr, nr_pages, rw == READ, dio->pages);
 
 	/* Save for release..*/
 	dio->sig = crystalhd_dio_locked;
-- 
1.8.1.4



* [PATCH 10/26] lustre: Convert ll_get_user_pages() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (8 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 09/26] crystalhd: Convert crystalhd_map_dio() " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-05  6:27   ` Dilger, Andreas
  2013-10-02 14:27 ` [PATCH 11/26] sep: Convert sep_lock_user_pages() to get_user_pages_fast() Jan Kara
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, Greg Kroah-Hartman, Peng Tao, Andreas Dilger,
	hpdd-discuss

CC: Greg Kroah-Hartman <greg@kroah.com>
CC: Peng Tao <tao.peng@emc.com>
CC: Andreas Dilger <andreas.dilger@intel.com>
CC: hpdd-discuss@lists.01.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/lustre/lustre/llite/rw26.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 96c29ad2fc8c..7e3e0967993b 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -202,11 +202,8 @@ static inline int ll_get_user_pages(int rw, unsigned long user_addr,
 
 	OBD_ALLOC_LARGE(*pages, *max_pages * sizeof(**pages));
 	if (*pages) {
-		down_read(&current->mm->mmap_sem);
-		result = get_user_pages(current, current->mm, user_addr,
-					*max_pages, (rw == READ), 0, *pages,
-					NULL);
-		up_read(&current->mm->mmap_sem);
+		result = get_user_pages_fast(user_addr, *max_pages,
+					     (rw == READ), *pages);
 		if (unlikely(result <= 0))
 			OBD_FREE_LARGE(*pages, *max_pages * sizeof(**pages));
 	}
-- 
1.8.1.4



* [PATCH 11/26] sep: Convert sep_lock_user_pages() to get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (9 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 10/26] lustre: Convert ll_get_user_pages() " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 12/26] pvr2fb: Convert pvr2fb_write() to use get_user_pages_fast() Jan Kara
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, Greg Kroah-Hartman, Mark Allyn, Jayant Mangalampalli

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Mark Allyn <mark.a.allyn@intel.com>
CC: Jayant Mangalampalli <jayant.mangalampalli@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/staging/sep/sep_main.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/sep/sep_main.c b/drivers/staging/sep/sep_main.c
index 6a98a208bbf2..11f5b2117457 100644
--- a/drivers/staging/sep/sep_main.c
+++ b/drivers/staging/sep/sep_main.c
@@ -1263,13 +1263,8 @@ static int sep_lock_user_pages(struct sep_device *sep,
 	}
 
 	/* Convert the application virtual address into a set of physical */
-	down_read(&current->mm->mmap_sem);
-	result = get_user_pages(current, current->mm, app_virt_addr,
-		num_pages,
-		((in_out_flag == SEP_DRIVER_IN_FLAG) ? 0 : 1),
-		0, page_array, NULL);
-
-	up_read(&current->mm->mmap_sem);
+	result = get_user_pages_fast(app_virt_addr, num_pages,
+		((in_out_flag == SEP_DRIVER_IN_FLAG) ? 0 : 1), page_array);
 
 	/* Check the number of pages locked - if not all then exit with error */
 	if (result != num_pages) {
-- 
1.8.1.4



* [PATCH 12/26] pvr2fb: Convert pvr2fb_write() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (10 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 11/26] sep: Convert sep_lock_user_pages() to get_user_pages_fast() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() " Jan Kara
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, linux-fbdev, Tomi Valkeinen,
	Jean-Christophe Plagniol-Villard

CC: linux-fbdev@vger.kernel.org
CC: Tomi Valkeinen <tomi.valkeinen@ti.com>
CC: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/video/pvr2fb.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/video/pvr2fb.c b/drivers/video/pvr2fb.c
index df07860563e6..31e1345a88a8 100644
--- a/drivers/video/pvr2fb.c
+++ b/drivers/video/pvr2fb.c
@@ -686,11 +686,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const char *buf,
 	if (!pages)
 		return -ENOMEM;
 
-	down_read(&current->mm->mmap_sem);
-	ret = get_user_pages(current, current->mm, (unsigned long)buf,
-			     nr_pages, WRITE, 0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
-
+	ret = get_user_pages_fast((unsigned long)buf, nr_pages, WRITE, pages);
 	if (ret < nr_pages) {
 		nr_pages = ret;
 		ret = -EINVAL;
-- 
1.8.1.4



* [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (11 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 12/26] pvr2fb: Convert pvr2fb_write() to use get_user_pages_fast() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-04  2:38   ` Timur Tabi
  2013-10-02 14:27 ` [PATCH 14/26] nfs: Convert direct IO " Jan Kara
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Timur Tabi

CC: Timur Tabi <timur@freescale.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/virt/fsl_hypervisor.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/virt/fsl_hypervisor.c b/drivers/virt/fsl_hypervisor.c
index d294f67d6f84..791a46a5dd2a 100644
--- a/drivers/virt/fsl_hypervisor.c
+++ b/drivers/virt/fsl_hypervisor.c
@@ -242,13 +242,8 @@ static long ioctl_memcpy(struct fsl_hv_ioctl_memcpy __user *p)
 	sg_list = PTR_ALIGN(sg_list_unaligned, sizeof(struct fh_sg_list));
 
 	/* Get the physical addresses of the source buffer */
-	down_read(&current->mm->mmap_sem);
-	num_pinned = get_user_pages(current, current->mm,
-		param.local_vaddr - lb_offset, num_pages,
-		(param.source == -1) ? READ : WRITE,
-		0, pages, NULL);
-	up_read(&current->mm->mmap_sem);
-
+	num_pinned = get_user_pages_fast(param.local_vaddr - lb_offset,
+		num_pages, (param.source == -1) ? READ : WRITE, pages);
 	if (num_pinned != num_pages) {
 		/* get_user_pages() failed */
 		pr_debug("fsl-hv: could not lock source buffer\n");
-- 
1.8.1.4



* [PATCH 14/26] nfs: Convert direct IO to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (12 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 15/26] ceph: Convert ceph_get_direct_page_vector() to get_user_pages_fast() Jan Kara
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Trond Myklebust, linux-nfs

CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: linux-nfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfs/direct.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 91ff089d3412..1aaf4aa2b3d7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -337,10 +337,8 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_pageio_descriptor *de
 		if (!pagevec)
 			break;
 		if (uio) {
-			down_read(&current->mm->mmap_sem);
-			result = get_user_pages(current, current->mm, user_addr,
-					npages, 1, 0, pagevec, NULL);
-			up_read(&current->mm->mmap_sem);
+			result = get_user_pages_fast(user_addr, npages, 1,
+						     pagevec);
 			if (result < 0)
 				break;
 		} else {
@@ -658,10 +656,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
 			break;
 
 		if (uio) {
-			down_read(&current->mm->mmap_sem);
-			result = get_user_pages(current, current->mm, user_addr,
-						npages, 0, 0, pagevec, NULL);
-			up_read(&current->mm->mmap_sem);
+			result = get_user_pages_fast(user_addr, npages, 0,
+						     pagevec);
 			if (result < 0)
 				break;
 		} else {
-- 
1.8.1.4



* [PATCH 15/26] ceph: Convert ceph_get_direct_page_vector() to get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (13 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 14/26] nfs: Convert direct IO " Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Sage Weil, ceph-devel

CC: Sage Weil <sage@inktank.com>
CC: ceph-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 net/ceph/pagevec.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 815a2249cfa9..1b1ac7887767 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -23,17 +23,15 @@ struct page **ceph_get_direct_page_vector(const void __user *data,
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
-	down_read(&current->mm->mmap_sem);
 	while (got < num_pages) {
-		rc = get_user_pages(current, current->mm,
+		rc = get_user_pages_fast(
 		    (unsigned long)data + ((unsigned long)got * PAGE_SIZE),
-		    num_pages - got, write_page, 0, pages + got, NULL);
+		    num_pages - got, write_page, pages + got);
 		if (rc < 0)
 			break;
 		BUG_ON(rc == 0);
 		got += rc;
 	}
-	up_read(&current->mm->mmap_sem);
 	if (rc < 0)
 		goto fail;
 	return pages;
-- 
1.8.1.4



* [PATCH 16/26] mm: Provide get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (14 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 15/26] ceph: Convert ceph_get_direct_page_vector() to get_user_pages_fast() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 16:25   ` Christoph Hellwig
  2013-10-02 16:28   ` KOSAKI Motohiro
  2013-10-02 14:27 ` [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute() Jan Kara
                   ` (9 subsequent siblings)
  25 siblings, 2 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara

Provide a wrapper around get_user_pages() that takes care of acquiring
and releasing mmap_sem. Using this function reduces the number of
places that have to deal with mmap_sem directly.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/mm.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8b6e55ee8855..70031ead06a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1031,6 +1031,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
+static inline long
+get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+		 	unsigned long start, unsigned long nr_pages,
+			int write, int force, struct page **pages)
+{
+	long ret;
+
+	down_read(&mm->mmap_sem);
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+			     NULL);
+	up_read(&mm->mmap_sem);
+	return ret;
+}
+
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
-- 
1.8.1.4



* [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (15 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 14:59   ` Gleb Natapov
  2013-10-02 14:27 ` [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked() Jan Kara
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Gleb Natapov, Paolo Bonzini, kvm

CC: Gleb Natapov <gleb@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: kvm@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 virt/kvm/async_pf.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 8a39dda7a325..8d4b39a4bc12 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -67,9 +67,7 @@ static void async_pf_execute(struct work_struct *work)
 	might_sleep();
 
 	use_mm(mm);
-	down_read(&mm->mmap_sem);
-	get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
-	up_read(&mm->mmap_sem);
+	get_user_pages_unlocked(current, mm, addr, 1, 1, 0, &page);
 	unuse_mm(mm);
 
 	spin_lock(&vcpu->async_pf.lock);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (16 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute() Jan Kara
@ 2013-10-02 14:27 ` Jan Kara
  2013-10-02 16:32   ` KOSAKI Motohiro
  2013-10-02 14:28 ` [PATCH 19/26] ivtv: Convert driver " Jan Kara
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:27 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/process_vm_access.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index fd26d0433509..c1bc47d8ed90 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
 	*bytes_copied = 0;
 
 	/* Get the pages we're interested in */
-	down_read(&mm->mmap_sem);
-	pages_pinned = get_user_pages(task, mm, pa,
-				      nr_pages_to_copy,
-				      vm_write, 0, process_pages, NULL);
-	up_read(&mm->mmap_sem);
-
+	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
+					       vm_write, 0, process_pages);
 	if (pages_pinned != nr_pages_to_copy) {
 		rc = -EFAULT;
 		goto end;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 19/26] ivtv: Convert driver to use get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (17 preceding siblings ...)
  2013-10-02 14:27 ` [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-05 12:02   ` Andy Walls
  2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Andy Walls, linux-media

CC: Andy Walls <awalls@md.metrocast.net>
CC: linux-media@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/media/pci/ivtv/ivtv-udma.c |  6 ++----
 drivers/media/pci/ivtv/ivtv-yuv.c  | 12 ++++++------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
index 7338cb2d0a38..6012e5049076 100644
--- a/drivers/media/pci/ivtv/ivtv-udma.c
+++ b/drivers/media/pci/ivtv/ivtv-udma.c
@@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
 	}
 
 	/* Get user pages for DMA Xfer */
-	down_read(&current->mm->mmap_sem);
-	err = get_user_pages(current, current->mm,
-			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
-	up_read(&current->mm->mmap_sem);
+	err = get_user_pages_unlocked(current, current->mm, user_dma.uaddr,
+				      user_dma.page_count, 0, 1, dma->map);
 
 	if (user_dma.page_count != err) {
 		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
diff --git a/drivers/media/pci/ivtv/ivtv-yuv.c b/drivers/media/pci/ivtv/ivtv-yuv.c
index 2ad65eb29832..9365995917d8 100644
--- a/drivers/media/pci/ivtv/ivtv-yuv.c
+++ b/drivers/media/pci/ivtv/ivtv-yuv.c
@@ -75,15 +75,15 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
 	ivtv_udma_get_page_info (&uv_dma, (unsigned long)args->uv_source, 360 * uv_decode_height);
 
 	/* Get user pages for DMA Xfer */
-	down_read(&current->mm->mmap_sem);
-	y_pages = get_user_pages(current, current->mm, y_dma.uaddr, y_dma.page_count, 0, 1, &dma->map[0], NULL);
+	y_pages = get_user_pages_unlocked(current, current->mm, y_dma.uaddr,
+					  y_dma.page_count, 0, 1, &dma->map[0]);
 	uv_pages = 0; /* silence gcc. value is set and consumed only if: */
 	if (y_pages == y_dma.page_count) {
-		uv_pages = get_user_pages(current, current->mm,
-					  uv_dma.uaddr, uv_dma.page_count, 0, 1,
-					  &dma->map[y_pages], NULL);
+		uv_pages = get_user_pages_unlocked(current, current->mm,
+						   uv_dma.uaddr,
+						   uv_dma.page_count, 0, 1,
+						   &dma->map[y_pages]);
 	}
-	up_read(&current->mm->mmap_sem);
 
 	if (y_pages != y_dma.page_count || uv_pages != uv_dma.page_count) {
 		int rc = -EFAULT;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (18 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 19/26] ivtv: Convert driver " Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Roland Dreier, linux-rdma

Convert ib_umem_get() to use get_user_pages_unlocked(). This
significantly shortens the section where mmap_sem is held (we need it
only for updating mm->pinned_vm and inside get_user_pages()) and removes
the caller's knowledge of get_user_pages() locking.

CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/core/umem.c | 41 +++++++++++++++++------------------------
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a84112322071..0640a89021a9 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -80,7 +80,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 {
 	struct ib_umem *umem;
 	struct page **page_list;
-	struct vm_area_struct **vma_list;
 	struct ib_umem_chunk *chunk;
 	unsigned long locked;
 	unsigned long lock_limit;
@@ -125,34 +124,31 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/*
-	 * if we can't alloc the vma_list, it's not so bad;
-	 * just assume the memory is not hugetlb memory
-	 */
-	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
-	if (!vma_list)
-		umem->hugetlb = 0;
-
 	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
 
 	down_write(&current->mm->mmap_sem);
 
-	locked     = npages + current->mm->pinned_vm;
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
-		ret = -ENOMEM;
-		goto out;
+	locked = npages;
+	if (npages + current->mm->pinned_vm > lock_limit &&
+	    !capable(CAP_IPC_LOCK)) {
+		up_write(&current->mm->mmap_sem);
+		kfree(umem);
+		free_page((unsigned long) page_list);
+		return ERR_PTR(-ENOMEM);
 	}
+	current->mm->pinned_vm += npages;
+
+	up_write(&current->mm->mmap_sem);
 
 	cur_base = addr & PAGE_MASK;
 
 	ret = 0;
 	while (npages) {
-		ret = get_user_pages(current, current->mm, cur_base,
+		ret = get_user_pages_unlocked(current, current->mm, cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     1, !umem->writable, page_list, vma_list);
+				     1, !umem->writable, page_list);
 
 		if (ret < 0)
 			goto out;
@@ -174,8 +170,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 			chunk->nents = min_t(int, ret, IB_UMEM_MAX_PAGE_CHUNK);
 			sg_init_table(chunk->page_list, chunk->nents);
 			for (i = 0; i < chunk->nents; ++i) {
-				if (vma_list &&
-				    !is_vm_hugetlb_page(vma_list[i + off]))
+				if (!PageHuge(page_list[i + off]))
 					umem->hugetlb = 0;
 				sg_set_page(&chunk->page_list[i], page_list[i + off], PAGE_SIZE, 0);
 			}
@@ -206,12 +201,10 @@ out:
 	if (ret < 0) {
 		__ib_umem_release(context->device, umem, 0);
 		kfree(umem);
-	} else
-		current->mm->pinned_vm = locked;
-
-	up_write(&current->mm->mmap_sem);
-	if (vma_list)
-		free_page((unsigned long) vma_list);
+		down_write(&current->mm->mmap_sem);
+		current->mm->pinned_vm -= locked;
+		up_write(&current->mm->mmap_sem);
+	}
 	free_page((unsigned long) page_list);
 
 	return ret < 0 ? ERR_PTR(ret) : umem;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 21/26] ib: Convert ipath_get_user_pages() to get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (19 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Convert ipath_get_user_pages() to use get_user_pages_unlocked(). This
shortens the section where we hold mmap_sem for writing and also removes
the knowledge about get_user_pages() locking from the ipath driver. While
changing the code we also fix a bug in the check against the pinned page
limit.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/ipath/ipath_user_pages.c | 62 +++++++++++---------------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index dc66c4506916..a89af9654112 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -52,40 +52,58 @@ static void __ipath_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/* call with current->mm->mmap_sem held */
-static int __ipath_get_user_pages(unsigned long start_page, size_t num_pages,
-				  struct page **p, struct vm_area_struct **vma)
+/**
+ * ipath_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
+ */
+int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
+			 struct page **p)
 {
 	unsigned long lock_limit;
 	size_t got;
 	int ret;
 
+	down_write(&current->mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit) {
+	if (current->mm->pinned_vm + num_pages > lock_limit && 
+	    !capable(CAP_IPC_LOCK)) {
+		up_write(&current->mm->mmap_sem);
 		ret = -ENOMEM;
 		goto bail;
 	}
+	current->mm->pinned_vm += num_pages;
+	up_write(&current->mm->mmap_sem);
 
 	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
 		   (unsigned long) num_pages, start_page);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(current, current->mm,
-				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
-				     p + got, vma);
+		ret = get_user_pages_unlocked(current, current->mm,
+					      start_page + got * PAGE_SIZE,
+					      num_pages - got, 1, 1,
+					      p + got);
 		if (ret < 0)
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
 
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__ipath_release_user_pages(p, got, 0);
+	down_write(&current->mm->mmap_sem);
+	current->mm->pinned_vm -= num_pages;
+	up_write(&current->mm->mmap_sem);
 bail:
 	return ret;
 }
@@ -146,32 +164,6 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
 	return phys;
 }
 
-/**
- * ipath_get_user_pages - lock user pages into memory
- * @start_page: the start page
- * @num_pages: the number of pages
- * @p: the output page structures
- *
- * This function takes a given start page (page aligned user virtual
- * address) and pins it and the following specified number of pages.  For
- * now, num_pages is always 1, but that will probably change at some point
- * (because caller is doing expected sends on a single virtually contiguous
- * buffer, so we can do all pages at once).
- */
-int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
-			 struct page **p)
-{
-	int ret;
-
-	down_write(&current->mm->mmap_sem);
-
-	ret = __ipath_get_user_pages(start_page, num_pages, p, NULL);
-
-	up_write(&current->mm->mmap_sem);
-
-	return ret;
-}
-
 void ipath_release_user_pages(struct page **p, size_t num_pages)
 {
 	down_write(&current->mm->mmap_sem);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (20 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Function ipath_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly, ipath_user_sdma_queue_pkts() (and also
ipath_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
which can hit a page fault, and we would then deadlock trying to acquire
mmap_sem while handling that fault. So just make
ipath_user_sdma_pin_pages() use get_user_pages_unlocked() and leave
mmap_sem locking to the mm code.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/ipath/ipath_user_sdma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.c b/drivers/infiniband/hw/ipath/ipath_user_sdma.c
index f5cb13b21445..462c9b136e85 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_sdma.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.c
@@ -280,8 +280,8 @@ static int ipath_user_sdma_pin_pages(const struct ipath_devdata *dd,
 	int j;
 	int ret;
 
-	ret = get_user_pages(current, current->mm, addr,
-			     npages, 0, 1, pages, NULL);
+	ret = get_user_pages_unlocked(current, current->mm, addr,
+				      npages, 0, 1, pages);
 
 	if (ret != npages) {
 		int i;
@@ -811,10 +811,7 @@ int ipath_user_sdma_writev(struct ipath_devdata *dd,
 	while (dim) {
 		const int mxp = 8;
 
-		down_write(&current->mm->mmap_sem);
 		ret = ipath_user_sdma_queue_pkts(dd, pq, &list, iov, dim, mxp);
-		up_write(&current->mm->mmap_sem);
-
 		if (ret <= 0)
 			goto done_unlock;
 		else {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (21 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:54   ` Marciniszyn, Mike
  2013-10-04 13:52   ` Marciniszyn, Mike
  2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
                   ` (2 subsequent siblings)
  25 siblings, 2 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
shortens the section where we hold mmap_sem for writing and also removes
the knowledge about get_user_pages() locking from the ipath driver. While
changing the code we also fix a bug in the check against the pinned page
limit.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++-----------------
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 2bc1d2b96298..57ce83c2d1d9 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/*
- * Call with current->mm->mmap_sem held.
+/**
+ * qib_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
  */
-static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
-				struct page **p, struct vm_area_struct **vma)
+int qib_get_user_pages(unsigned long start_page, size_t num_pages,
+		       struct page **p)
 {
 	unsigned long lock_limit;
 	size_t got;
 	int ret;
+	struct mm_struct *mm = current->mm;
 
+	down_write(&mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		up_write(&mm->mmap_sem);
 		ret = -ENOMEM;
 		goto bail;
 	}
+	mm->pinned_vm += num_pages;
+	up_write(&mm->mmap_sem);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(current, current->mm,
-				     start_page + got * PAGE_SIZE,
-				     num_pages - got, 1, 1,
-				     p + got, vma);
+		ret = get_user_pages_unlocked(current, mm,
+					      start_page + got * PAGE_SIZE,
+					      num_pages - got, 1, 1,
+					      p + got);
 		if (ret < 0)
 			goto bail_release;
 	}
 
-	current->mm->pinned_vm += num_pages;
 
 	ret = 0;
 	goto bail;
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
+	down_write(&mm->mmap_sem);
+	mm->pinned_vm -= num_pages;
+	up_write(&mm->mmap_sem);
 bail:
 	return ret;
 }
@@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
 	return phys;
 }
 
-/**
- * qib_get_user_pages - lock user pages into memory
- * @start_page: the start page
- * @num_pages: the number of pages
- * @p: the output page structures
- *
- * This function takes a given start page (page aligned user virtual
- * address) and pins it and the following specified number of pages.  For
- * now, num_pages is always 1, but that will probably change at some point
- * (because caller is doing expected sends on a single virtually contiguous
- * buffer, so we can do all pages at once).
- */
-int qib_get_user_pages(unsigned long start_page, size_t num_pages,
-		       struct page **p)
-{
-	int ret;
-
-	down_write(&current->mm->mmap_sem);
-
-	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
-
-	up_write(&current->mm->mmap_sem);
-
-	return ret;
-}
-
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
 	if (current->mm) /* during close after signal, mm can be NULL */
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (22 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
  2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Mike Marciniszyn, Roland Dreier, linux-rdma

Function qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly, qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce(), called somewhat later) calls copy_from_user(),
which can hit a page fault, and we would then deadlock trying to acquire
mmap_sem while handling that fault. So just make qib_user_sdma_pin_pages()
use get_user_pages_unlocked() and leave mmap_sem locking to the mm code.

CC: Mike Marciniszyn <infinipath@intel.com>
CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/qib/qib_user_sdma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index d0a0ea0c14d6..c1b6463acd59 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -594,8 +594,8 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 		else
 			j = npages;
 
-		ret = get_user_pages(current, current->mm, addr,
-			     j, 0, 1, pages, NULL);
+		ret = get_user_pages_unlocked(current, current->mm, addr,
+			 		      j, 0, 1, pages);
 		if (ret != j) {
 			i = 0;
 			j = ret;
@@ -1294,11 +1294,8 @@ int qib_user_sdma_writev(struct qib_ctxtdata *rcd,
 		int mxp = 8;
 		int ndesc = 0;
 
-		down_write(&current->mm->mmap_sem);
 		ret = qib_user_sdma_queue_pkts(dd, ppd, pq,
 				iov, dim, &list, &mxp, &ndesc);
-		up_write(&current->mm->mmap_sem);
-
 		if (ret < 0)
 			goto done_unlock;
 		else {
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast()
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (23 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML; +Cc: linux-mm, Jan Kara, Roland Dreier, linux-rdma

Function mthca_map_user_db() appears to call get_user_pages() without
holding mmap_sem. Fix the bug by using get_user_pages_fast() instead,
which also takes care of the locking.

CC: Roland Dreier <roland@kernel.org>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 7d2e42dd6926..c3543b27a2a7 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,8 +472,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 		goto out;
 	}
 
-	ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK, 1, 1, 0,
-			     pages, NULL);
+	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, 1, pages);
 	if (ret < 0)
 		goto out;
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 26/26] aio: Remove useless get_user_pages() call
       [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
                   ` (24 preceding siblings ...)
  2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
@ 2013-10-02 14:28 ` Jan Kara
  25 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 14:28 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, Jan Kara, linux-fsdevel, linux-aio, Alexander Viro,
	Benjamin LaHaise

The get_user_pages() call in aio_setup_ring() is useless these days. We
create all the ring buffer pages in the page cache using
find_or_create_page() anyway, so we can just use the pointers that
function returns instead of having to look the pages up via the user
address space.

CC: linux-fsdevel@vger.kernel.org
CC: linux-aio@kvack.org
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/aio.c | 48 +++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6b868f0e0c4c..a14a33027990 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -269,9 +269,21 @@ static int aio_setup_ring(struct kioctx *ctx)
 	file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
 	file->f_inode->i_mapping->private_data = ctx;
 	file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
+	ctx->aio_ring_file = file;
+	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring))
+			/ sizeof(struct io_event);
+
+	ctx->ring_pages = ctx->internal_pages;
+	if (nr_pages > AIO_RING_PAGES) {
+		ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *),
+					  GFP_KERNEL);
+		if (!ctx->ring_pages)
+			return -ENOMEM;
+	}
 
 	for (i = 0; i < nr_pages; i++) {
 		struct page *page;
+
 		page = find_or_create_page(file->f_inode->i_mapping,
 					   i, GFP_HIGHUSER | __GFP_ZERO);
 		if (!page)
@@ -281,19 +293,13 @@ static int aio_setup_ring(struct kioctx *ctx)
 		SetPageUptodate(page);
 		SetPageDirty(page);
 		unlock_page(page);
+		ctx->ring_pages[i] = page;
 	}
-	ctx->aio_ring_file = file;
-	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring))
-			/ sizeof(struct io_event);
-
-	ctx->ring_pages = ctx->internal_pages;
-	if (nr_pages > AIO_RING_PAGES) {
-		ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *),
-					  GFP_KERNEL);
-		if (!ctx->ring_pages)
-			return -ENOMEM;
+	ctx->nr_pages = i;
+	if (unlikely(ctx->nr_pages != nr_pages)) {
+		aio_free_ring(ctx);
+		return -EAGAIN;
 	}
-
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
@@ -309,28 +315,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	}
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
-
-	/* We must do this while still holding mmap_sem for write, as we
-	 * need to be protected against userspace attempting to mremap()
-	 * or munmap() the ring buffer.
-	 */
-	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
-				       1, 0, ctx->ring_pages, NULL);
-
-	/* Dropping the reference here is safe as the page cache will hold
-	 * onto the pages for us.  It is also required so that page migration
-	 * can unmap the pages and get the right reference count.
-	 */
-	for (i = 0; i < ctx->nr_pages; i++)
-		put_page(ctx->ring_pages[i]);
-
 	up_write(&mm->mmap_sem);
 
-	if (unlikely(ctx->nr_pages != nr_pages)) {
-		aio_free_ring(ctx);
-		return -EAGAIN;
-	}
-
 	ctx->user_id = ctx->mmap_base;
 	ctx->nr_events = nr_events; /* trusted copy */
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
@ 2013-10-02 14:54   ` Marciniszyn, Mike
  2013-10-02 15:28     ` Jan Kara
  2013-10-04 13:52   ` Marciniszyn, Mike
  1 sibling, 1 reply; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-02 14:54 UTC (permalink / raw)
  To: Jan Kara, LKML; +Cc: linux-mm, infinipath, Roland Dreier, linux-rdma

Thanks!!

I would like to test these two patches and also do the stable work for the deadlock.  Do you have these patches in a repo somewhere to save me a bit of work?

We had been working on an internal version of the deadlock portion of this patch that uses get_user_pages_fast() vs. the new get_user_pages_unlocked().

The risk of GUP fast is the loss of the "force" arg, which I don't see as significant given our use case.

Some nits on the subject and commit message:
1. use IB/qib, IB/ipath vs. ib
2. use the correct ipath vs. qib in the commit message text

Mike

> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Wednesday, October 02, 2013 10:28 AM
> To: LKML
> Cc: linux-mm@kvack.org; Jan Kara; infinipath; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.
> 
> CC: Mike Marciniszyn <infinipath@intel.com>
> CC: Roland Dreier <roland@kernel.org>
> CC: linux-rdma@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++----------------
> -
>  1 file changed, 26 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c
> b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 2bc1d2b96298..57ce83c2d1d9 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page
> **p, size_t num_pages,
>  	}
>  }
> 
> -/*
> - * Call with current->mm->mmap_sem held.
> +/**
> + * qib_get_user_pages - lock user pages into memory
> + * @start_page: the start page
> + * @num_pages: the number of pages
> + * @p: the output page structures
> + *
> + * This function takes a given start page (page aligned user virtual
> + * address) and pins it and the following specified number of pages.
> +For
> + * now, num_pages is always 1, but that will probably change at some
> +point
> + * (because caller is doing expected sends on a single virtually
> +contiguous
> + * buffer, so we can do all pages at once).
>   */
> -static int __qib_get_user_pages(unsigned long start_page, size_t
> num_pages,
> -				struct page **p, struct vm_area_struct
> **vma)
> +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> +		       struct page **p)
>  {
>  	unsigned long lock_limit;
>  	size_t got;
>  	int ret;
> +	struct mm_struct *mm = current->mm;
> 
> +	down_write(&mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> 
> -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> +	if (mm->pinned_vm + num_pages > lock_limit &&
> !capable(CAP_IPC_LOCK)) {
> +		up_write(&mm->mmap_sem);
>  		ret = -ENOMEM;
>  		goto bail;
>  	}
> +	mm->pinned_vm += num_pages;
> +	up_write(&mm->mmap_sem);
> 
>  	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages(current, current->mm,
> -				     start_page + got * PAGE_SIZE,
> -				     num_pages - got, 1, 1,
> -				     p + got, vma);
> +		ret = get_user_pages_unlocked(current, mm,
> +					      start_page + got * PAGE_SIZE,
> +					      num_pages - got, 1, 1,
> +					      p + got);
>  		if (ret < 0)
>  			goto bail_release;
>  	}
> 
> -	current->mm->pinned_vm += num_pages;
> 
>  	ret = 0;
>  	goto bail;
> 
>  bail_release:
>  	__qib_release_user_pages(p, got, 0);
> +	down_write(&mm->mmap_sem);
> +	mm->pinned_vm -= num_pages;
> +	up_write(&mm->mmap_sem);
>  bail:
>  	return ret;
>  }
> @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
>  	return phys;
>  }
> 
> -/**
> - * qib_get_user_pages - lock user pages into memory
> - * @start_page: the start page
> - * @num_pages: the number of pages
> - * @p: the output page structures
> - *
> - * This function takes a given start page (page aligned user virtual
> - * address) and pins it and the following specified number of pages.  For
> - * now, num_pages is always 1, but that will probably change at some point
> - * (because caller is doing expected sends on a single virtually contiguous
> - * buffer, so we can do all pages at once).
> - */
> -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> -		       struct page **p)
> -{
> -	int ret;
> -
> -	down_write(&current->mm->mmap_sem);
> -
> -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> -
> -	up_write(&current->mm->mmap_sem);
> -
> -	return ret;
> -}
> -
>  void qib_release_user_pages(struct page **p, size_t num_pages)
>  {
>  	if (current->mm) /* during close after signal, mm can be NULL */
> --
> 1.8.1.4


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute()
  2013-10-02 14:27 ` [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute() Jan Kara
@ 2013-10-02 14:59   ` Gleb Natapov
  0 siblings, 0 replies; 55+ messages in thread
From: Gleb Natapov @ 2013-10-02 14:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, Paolo Bonzini, kvm

Looks straightforward.

On Wed, Oct 02, 2013 at 04:27:58PM +0200, Jan Kara wrote:
> CC: Gleb Natapov <gleb@redhat.com>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: kvm@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  virt/kvm/async_pf.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 8a39dda7a325..8d4b39a4bc12 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -67,9 +67,7 @@ static void async_pf_execute(struct work_struct *work)
>  	might_sleep();
>  
>  	use_mm(mm);
> -	down_read(&mm->mmap_sem);
> -	get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
> -	up_read(&mm->mmap_sem);
> +	get_user_pages_unlocked(current, mm, addr, 1, 1, 0, &page);
>  	unuse_mm(mm);
>  
>  	spin_lock(&vcpu->async_pf.lock);
> -- 
> 1.8.1.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 14:54   ` Marciniszyn, Mike
@ 2013-10-02 15:28     ` Jan Kara
  2013-10-02 15:32       ` Marciniszyn, Mike
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 15:28 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

On Wed 02-10-13 14:54:59, Marciniszyn, Mike wrote:
> Thanks!!
> 
> I would like to test these two patches and also do the stable work for
> the deadlock.  Do you have these patches in a repo somewhere to
> save me a bit of work?
  I've pushed the patches to:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
into the branch 'get_user_pages'.

> We had been working on an internal version of the deadlock portion of
> this patch that uses get_user_pages_fast() vs. the new
> get_user_pages_unlocked().
> 
> The risk of GUP fast is the loss of the "force" arg on GUP fast, which I
> don't see as significant given our use case.
  Yes. I was discussing with Roland some time ago whether the force
argument is needed and he said it is. So I kept the arguments of
get_user_pages() intact and just simplified the locking...

BTW: Infiniband still needs mmap_sem for modification of mm->pinned_vm. It
might be worthwhile to actually change that to atomic_long_t (only
kernel/events/core.c would need update besides infiniband) and avoid taking
mmap_sem in infiniband code altogether.

> Some nits on the subject and commit message:
> 1. use IB/qib, IB/ipath vs. ib
> 2. use the correct ipath vs. qib in the commit message text
  Sure, will do. Thanks for the quick reply and for the comments.

								Honza
> > -----Original Message-----
> > From: Jan Kara [mailto:jack@suse.cz]
> > Sent: Wednesday, October 02, 2013 10:28 AM
> > To: LKML
> > Cc: linux-mm@kvack.org; Jan Kara; infinipath; Roland Dreier; linux-
> > rdma@vger.kernel.org
> > Subject: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> > get_user_pages_unlocked()
> > 
> > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
> > 
> > CC: Mike Marciniszyn <infinipath@intel.com>
> > CC: Roland Dreier <roland@kernel.org>
> > CC: linux-rdma@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  drivers/infiniband/hw/qib/qib_user_pages.c | 62 +++++++++++++-----------------
> >  1 file changed, 26 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> > index 2bc1d2b96298..57ce83c2d1d9 100644
> > --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> > +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> > @@ -48,39 +48,55 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages,
> >  	}
> >  }
> > 
> > -/*
> > - * Call with current->mm->mmap_sem held.
> > +/**
> > + * qib_get_user_pages - lock user pages into memory
> > + * @start_page: the start page
> > + * @num_pages: the number of pages
> > + * @p: the output page structures
> > + *
> > + * This function takes a given start page (page aligned user virtual
> > + * address) and pins it and the following specified number of pages.  For
> > + * now, num_pages is always 1, but that will probably change at some point
> > + * (because caller is doing expected sends on a single virtually contiguous
> > + * buffer, so we can do all pages at once).
> >   */
> > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -				struct page **p, struct vm_area_struct **vma)
> > +int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > +		       struct page **p)
> >  {
> >  	unsigned long lock_limit;
> >  	size_t got;
> >  	int ret;
> > +	struct mm_struct *mm = current->mm;
> > 
> > +	down_write(&mm->mmap_sem);
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > 
> > -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > +	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > +		up_write(&mm->mmap_sem);
> >  		ret = -ENOMEM;
> >  		goto bail;
> >  	}
> > +	mm->pinned_vm += num_pages;
> > +	up_write(&mm->mmap_sem);
> > 
> >  	for (got = 0; got < num_pages; got += ret) {
> > -		ret = get_user_pages(current, current->mm,
> > -				     start_page + got * PAGE_SIZE,
> > -				     num_pages - got, 1, 1,
> > -				     p + got, vma);
> > +		ret = get_user_pages_unlocked(current, mm,
> > +					      start_page + got * PAGE_SIZE,
> > +					      num_pages - got, 1, 1,
> > +					      p + got);
> >  		if (ret < 0)
> >  			goto bail_release;
> >  	}
> > 
> > -	current->mm->pinned_vm += num_pages;
> > 
> >  	ret = 0;
> >  	goto bail;
> > 
> >  bail_release:
> >  	__qib_release_user_pages(p, got, 0);
> > +	down_write(&mm->mmap_sem);
> > +	mm->pinned_vm -= num_pages;
> > +	up_write(&mm->mmap_sem);
> >  bail:
> >  	return ret;
> >  }
> > @@ -117,32 +133,6 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
> >  	return phys;
> >  }
> > 
> > -/**
> > - * qib_get_user_pages - lock user pages into memory
> > - * @start_page: the start page
> > - * @num_pages: the number of pages
> > - * @p: the output page structures
> > - *
> > - * This function takes a given start page (page aligned user virtual
> > - * address) and pins it and the following specified number of pages.  For
> > - * now, num_pages is always 1, but that will probably change at some point
> > - * (because caller is doing expected sends on a single virtually contiguous
> > - * buffer, so we can do all pages at once).
> > - */
> > -int qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -		       struct page **p)
> > -{
> > -	int ret;
> > -
> > -	down_write(&current->mm->mmap_sem);
> > -
> > -	ret = __qib_get_user_pages(start_page, num_pages, p, NULL);
> > -
> > -	up_write(&current->mm->mmap_sem);
> > -
> > -	return ret;
> > -}
> > -
> >  void qib_release_user_pages(struct page **p, size_t num_pages)
> >  {
> >  	if (current->mm) /* during close after signal, mm can be NULL */
> > --
> > 1.8.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 15:28     ` Jan Kara
@ 2013-10-02 15:32       ` Marciniszyn, Mike
  2013-10-02 15:38         ` Jan Kara
  2013-10-04 13:44         ` Marciniszyn, Mike
  0 siblings, 2 replies; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-02 15:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

> > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > I don't see as significant given our use case.
>   Yes. I was discussing with Roland some time ago whether the force
> argument is needed and he said it is. So I kept the arguments of
> get_user_pages() intact and just simplified the locking...

The PSM side of the code is a more traditional use of GUP (like direct I/O), so I think it is a different use case than the locking for IB memory regions.

Mike

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 15:32       ` Marciniszyn, Mike
@ 2013-10-02 15:38         ` Jan Kara
  2013-10-04 13:39           ` Marciniszyn, Mike
  2013-10-04 13:44         ` Marciniszyn, Mike
  1 sibling, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 15:38 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > The risk of GUP fast is the loss of the "force" arg on GUP fast, which
> > > I don't see as significant given our use case.
> >   Yes. I was discussing with Roland some time ago whether the force
> > argument is needed and he said it is. So I kept the arguments of
> > get_user_pages() intact and just simplified the locking...
> 
> The PSM side of the code is a more traditional use of GUP (like direct
> I/O), so I think it is a different use case than the locking for IB
> memory regions.
  Ah, I see. Whatever suits you best. I don't really care as long as
get_user_pages() locking doesn't leak into IB drivers :)

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 16/26] mm: Provide get_user_pages_unlocked()
  2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
@ 2013-10-02 16:25   ` Christoph Hellwig
  2013-10-02 16:28   ` KOSAKI Motohiro
  1 sibling, 0 replies; 55+ messages in thread
From: Christoph Hellwig @ 2013-10-02 16:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm

On Wed, Oct 02, 2013 at 04:27:57PM +0200, Jan Kara wrote:
> Provide a wrapper for get_user_pages() which takes care of acquiring and
> releasing mmap_sem. Using this function reduces the number of places in
> which we deal with mmap_sem.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Seems like this should be the default one and the one that takes the lock
should be __locked.  Maybe a grand rename is in order?


get_user_pages_fast	-> get_user_pages
get_user_pages_unlocked	-> __get_user_pages
get_user_pages		-> __get_user_pages_locked

steering people to the most sensible ones by default?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 16/26] mm: Provide get_user_pages_unlocked()
  2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
  2013-10-02 16:25   ` Christoph Hellwig
@ 2013-10-02 16:28   ` KOSAKI Motohiro
  2013-10-02 19:39     ` Jan Kara
  1 sibling, 1 reply; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-10-02 16:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, kosaki.motohiro

(10/2/13 10:27 AM), Jan Kara wrote:
> Provide a wrapper for get_user_pages() which takes care of acquiring and
> > releasing mmap_sem. Using this function reduces the number of places in
> > which we deal with mmap_sem.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   include/linux/mm.h | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8b6e55ee8855..70031ead06a5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1031,6 +1031,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
>   		    struct vm_area_struct **vmas);
>   int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>   			struct page **pages);
> +static inline long
> +get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> +		 	unsigned long start, unsigned long nr_pages,
> +			int write, int force, struct page **pages)
> +{
> +	long ret;
> +
> +	down_read(&mm->mmap_sem);
> +	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
> +			     NULL);
> +	up_read(&mm->mmap_sem);
> +	return ret;
> +}

Hmmm. I like the idea, but I really dislike this name. I don't like an xx_unlocked
function that takes a lock. It is a source of confusion, I believe.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
  2013-10-02 14:27 ` [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked() Jan Kara
@ 2013-10-02 16:32   ` KOSAKI Motohiro
  2013-10-02 19:36     ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-10-02 16:32 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, kosaki.motohiro

(10/2/13 10:27 AM), Jan Kara wrote:
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   mm/process_vm_access.c | 8 ++------
>   1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index fd26d0433509..c1bc47d8ed90 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
>   	*bytes_copied = 0;
>   
>   	/* Get the pages we're interested in */
> -	down_read(&mm->mmap_sem);
> -	pages_pinned = get_user_pages(task, mm, pa,
> -				      nr_pages_to_copy,
> -				      vm_write, 0, process_pages, NULL);
> -	up_read(&mm->mmap_sem);
> -
> +	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
> +					       vm_write, 0, process_pages);
>   	if (pages_pinned != nr_pages_to_copy) {
>   		rc = -EFAULT;
>   		goto end;

This is wrong because the original code is wrong. In this function, pages may point to
anon pages. Then you should keep holding mmap_sem until the copying finishes. Otherwise
a concurrent fork() creates a nasty COW issue.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
  2013-10-02 16:32   ` KOSAKI Motohiro
@ 2013-10-02 19:36     ` Jan Kara
  2013-10-03 22:40       ` KOSAKI Motohiro
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-02 19:36 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Jan Kara, LKML, linux-mm

On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
> (10/2/13 10:27 AM), Jan Kara wrote:
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >   mm/process_vm_access.c | 8 ++------
> >   1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> > index fd26d0433509..c1bc47d8ed90 100644
> > --- a/mm/process_vm_access.c
> > +++ b/mm/process_vm_access.c
> > @@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
> >   	*bytes_copied = 0;
> >   
> >   	/* Get the pages we're interested in */
> > -	down_read(&mm->mmap_sem);
> > -	pages_pinned = get_user_pages(task, mm, pa,
> > -				      nr_pages_to_copy,
> > -				      vm_write, 0, process_pages, NULL);
> > -	up_read(&mm->mmap_sem);
> > -
> > +	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
> > +					       vm_write, 0, process_pages);
> >   	if (pages_pinned != nr_pages_to_copy) {
> >   		rc = -EFAULT;
> >   		goto end;
> 
> This is wrong because the original code is wrong. In this function, pages may
> point to anon pages. Then you should keep holding mmap_sem until the copying
> finishes. Otherwise a concurrent fork() creates a nasty COW issue.
  Hum, can you be more specific? I suppose you are speaking about the situation
when the remote task we are copying data from/to does fork while
process_vm_rw_pages() runs. If we are copying data from the remote task, I
don't see how COW could cause any problem. If we are copying to the remote task
and fork happens after get_user_pages() but before copy_to_user(), then I
can see we might be having some trouble. copy_to_user() would then copy
data into both the original remote process and its child, thus essentially
bypassing COW. If the child process manages to COW some of the pages before
copy_to_user() happens, it can even see only some of the pages. Is that what
you mean?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 16/26] mm: Provide get_user_pages_unlocked()
  2013-10-02 16:28   ` KOSAKI Motohiro
@ 2013-10-02 19:39     ` Jan Kara
  0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 19:39 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Jan Kara, LKML, linux-mm

On Wed 02-10-13 12:28:11, KOSAKI Motohiro wrote:
> (10/2/13 10:27 AM), Jan Kara wrote:
> > Provide a wrapper for get_user_pages() which takes care of acquiring and
> > releasing mmap_sem. Using this function reduces the number of places in
> > which we deal with mmap_sem.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >   include/linux/mm.h | 14 ++++++++++++++
> >   1 file changed, 14 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 8b6e55ee8855..70031ead06a5 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1031,6 +1031,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
> >   		    struct vm_area_struct **vmas);
> >   int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >   			struct page **pages);
> > +static inline long
> > +get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> > +		 	unsigned long start, unsigned long nr_pages,
> > +			int write, int force, struct page **pages)
> > +{
> > +	long ret;
> > +
> > +	down_read(&mm->mmap_sem);
> > +	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
> > +			     NULL);
> > +	up_read(&mm->mmap_sem);
> > +	return ret;
> > +}
> 
> Hmmm. I like the idea, but I really dislike this name. I don't like xx_unlocked 
> function takes a lock. It is a source of confusing, I believe. 
  Sure, I'm not very happy about the name either. As Christoph wrote
probably renaming all get_user_pages() variants might be appropriate. I'll
think about names some more.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() use get_user_pages_fast()
  2013-10-02 14:27 ` [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() " Jan Kara
@ 2013-10-02 19:41   ` Laurent Pinchart
  2013-10-02 20:18     ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: Laurent Pinchart @ 2013-10-02 19:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, linux-media

Hi Jan,

Thank you for the patch.

On Wednesday 02 October 2013 16:27:46 Jan Kara wrote:
> CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> CC: linux-media@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

Could you briefly explain where you're headed with this? The V4L2 subsystem
has suffered for quite a long time from potential AB-BA deadlocks, due to the
drivers taking the mmap_sem semaphore while holding the V4L2 buffer queue
lock in the code path below, and taking them in reverse order in the mmap()
path (as the mm core takes the mmap_sem semaphore before calling the mmap()
operation). We've solved that by releasing the V4L2 buffer queue lock before
taking the mmap_sem lock below and taking it back after releasing the mmap_sem
lock. Having an explicit indication that the mmap_sem lock was taken helped
keep the problem in mind. I'm not against hiding it in
get_user_pages_fast(), but I'd like to know what the next step is. Maybe it
would also be worth adding a /* get_user_pages_fast() takes the mmap_sem */
comment before the call?

> ---
>  drivers/media/platform/omap3isp/ispqueue.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/media/platform/omap3isp/ispqueue.c
> b/drivers/media/platform/omap3isp/ispqueue.c index
> e15f01342058..bed380395e6c 100644
> --- a/drivers/media/platform/omap3isp/ispqueue.c
> +++ b/drivers/media/platform/omap3isp/ispqueue.c
> @@ -331,13 +331,9 @@ static int isp_video_buffer_prepare_user(struct
> isp_video_buffer *buf) if (buf->pages == NULL)
>  		return -ENOMEM;
> 
> -	down_read(&current->mm->mmap_sem);
> -	ret = get_user_pages(current, current->mm, data & PAGE_MASK,
> -			     buf->npages,
> -			     buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE, 0,
> -			     buf->pages, NULL);
> -	up_read(&current->mm->mmap_sem);
> -
> +	ret = get_user_pages_fast(data & PAGE_MASK, buf->npages,
> +				  buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE,
> +				  buf->pages);
>  	if (ret != buf->npages) {
>  		buf->npages = ret < 0 ? 0 : ret;
>  		isp_video_buffer_cleanup(buf);
-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() use get_user_pages_fast()
  2013-10-02 19:41   ` Laurent Pinchart
@ 2013-10-02 20:18     ` Jan Kara
  0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-02 20:18 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Jan Kara, LKML, linux-mm, linux-media

On Wed 02-10-13 21:41:10, Laurent Pinchart wrote:
> Hi Jan,
> 
> Thank you for the patch.
> 
> On Wednesday 02 October 2013 16:27:46 Jan Kara wrote:
> > CC: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > CC: linux-media@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
  Thanks!
> 
> Could you briefly explain where you're headed with this?
  My motivation is that currently filesystems have to work around locking
problems with ->page_mkwrite() (the write page fault handler) being called
with mmap_sem held. The plan is to provide ability to ->page_mkwrite()
handler to drop mmap_sem. And that needs an audit of all get_user_pages()
callers to verify they won't be broken by this locking change. So I've
started with making this audit simpler.

> The V4L2 subsystem has suffered for quite a long time from potential
> AB-BA deadlocks, due to the drivers taking the mmap_sem semaphore while
> holding the V4L2 buffer queue lock in the code path below, and taking
> them in reverse order in the mmap() path (as the mm core takes the
> mmap_sem semaphore before calling the mmap() operation).
  Yeah, I've read about this in some comment in V4L2. Also there are some
places (drivers/media/platform/omap/omap_vout.c and
drivers/media/v4l2-core/) which acquire mmap_sem pretty early to avoid lock
inversion with the queue lock. These are actually causing me quite some
headache :)

> We've solved that by releasing the V4L2 buffer queue lock before
> taking the mmap_sem lock below and taking it back after releasing the mmap_sem
> lock. Having an explicit indication that the mmap_sem lock was taken helped
> keep the problem in mind. I'm not against hiding it in
> get_user_pages_fast(), but I'd like to know what the next step is. Maybe it
> would also be worth adding a /* get_user_pages_fast() takes the mmap_sem */
> comment before the call?
  OK, I can add the comment. Thanks for review.

								Honza

> > ---
> >  drivers/media/platform/omap3isp/ispqueue.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/media/platform/omap3isp/ispqueue.c
> > b/drivers/media/platform/omap3isp/ispqueue.c index
> > e15f01342058..bed380395e6c 100644
> > --- a/drivers/media/platform/omap3isp/ispqueue.c
> > +++ b/drivers/media/platform/omap3isp/ispqueue.c
> > @@ -331,13 +331,9 @@ static int isp_video_buffer_prepare_user(struct
> > isp_video_buffer *buf) if (buf->pages == NULL)
> >  		return -ENOMEM;
> > 
> > -	down_read(&current->mm->mmap_sem);
> > -	ret = get_user_pages(current, current->mm, data & PAGE_MASK,
> > -			     buf->npages,
> > -			     buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE, 0,
> > -			     buf->pages, NULL);
> > -	up_read(&current->mm->mmap_sem);
> > -
> > +	ret = get_user_pages_fast(data & PAGE_MASK, buf->npages,
> > +				  buf->vbuf.type == V4L2_BUF_TYPE_VIDEO_CAPTURE,
> > +				  buf->pages);
> >  	if (ret != buf->npages) {
> >  		buf->npages = ret < 0 ? 0 : ret;
> >  		isp_video_buffer_cleanup(buf);
> -- 
> Regards,
> 
> Laurent Pinchart
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
  2013-10-02 19:36     ` Jan Kara
@ 2013-10-03 22:40       ` KOSAKI Motohiro
  2013-10-07 20:55         ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-10-03 22:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: KOSAKI Motohiro, LKML, linux-mm

(10/2/13 3:36 PM), Jan Kara wrote:
> On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
>> (10/2/13 10:27 AM), Jan Kara wrote:
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>>    mm/process_vm_access.c | 8 ++------
>>>    1 file changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
>>> index fd26d0433509..c1bc47d8ed90 100644
>>> --- a/mm/process_vm_access.c
>>> +++ b/mm/process_vm_access.c
>>> @@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
>>>    	*bytes_copied = 0;
>>>
>>>    	/* Get the pages we're interested in */
>>> -	down_read(&mm->mmap_sem);
>>> -	pages_pinned = get_user_pages(task, mm, pa,
>>> -				      nr_pages_to_copy,
>>> -				      vm_write, 0, process_pages, NULL);
>>> -	up_read(&mm->mmap_sem);
>>> -
>>> +	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
>>> +					       vm_write, 0, process_pages);
>>>    	if (pages_pinned != nr_pages_to_copy) {
>>>    		rc = -EFAULT;
>>>    		goto end;
>>
>> This is wrong because the original code is wrong. In this function, pages may
>> point to anon pages. Then you should keep holding mmap_sem until the copying
>> finishes. Otherwise a concurrent fork() creates a nasty COW issue.
>    Hum, can you be more specific? I suppose you are speaking about the situation
> when the remote task we are copying data from/to does fork while
> process_vm_rw_pages() runs. If we are copying data from remote task, I
> don't see how COW could cause any problem. If we are copying to remote task
> and fork happens after get_user_pages() but before copy_to_user() then I
> can see we might be having some trouble. copy_to_user() would then copy
> data into both original remote process and its child thus essentially
> bypassing COW. If the child process manages to COW some of the pages before
> copy_to_user() happens, it can even see only some of the pages. Is that what
> you mean?

scenario 1: vm_write==0

Process P1 calls get_user_pages(pa, process_pages) in process_vm_rw_pages().
P1 unlocks mmap_sem.
Process P2 calls fork() and creates P3.
P2 writes to memory at pa; now "process_pages" is owned by P3 (the child process).
P3 writes to memory at pa, and the content of "process_pages" changes.
P1 reads process_pages as P2's pages, but actually they are P3's data. So P1 observes garbage, or at least unintended, data.


scenario 2: vm_write==1

Process P1 calls get_user_pages(pa, process_pages) in process_vm_rw_pages(). This breaks COW and any anon page sharing.
P1 unlocks mmap_sem.
P2 calls fork() and creates P3. Now COW page sharing is restored.
P2 writes to memory at pa; now "process_pages" is owned by P3.
P3 writes to memory at pa, which means P3 changes "process_pages".
P1 writes process_pages as P2's pages, but actually they are P3's. That overwrites P3's write above, and P3 observes data corruption.



The solution is to keep holding mmap_sem until the process_pages access finishes,
because mmap_sem prevents fork() and then the race can never happen. I mean you can't
use get_user_pages_unlocked() if the target address points to anon pages.

I'm not sure whether these scenarios match your explanation.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() to use get_user_pages_fast()
  2013-10-02 14:27 ` [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() " Jan Kara
@ 2013-10-04  2:38   ` Timur Tabi
  0 siblings, 0 replies; 55+ messages in thread
From: Timur Tabi @ 2013-10-04  2:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, Timur Tabi

On Wed, Oct 2, 2013 at 9:27 AM, Jan Kara <jack@suse.cz> wrote:
> CC: Timur Tabi <timur@freescale.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

This seems okay, but I don't have access to hardware at the moment to test it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 15:38         ` Jan Kara
@ 2013-10-04 13:39           ` Marciniszyn, Mike
  2013-10-04 13:46             ` Marciniszyn, Mike
  0 siblings, 1 reply; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma



> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Wednesday, October 02, 2013 11:39 AM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > which I don't see as significant given our use case.
> > >   Yes. I was discussing with Roland some time ago whether the force
> > > argument is needed and he said it is. So I kept the arguments of
> > > get_user_pages() intact and just simplified the locking...
> >
> > The PSM side of the code is a more traditional use of GUP (like direct
> > I/O), so I think it is a different use case than the locking for IB
> > memory regions.
>   Ah, I see. Whatever suits you best. I don't really care as long as
> get_user_pages() locking doesn't leak into IB drivers :)
> 
> 								Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 15:32       ` Marciniszyn, Mike
  2013-10-02 15:38         ` Jan Kara
@ 2013-10-04 13:44         ` Marciniszyn, Mike
  1 sibling, 0 replies; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

> The PSM side of the code is a more traditional use of GUP (like direct I/O), so
> I think it is a different use case than the locking for IB memory regions.

I have resubmitted the two deadlock fixes using get_user_pages_fast() and marked them stable.

See http://marc.info/?l=linux-rdma&m=138089335506355&w=2

Mike

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-04 13:39           ` Marciniszyn, Mike
@ 2013-10-04 13:46             ` Marciniszyn, Mike
  0 siblings, 0 replies; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

Inadvertent send!

Mike

> -----Original Message-----
> From: Marciniszyn, Mike
> Sent: Friday, October 04, 2013 9:39 AM
> To: Jan Kara
> Cc: LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> 
> 
> > -----Original Message-----
> > From: Jan Kara [mailto:jack@suse.cz]
> > Sent: Wednesday, October 02, 2013 11:39 AM
> > To: Marciniszyn, Mike
> > Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier;
> > linux- rdma@vger.kernel.org
> > Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> > get_user_pages_unlocked()
> >
> > On Wed 02-10-13 15:32:47, Marciniszyn, Mike wrote:
> > > > > The risk of GUP fast is the loss of the "force" arg on GUP fast,
> > > > > which I don't see as significant give our use case.
> > > >   Yes. I was discussing with Roland some time ago whether the
> > > > force argument is needed and he said it is. So I kept the
> > > > arguments of
> > > > get_user_pages() intact and just simplified the locking...
> > >
> > > The PSM side of the code is a more traditional use of GUP (like
> > > direct I/O), so I think it is a different use case than the locking
> > > for IB memory regions.
> >   Ah, I see. Whatever suits you best. I don't really care as long as
> > get_user_pages() locking doesn't leak into IB drivers :)
> >
> > 								Honza
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
  2013-10-02 14:54   ` Marciniszyn, Mike
@ 2013-10-04 13:52   ` Marciniszyn, Mike
  2013-10-04 18:33     ` Jan Kara
  1 sibling, 1 reply; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-04 13:52 UTC (permalink / raw)
  To: Jan Kara, LKML; +Cc: linux-mm, infinipath, Roland Dreier, linux-rdma

> Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> shortens the section where we hold mmap_sem for writing and also removes
> the knowledge about get_user_pages() locking from ipath driver. We also fix
> a bug in testing pinned number of pages when changing the code.
> 

This patch and the sibling ipath patch will nominally take the mmap_sem twice where the old routine only took it once.   This is a performance issue.

Is the intent here to deprecate get_user_pages()?

I agree, the old code's lock limit test is broken and needs to be fixed.   I like the elimination of the silly wrapper routine!

Could the lock limit test be pushed into another version of the wrapper so that there is only one set of mmap_sem transactions?

Mike

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-04 13:52   ` Marciniszyn, Mike
@ 2013-10-04 18:33     ` Jan Kara
  2013-10-07 15:20       ` Marciniszyn, Mike
  2013-10-07 15:38       ` Marciniszyn, Mike
  0 siblings, 2 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-04 18:33 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > shortens the section where we hold mmap_sem for writing and also removes
> > the knowledge about get_user_pages() locking from ipath driver. We also fix
> > a bug in testing pinned number of pages when changing the code.
> > 
> 
> This patch and the sibling ipath patch will nominally take the mmap_sem
> twice where the old routine only took it once.   This is a performance
> issue.
  It will take mmap_sem only once during normal operation. Only if
get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
the change of mm->pinned_vm.

> Is the intent here to deprecate get_user_pages()?
  Well, as much as I'd like to, there are really places in mm code which
need to call get_user_pages() while holding mmap_sem to be able to inspect
corresponding vmas etc. So I want to reduce get_user_pages() use as much as
possible, but I'm not really hoping to remove it completely.

> I agree, the old code's lock limit test is broke and needs to be fixed.
> I like the elimination of the silly wrapper routine!
> 
> Could the lock limit test be pushed into another version of the wrapper
> so that there is only one set of mmap_sem transactions?
  I'm sorry, I don't understand what you mean here...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [PATCH 10/26] lustre: Convert ll_get_user_pages() to use get_user_pages_fast()
  2013-10-02 14:27 ` [PATCH 10/26] lustre: Convert ll_get_user_pages() " Jan Kara
@ 2013-10-05  6:27   ` Dilger, Andreas
  0 siblings, 0 replies; 55+ messages in thread
From: Dilger, Andreas @ 2013-10-05  6:27 UTC (permalink / raw)
  To: Jan Kara, LKML
  Cc: linux-mm, Greg Kroah-Hartman, Peng Tao, hpdd-discuss@lists.01.org

On 2013/10/02 8:27 AM, "Jan Kara" <jack@suse.cz> wrote:
>CC: Greg Kroah-Hartman <greg@kroah.com>
>CC: Peng Tao <tao.peng@emc.com>
>CC: Andreas Dilger <andreas.dilger@intel.com>
>CC: hpdd-discuss@lists.01.org
>Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Andreas Dilger <andreas.dilger@intel.com>

>---
> drivers/staging/lustre/lustre/llite/rw26.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/staging/lustre/lustre/llite/rw26.c
>b/drivers/staging/lustre/lustre/llite/rw26.c
>index 96c29ad2fc8c..7e3e0967993b 100644
>--- a/drivers/staging/lustre/lustre/llite/rw26.c
>+++ b/drivers/staging/lustre/lustre/llite/rw26.c
>@@ -202,11 +202,8 @@ static inline int ll_get_user_pages(int rw, unsigned
>long user_addr,
> 
> 	OBD_ALLOC_LARGE(*pages, *max_pages * sizeof(**pages));
> 	if (*pages) {
>-		down_read(&current->mm->mmap_sem);
>-		result = get_user_pages(current, current->mm, user_addr,
>-					*max_pages, (rw == READ), 0, *pages,
>-					NULL);
>-		up_read(&current->mm->mmap_sem);
>+		result = get_user_pages_fast(user_addr, *max_pages,
>+					     (rw == READ), *pages);
> 		if (unlikely(result <= 0))
> 			OBD_FREE_LARGE(*pages, *max_pages * sizeof(**pages));
> 	}


Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division



* Re: [PATCH 19/26] ivtv: Convert driver to use get_user_pages_unlocked()
  2013-10-02 14:28 ` [PATCH 19/26] ivtv: Convert driver " Jan Kara
@ 2013-10-05 12:02   ` Andy Walls
  2013-10-07 17:22     ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Walls @ 2013-10-05 12:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, linux-media

Hi Jan:

<rant>
This patch alone does not have sufficient information for me to evaluate
it.  get_user_pages_unlocked() is added in another patch which I did not
receive, and which I cannot find in any list archives.

I wasted quite a bit of time looking for this additional patch:

https://git.kernel.org/cgit/linux/kernel/git/jack/linux-fs.git/commit/?h=get_user_pages&id=624fc1bfb70fb65d32d31fbd16427ad9c234653e

</rant>

If I found the correct patch for adding get_user_pages_unlocked(), then
the patch below looks fine.

Reviewed-by: Andy Walls <awalls@md.metrocast.net>
Acked-by: Andy Walls <awalls@md.metrocast.net>

Regards,
Andy

On Wed, 2013-10-02 at 16:28 +0200, Jan Kara wrote:
> CC: Andy Walls <awalls@md.metrocast.net>
> CC: linux-media@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/media/pci/ivtv/ivtv-udma.c |  6 ++----
>  drivers/media/pci/ivtv/ivtv-yuv.c  | 12 ++++++------
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
> index 7338cb2d0a38..6012e5049076 100644
> --- a/drivers/media/pci/ivtv/ivtv-udma.c
> +++ b/drivers/media/pci/ivtv/ivtv-udma.c
> @@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
>  	}
>  
>  	/* Get user pages for DMA Xfer */
> -	down_read(&current->mm->mmap_sem);
> -	err = get_user_pages(current, current->mm,
> -			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
> -	up_read(&current->mm->mmap_sem);
> +	err = get_user_pages_unlocked(current, current->mm, user_dma.uaddr,
> +				      user_dma.page_count, 0, 1, dma->map);
>  
>  	if (user_dma.page_count != err) {
>  		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
> diff --git a/drivers/media/pci/ivtv/ivtv-yuv.c b/drivers/media/pci/ivtv/ivtv-yuv.c
> index 2ad65eb29832..9365995917d8 100644
> --- a/drivers/media/pci/ivtv/ivtv-yuv.c
> +++ b/drivers/media/pci/ivtv/ivtv-yuv.c
> @@ -75,15 +75,15 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
>  	ivtv_udma_get_page_info (&uv_dma, (unsigned long)args->uv_source, 360 * uv_decode_height);
>  
>  	/* Get user pages for DMA Xfer */
> -	down_read(&current->mm->mmap_sem);
> -	y_pages = get_user_pages(current, current->mm, y_dma.uaddr, y_dma.page_count, 0, 1, &dma->map[0], NULL);
> +	y_pages = get_user_pages_unlocked(current, current->mm, y_dma.uaddr,
> +					  y_dma.page_count, 0, 1, &dma->map[0]);
>  	uv_pages = 0; /* silence gcc. value is set and consumed only if: */
>  	if (y_pages == y_dma.page_count) {
> -		uv_pages = get_user_pages(current, current->mm,
> -					  uv_dma.uaddr, uv_dma.page_count, 0, 1,
> -					  &dma->map[y_pages], NULL);
> +		uv_pages = get_user_pages_unlocked(current, current->mm,
> +						   uv_dma.uaddr,
> +						   uv_dma.page_count, 0, 1,
> +						   &dma->map[y_pages]);
>  	}
> -	up_read(&current->mm->mmap_sem);
>  
>  	if (y_pages != y_dma.page_count || uv_pages != uv_dma.page_count) {
>  		int rc = -EFAULT;



* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-04 18:33     ` Jan Kara
@ 2013-10-07 15:20       ` Marciniszyn, Mike
  2013-10-07 15:38       ` Marciniszyn, Mike
  1 sibling, 0 replies; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-07 15:20 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma



> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz]
> Sent: Friday, October 04, 2013 2:33 PM
> To: Marciniszyn, Mike
> Cc: Jan Kara; LKML; linux-mm@kvack.org; infinipath; Roland Dreier; linux-
> rdma@vger.kernel.org
> Subject: Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to
> get_user_pages_unlocked()
> 
> On Fri 04-10-13 13:52:49, Marciniszyn, Mike wrote:
> > > Convert qib_get_user_pages() to use get_user_pages_unlocked().  This
> > > shortens the section where we hold mmap_sem for writing and also
> > > removes the knowledge about get_user_pages() locking from ipath
> > > driver. We also fix a bug in testing pinned number of pages when
> changing the code.
> > >
> >
> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once.   This is a performance
> > issue.
>   It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fail, we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
> 
> > Is the intent here to deprecate get_user_pages()?
>   Well, as much as I'd like to, there are really places in mm code which need
> to call get_user_pages() while holding mmap_sem to be able to inspect
> corresponding vmas etc. So I want to reduce get_user_pages() use as much
> as possible but I'm not really hoping in completely removing it.
> 
> > I agree, the old code's lock limit test is broke and needs to be fixed.
> > I like the elimination of the silly wrapper routine!
> >
> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
>   I'm sorry, I don't understand what you mean here...
> 
> 								Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

* RE: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-04 18:33     ` Jan Kara
  2013-10-07 15:20       ` Marciniszyn, Mike
@ 2013-10-07 15:38       ` Marciniszyn, Mike
  2013-10-07 17:26         ` Jan Kara
  1 sibling, 1 reply; 55+ messages in thread
From: Marciniszyn, Mike @ 2013-10-07 15:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

> > This patch and the sibling ipath patch will nominally take the mmap_sem
> > twice where the old routine only took it once.   This is a performance
> > issue.
>   It will take mmap_sem only once during normal operation. Only if
> get_user_pages_unlocked() fail, we have to take mmap_sem again to undo
> the change of mm->pinned_vm.
> 
> > Is the intent here to deprecate get_user_pages()?

The old code looked like:
__qib_get_user_pages()
	(broken) ulimit test
             for (...)
		get_user_pages()

qib_get_user_pages()
	mmap_sem lock
	__qib_get_user_pages()
             mmap_sem() unlock

The new code is:

get_user_pages_unlocked()
	mmap_sem  lock
	get_user_pages()
	mmap_sem unlock

qib_get_user_pages()
	mmap_sem lock
             ulimit test and locked pages maintenance
             mmap_sem unlock
	for (...)
		get_user_pages_unlocked()

I count an additional pair of mmap_sem transactions in the normal case.

> 
> > Could the lock limit test be pushed into another version of the
> > wrapper so that there is only one set of mmap_sem transactions?
>   I'm sorry, I don't understand what you mean here...
> 

This is what I had in mind:

get_user_pages_ulimit_unlocked()
	mmap_sem  lock
	ulimit test and locked pages maintenance (from qib/ipath)
             for (...)
	       get_user_pages_unlocked()	
	mmap_sem unlock
	
qib_get_user_pages()
	get_user_pages_ulimit_unlocked()

This really pushes the code into a new wrapper common to ipath/qib and any others that might want to combine locking with ulimit enforcement.
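A rough kernel-style sketch of the wrapper proposed above (the function name, the exact ulimit bookkeeping, and the error handling are assumptions modelled on qib_get_user_pages(), not an existing API; it uses the 2013-era eight-argument get_user_pages() signature):

```c
/* Sketch only: combine the RLIMIT_MEMLOCK check, pinned-page accounting
 * and get_user_pages() under a single mmap_sem round trip. */
static int get_user_pages_ulimit_unlocked(unsigned long start, int nr_pages,
					  int write, struct page **pages)
{
	unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	int ret;

	down_write(&current->mm->mmap_sem);
	if (current->mm->pinned_vm + nr_pages > lock_limit &&
	    !capable(CAP_IPC_LOCK)) {
		up_write(&current->mm->mmap_sem);
		return -ENOMEM;
	}
	ret = get_user_pages(current, current->mm, start, nr_pages,
			     write, 0 /* no force */, pages, NULL);
	if (ret > 0)
		current->mm->pinned_vm += ret;
	up_write(&current->mm->mmap_sem);
	return ret;
}
```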

Mike

* Re: [PATCH 19/26] ivtv: Convert driver to use get_user_pages_unlocked()
  2013-10-05 12:02   ` Andy Walls
@ 2013-10-07 17:22     ` Jan Kara
  0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-07 17:22 UTC (permalink / raw)
  To: Andy Walls; +Cc: Jan Kara, LKML, linux-mm, linux-media

  Hello,

On Sat 05-10-13 08:02:21, Andy Walls wrote:
> <rant>
> This patch alone does not have suffcient information for me to evaluate
> it.  get_user_pages_unlocked() is added in another patch which I did not
> receive, and which I cannot find in any list archives.
> 
> I wasted quite a bit of time looking for this additional patch:
> 
> https://git.kernel.org/cgit/linux/kernel/git/jack/linux-fs.git/commit/?h=get_user_pages&id=624fc1bfb70fb65d32d31fbd16427ad9c234653e
> 
> </rant>
  Sorry, I should have CCed that patch to driver maintainers as well.

> If I found the correct patch for adding get_user_pages_unlocked(), then
> the patch below looks fine.
> 
> Reviewed-by: Andy Walls <awalls@md.metrocast.net>
> Acked-by: Andy Walls <awalls@md.metrocast.net>
  Thanks.

								Honza

> On Wed, 2013-10-02 at 16:28 +0200, Jan Kara wrote:
> > CC: Andy Walls <awalls@md.metrocast.net>
> > CC: linux-media@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  drivers/media/pci/ivtv/ivtv-udma.c |  6 ++----
> >  drivers/media/pci/ivtv/ivtv-yuv.c  | 12 ++++++------
> >  2 files changed, 8 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/media/pci/ivtv/ivtv-udma.c b/drivers/media/pci/ivtv/ivtv-udma.c
> > index 7338cb2d0a38..6012e5049076 100644
> > --- a/drivers/media/pci/ivtv/ivtv-udma.c
> > +++ b/drivers/media/pci/ivtv/ivtv-udma.c
> > @@ -124,10 +124,8 @@ int ivtv_udma_setup(struct ivtv *itv, unsigned long ivtv_dest_addr,
> >  	}
> >  
> >  	/* Get user pages for DMA Xfer */
> > -	down_read(&current->mm->mmap_sem);
> > -	err = get_user_pages(current, current->mm,
> > -			user_dma.uaddr, user_dma.page_count, 0, 1, dma->map, NULL);
> > -	up_read(&current->mm->mmap_sem);
> > +	err = get_user_pages_unlocked(current, current->mm, user_dma.uaddr,
> > +				      user_dma.page_count, 0, 1, dma->map);
> >  
> >  	if (user_dma.page_count != err) {
> >  		IVTV_DEBUG_WARN("failed to map user pages, returned %d instead of %d\n",
> > diff --git a/drivers/media/pci/ivtv/ivtv-yuv.c b/drivers/media/pci/ivtv/ivtv-yuv.c
> > index 2ad65eb29832..9365995917d8 100644
> > --- a/drivers/media/pci/ivtv/ivtv-yuv.c
> > +++ b/drivers/media/pci/ivtv/ivtv-yuv.c
> > @@ -75,15 +75,15 @@ static int ivtv_yuv_prep_user_dma(struct ivtv *itv, struct ivtv_user_dma *dma,
> >  	ivtv_udma_get_page_info (&uv_dma, (unsigned long)args->uv_source, 360 * uv_decode_height);
> >  
> >  	/* Get user pages for DMA Xfer */
> > -	down_read(&current->mm->mmap_sem);
> > -	y_pages = get_user_pages(current, current->mm, y_dma.uaddr, y_dma.page_count, 0, 1, &dma->map[0], NULL);
> > +	y_pages = get_user_pages_unlocked(current, current->mm, y_dma.uaddr,
> > +					  y_dma.page_count, 0, 1, &dma->map[0]);
> >  	uv_pages = 0; /* silence gcc. value is set and consumed only if: */
> >  	if (y_pages == y_dma.page_count) {
> > -		uv_pages = get_user_pages(current, current->mm,
> > -					  uv_dma.uaddr, uv_dma.page_count, 0, 1,
> > -					  &dma->map[y_pages], NULL);
> > +		uv_pages = get_user_pages_unlocked(current, current->mm,
> > +						   uv_dma.uaddr,
> > +						   uv_dma.page_count, 0, 1,
> > +						   &dma->map[y_pages]);
> >  	}
> > -	up_read(&current->mm->mmap_sem);
> >  
> >  	if (y_pages != y_dma.page_count || uv_pages != uv_dma.page_count) {
> >  		int rc = -EFAULT;
> 
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-07 15:38       ` Marciniszyn, Mike
@ 2013-10-07 17:26         ` Jan Kara
  2013-10-08 19:06           ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-07 17:26 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > twice where the old routine only took it once.   This is a performance
> > > issue.
> >   It will take mmap_sem only once during normal operation. Only if
> > get_user_pages_unlocked() fail, we have to take mmap_sem again to undo
> > the change of mm->pinned_vm.
> > 
> > > Is the intent here to deprecate get_user_pages()?
> 
> The old code looked like:
> __qib_get_user_pages()
> 	(broken) ulimit test
>              for (...)
> 		get_user_pages()
> 
> qib_get_user_pages()
> 	mmap_sem lock
> 	__qib_get_user_pages()
>              mmap_sem() unlock
> 
> The new code is:
> 
> get_user_pages_unlocked()
> 	mmap_sem  lock
> 	get_user_pages()
> 	mmap_sem unlock
> 
> qib_get_user_pages()
> 	mmap_sem lock
>              ulimit test and locked pages maintenance
>              mmap_sem unlock
> 	for (...)
> 		get_user_pages_unlocked()
> 
> I count an additional pair of mmap_sem transactions in the normal case.
  Ah, sorry, you are right.

> > > Could the lock limit test be pushed into another version of the
> > > wrapper so that there is only one set of mmap_sem transactions?
> >   I'm sorry, I don't understand what you mean here...
> > 
> 
> This is what I had in mind:
> 
> get_user_pages_ulimit_unlocked()
> 	mmap_sem  lock
> 	ulimit test and locked pages maintenance (from qib/ipath)
>              for (...)
> 	       get_user_pages_unlocked()	
> 	mmap_sem unlock
> 	
> qib_get_user_pages()
> 	get_user_pages_ulimit_unlocked()
> 
> This really pushes the code into a new wrapper common to ipath/qib and
> any others that might want to combine locking with ulimit enforcement.
  We could do that, but frankly I'd rather change the ulimit enforcement to not
require mmap_sem and use an atomic counter instead. I'll see what I can do.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
  2013-10-03 22:40       ` KOSAKI Motohiro
@ 2013-10-07 20:55         ` Jan Kara
  2013-10-08  0:10           ` KOSAKI Motohiro
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-07 20:55 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Jan Kara, LKML, linux-mm

On Thu 03-10-13 18:40:06, KOSAKI Motohiro wrote:
> (10/2/13 3:36 PM), Jan Kara wrote:
> >On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
> >>(10/2/13 10:27 AM), Jan Kara wrote:
> >>>Signed-off-by: Jan Kara <jack@suse.cz>
> >>>---
> >>>   mm/process_vm_access.c | 8 ++------
> >>>   1 file changed, 2 insertions(+), 6 deletions(-)
> >>>
> >>>diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> >>>index fd26d0433509..c1bc47d8ed90 100644
> >>>--- a/mm/process_vm_access.c
> >>>+++ b/mm/process_vm_access.c
> >>>@@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
> >>>   	*bytes_copied = 0;
> >>>
> >>>   	/* Get the pages we're interested in */
> >>>-	down_read(&mm->mmap_sem);
> >>>-	pages_pinned = get_user_pages(task, mm, pa,
> >>>-				      nr_pages_to_copy,
> >>>-				      vm_write, 0, process_pages, NULL);
> >>>-	up_read(&mm->mmap_sem);
> >>>-
> >>>+	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
> >>>+					       vm_write, 0, process_pages);
> >>>   	if (pages_pinned != nr_pages_to_copy) {
> >>>   		rc = -EFAULT;
> >>>   		goto end;
> >>
> >>This is wrong because the original code is wrong. In this function, pages may
> >>point to anon pages. Then you should keep holding mmap_sem until copying
> >>finishes. Otherwise a concurrent fork() makes a nasty COW issue.
> >   Hum, can you be more specific? I suppose you are speaking about situation
> >when the remote task we are copying data from/to does fork while
> >process_vm_rw_pages() runs. If we are copying data from remote task, I
> >don't see how COW could cause any problem. If we are copying to remote task
> >and fork happens after get_user_pages() but before copy_to_user() then I
> >can see we might be having some trouble. copy_to_user() would then copy
> >data into both original remote process and its child thus essentially
> >bypassing COW. If the child process manages to COW some of the pages before
> >copy_to_user() happens, it can even see only some of the pages. Is that what
> >you mean?
> 
> scenario 1: vm_write==0
> 
> Process P1 calls get_user_pages(pa, process_pages) in process_vm_rw_pages.
> P1 unlocks mmap_sem.
> Process P2 calls fork() and creates P3.
> P2 writes to memory at pa; now "process_pages" is owned by P3 (the child process).
> P3 writes to memory at pa, and then the content of "process_pages" changes.
> P1 reads process_pages as P2's pages, but actually it is P3's data. Then
>   P1 observes garbage, or at least unintended, data.
  Yeah, this really looks buggy because P1 can see data in (supposedly)
P2's address space which P2 never wrote there.

> scenario 2: vm_write==1
> 
> Process P1 calls get_user_pages(pa, process_pages) in process_vm_rw_pages.
> This breaks COW and any anon page sharing.
> P1 unlocks mmap_sem.
> P2 calls fork() and creates P3; now COW page sharing is restored.
> P2 writes to memory at pa; now "process_pages" is owned by P3.
> P3 writes to memory at pa, which means P3 changes "process_pages".
> P1 writes process_pages as P2's pages, but actually they are P3's. That
> overrides P3's write above, and then P3 observes data corruption.
  Yep, this is a similar problem as above. Thanks for pointing this out.

> The solution is to keep holding mmap_sem until finishing process_pages
> access because mmap_sem prevent fork. and then race never be happen. I
> mean you cann't use get_user_pages_unlock() if target address point to
> anon pages.
  Yeah, if you are accessing a third-party mm, you've convinced me that you
currently need mmap_sem to avoid problems with COW on anon pages. I'm just
thinking that this "hold mmap_sem to prevent fork" is somewhat subtle
(definitely would deserve a comment) and if it would be needed in more
places we might be better off if we have a more explicit mechanism for that
(like a special lock, fork sequence count, or something like that). Anyway
I'll have this in mind and if I see other places that need this, I'll try
to come up with some solution. Thanks again for explanation.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked()
  2013-10-07 20:55         ` Jan Kara
@ 2013-10-08  0:10           ` KOSAKI Motohiro
  0 siblings, 0 replies; 55+ messages in thread
From: KOSAKI Motohiro @ 2013-10-08  0:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: KOSAKI Motohiro, LKML, linux-mm

(10/7/13 4:55 PM), Jan Kara wrote:
> On Thu 03-10-13 18:40:06, KOSAKI Motohiro wrote:
>> (10/2/13 3:36 PM), Jan Kara wrote:
>>> On Wed 02-10-13 12:32:33, KOSAKI Motohiro wrote:
>>>> (10/2/13 10:27 AM), Jan Kara wrote:
>>>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>>>> ---
>>>>>    mm/process_vm_access.c | 8 ++------
>>>>>    1 file changed, 2 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
>>>>> index fd26d0433509..c1bc47d8ed90 100644
>>>>> --- a/mm/process_vm_access.c
>>>>> +++ b/mm/process_vm_access.c
>>>>> @@ -64,12 +64,8 @@ static int process_vm_rw_pages(struct task_struct *task,
>>>>>    	*bytes_copied = 0;
>>>>>
>>>>>    	/* Get the pages we're interested in */
>>>>> -	down_read(&mm->mmap_sem);
>>>>> -	pages_pinned = get_user_pages(task, mm, pa,
>>>>> -				      nr_pages_to_copy,
>>>>> -				      vm_write, 0, process_pages, NULL);
>>>>> -	up_read(&mm->mmap_sem);
>>>>> -
>>>>> +	pages_pinned = get_user_pages_unlocked(task, mm, pa, nr_pages_to_copy,
>>>>> +					       vm_write, 0, process_pages);
>>>>>    	if (pages_pinned != nr_pages_to_copy) {
>>>>>    		rc = -EFAULT;
>>>>>    		goto end;
>>>>
>>>> This is wrong because original code is wrong. In this function, page may
>>>> be pointed to anon pages. Then, you should keep to take mmap_sem until
>>>> finish to copying. Otherwise concurrent fork() makes nasty COW issue.
>>>    Hum, can you be more specific? I suppose you are speaking about situation
>>> when the remote task we are copying data from/to does fork while
>>> process_vm_rw_pages() runs. If we are copying data from remote task, I
>>> don't see how COW could cause any problem. If we are copying to remote task
>>> and fork happens after get_user_pages() but before copy_to_user() then I
>>> can see we might be having some trouble. copy_to_user() would then copy
>>> data into both original remote process and its child thus essentially
>>> bypassing COW. If the child process manages to COW some of the pages before
>>> copy_to_user() happens, it can even see only some of the pages. Is that what
>>> you mean?
>>
>> scenario 1: vm_write==0
>>
>> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages
>> P1 unlock mmap_sem.
>> Process P2 call fork(). and make P3.
>> P2 write memory pa. now the "process_pages" is owned by P3 (the child process)
>> P3 write memory pa. and then the content of "process_pages" is changed.
>> P1 read process_pages as P2's page. but actually, it is P3's data. Then,
>>    P1 observe garbage, at least unintended, data was read.
>    Yeah, this really looks buggy because P1 can see data in (supposedly)
> P2's address space which P2 never wrote there.
>
>> scenario 2: vm_write==1
>>
>> Process P1 call get_user_pages(pa, process_pages) in process_vm_rw_pages.
>> It makes COW break and any anon page sharing broke.
>> P1 unlock mmap_sem.
>> P2 call fork(). and make P3. And, now COW page sharing is restored.
>> P2 write memory pa. now the "process_pages" is owned by P3.
>> P3 write memory pa. it mean P3 changes "process_pages".
>> P1 write process_pages as P2's page. but actually, it is P3's. It
>> override above P3's write and then P3 observe data corruption.
>    Yep, this is a similar problem as above. Thanks for pointing this out.
>
>> The solution is to keep holding mmap_sem until finishing process_pages
>> access because mmap_sem prevent fork. and then race never be happen. I
>> mean you cann't use get_user_pages_unlock() if target address point to
>> anon pages.
>    Yeah, if you are accessing third party mm,

One correction. The condition is:

  - a third-party mm, or
  - the current process has multiple threads and another thread calls fork(), breaking COW

>you've convinced me you
> currently need mmap_sem to avoid problems with COW on anon pages. I'm just
> thinking that this "hold mmap_sem to prevent fork" is somewhat subtle
> (definitely would deserve a comment) and if it would be needed in more
> places we might be better off if we have a more explicit mechanism for that
> (like a special lock, fork sequence count, or something like that).

Hmm..

Actually, I tried this several years ago, but the MM people disliked it because
they prefer a faster kernel to simpler code. Any additional lock potentially
makes the kernel slower.

However, I fully agree the current mechanism is too complex and misleading,
or at least very hard to understand. So any improvement idea is welcome.

> Anyway
> I'll have this in mind and if I see other places that need this, I'll try
> to come up with some solution. Thanks again for explanation.

NP.

I tried to explain this at the MM summit, but my English is too poor and I
couldn't explain it well enough. Sorry about that.

Anyway, I'd like to fix process_vm_rw_pages() before my memory is flushed again.
Thank you for helping me remember this bug.


* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-07 17:26         ` Jan Kara
@ 2013-10-08 19:06           ` Jan Kara
  2013-10-16 21:39             ` Jan Kara
  0 siblings, 1 reply; 55+ messages in thread
From: Jan Kara @ 2013-10-08 19:06 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On Mon 07-10-13 19:26:04, Jan Kara wrote:
> On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > > twice where the old routine only took it once.   This is a performance
> > > > issue.
> > >   It will take mmap_sem only once during normal operation. Only if
> > > get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
> > > the change to mm->pinned_vm.
> > > 
> > > > Is the intent here to deprecate get_user_pages()?
> > 
> > The old code looked like:
> > __qib_get_user_pages()
> > 	(broken) ulimit test
> >              for (...)
> > 		get_user_pages()
> > 
> > qib_get_user_pages()
> > 	mmap_sem lock
> > 	__qib_get_user_pages()
> >              mmap_sem unlock
> > 
> > The new code is:
> > 
> > get_user_pages_unlocked()
> > 	mmap_sem  lock
> > 	get_user_pages()
> > 	mmap_sem unlock
> > 
> > qib_get_user_pages()
> > 	mmap_sem lock
> >              ulimit test and locked pages maintenance
> >              mmap_sem unlock
> > 	for (...)
> > 		get_user_pages_unlocked()
> > 
> > I count an additional pair of mmap_sem transactions in the normal case.
>   Ah, sorry, you are right.
> 
> > > > Could the lock limit test be pushed into another version of the
> > > > wrapper so that there is only one set of mmap_sem transactions?
> > >   I'm sorry, I don't understand what you mean here...
> > > 
> > 
> > This is what I had in mind:
> > 
> > get_user_pages_ulimit_unlocked()
> > 	mmap_sem  lock
> > 	ulimit test and locked pages maintenance (from qib/ipath)
> >              for (...)
> > 	       get_user_pages_unlocked()	
> > 	mmap_sem unlock
> > 	
> > qib_get_user_pages()
> > 	get_user_pages_ulimit_unlocked()
> > 
> > This really pushes the code into a new wrapper common to ipath/qib and
> > any others that might want to combine locking with ulimit enforcement.
> >   We could do that but frankly, I'd rather change ulimit enforcement to not
> > require mmap_sem and use an atomic counter instead. I'll see what I can do.
  OK, so something like the attached patch (compile tested only). What do
you think? I'm just not 100% sure removing mmap_sem around stuff like
__ipath_release_user_pages() is safe. I don't see a reason why it shouldn't
be - we have references to the pages and we only mark them dirty and put the
reference - but maybe I'm missing something subtle...

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-mm-Switch-mm-pinned_vm-to-atomic_long_t.patch --]
[-- Type: text/x-patch, Size: 14825 bytes --]

>From 44fe90e8303b293370f077ae665d6e43846cf277 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 8 Oct 2013 14:16:24 +0200
Subject: [PATCH] mm: Switch mm->pinned_vm to atomic_long_t

Currently, updates to mm->pinned_vm are protected by mmap_sem.
kernel/events/core.c actually held it only for reading, so that may have
been racy. The only other user - Infiniband - used mmap_sem for writing,
but that caused it quite some complications.

So switch mm->pinned_vm to an atomic_long_t and convert all the places
using it. This allows quite a few simplifications to the Infiniband code,
which now doesn't have to care about mmap_sem at all.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/infiniband/core/umem.c                 | 56 +++------------------
 drivers/infiniband/core/uverbs_cmd.c           |  1 -
 drivers/infiniband/core/uverbs_main.c          |  2 -
 drivers/infiniband/hw/ipath/ipath_file_ops.c   |  2 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h     |  1 -
 drivers/infiniband/hw/ipath/ipath_user_pages.c | 70 +++-----------------------
 drivers/infiniband/hw/qib/qib_user_pages.c     | 20 ++------
 fs/proc/task_mmu.c                             |  2 +-
 include/linux/mm_types.h                       |  2 +-
 include/rdma/ib_umem.h                         |  3 --
 include/rdma/ib_verbs.h                        |  1 -
 kernel/events/core.c                           | 13 +++--
 12 files changed, 29 insertions(+), 144 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 0640a89021a9..294d2e468177 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -80,6 +80,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 {
 	struct ib_umem *umem;
 	struct page **page_list;
+	struct mm_struct *mm = current->mm;
 	struct ib_umem_chunk *chunk;
 	unsigned long locked;
 	unsigned long lock_limit;
@@ -126,20 +127,13 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
 
-	down_write(&current->mm->mmap_sem);
-
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	locked = npages;
-	if (npages + current->mm->pinned_vm > lock_limit &&
+	if (atomic_long_add_return(npages, &mm->pinned_vm) > lock_limit &&
 	    !capable(CAP_IPC_LOCK)) {
-		up_write(&current->mm->mmap_sem);
-		kfree(umem);
-		free_page((unsigned long) page_list);
-		return ERR_PTR(-ENOMEM);
+		ret = -ENOMEM;
+		goto out;
 	}
-	current->mm->pinned_vm += npages;
-
-	up_write(&current->mm->mmap_sem);
 
 	cur_base = addr & PAGE_MASK;
 
@@ -201,9 +195,7 @@ out:
 	if (ret < 0) {
 		__ib_umem_release(context->device, umem, 0);
 		kfree(umem);
-		down_write(&current->mm->mmap_sem);
-		current->mm->pinned_vm -= locked;
-		up_write(&current->mm->mmap_sem);
+		atomic_long_sub(locked, &mm->pinned_vm);
 	}
 	free_page((unsigned long) page_list);
 
@@ -211,17 +203,6 @@ out:
 }
 EXPORT_SYMBOL(ib_umem_get);
 
-static void ib_umem_account(struct work_struct *work)
-{
-	struct ib_umem *umem = container_of(work, struct ib_umem, work);
-
-	down_write(&umem->mm->mmap_sem);
-	umem->mm->pinned_vm -= umem->diff;
-	up_write(&umem->mm->mmap_sem);
-	mmput(umem->mm);
-	kfree(umem);
-}
-
 /**
  * ib_umem_release - release memory pinned with ib_umem_get
  * @umem: umem struct to release
@@ -234,37 +215,14 @@ void ib_umem_release(struct ib_umem *umem)
 
 	__ib_umem_release(umem->context->device, umem, 1);
 
-	mm = get_task_mm(current);
+	mm = current->mm;
 	if (!mm) {
 		kfree(umem);
 		return;
 	}
 
 	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
-
-	/*
-	 * We may be called with the mm's mmap_sem already held.  This
-	 * can happen when a userspace munmap() is the call that drops
-	 * the last reference to our file and calls our release
-	 * method.  If there are memory regions to destroy, we'll end
-	 * up here and not be able to take the mmap_sem.  In that case
-	 * we defer the vm_locked accounting to the system workqueue.
-	 */
-	if (context->closing) {
-		if (!down_write_trylock(&mm->mmap_sem)) {
-			INIT_WORK(&umem->work, ib_umem_account);
-			umem->mm   = mm;
-			umem->diff = diff;
-
-			queue_work(ib_wq, &umem->work);
-			return;
-		}
-	} else
-		down_write(&mm->mmap_sem);
-
-	current->mm->pinned_vm -= diff;
-	up_write(&mm->mmap_sem);
-	mmput(mm);
+	atomic_long_sub(diff, &mm->pinned_vm);
 	kfree(umem);
 }
 EXPORT_SYMBOL(ib_umem_release);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index f2b81b9ee0d6..16381c6eae77 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -332,7 +332,6 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	INIT_LIST_HEAD(&ucontext->ah_list);
 	INIT_LIST_HEAD(&ucontext->xrcd_list);
 	INIT_LIST_HEAD(&ucontext->rule_list);
-	ucontext->closing = 0;
 
 	resp.num_comp_vectors = file->device->num_comp_vectors;
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 75ad86c4abf8..8fdc9ca62c27 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -196,8 +196,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
 	if (!context)
 		return 0;
 
-	context->closing = 1;
-
 	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
 		struct ib_ah *ah = uobj->object;
 
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
index 6d7f453b4d05..f219f15ad1cf 100644
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
@@ -2025,7 +2025,7 @@ static void unlock_expected_tids(struct ipath_portdata *pd)
 		dd->ipath_pageshadow[i] = NULL;
 		pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i],
 			PAGE_SIZE, PCI_DMA_FROMDEVICE);
-		ipath_release_user_pages_on_close(&ps, 1);
+		ipath_release_user_pages(&ps, 1);
 		cnt++;
 		ipath_stats.sps_pageunlocks++;
 	}
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 6559af60bffd..afc2f868541b 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -1083,7 +1083,6 @@ static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt)
 
 int ipath_get_user_pages(unsigned long, size_t, struct page **);
 void ipath_release_user_pages(struct page **, size_t);
-void ipath_release_user_pages_on_close(struct page **, size_t);
 int ipath_eeprom_read(struct ipath_devdata *, u8, void *, int);
 int ipath_eeprom_write(struct ipath_devdata *, u8, const void *, int);
 int ipath_tempsense_read(struct ipath_devdata *, u8 regnum);
diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
index a89af9654112..8081e76fa72c 100644
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
@@ -68,26 +68,22 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 			 struct page **p)
 {
 	unsigned long lock_limit;
+	struct mm_struct *mm = current->mm;
 	size_t got;
 	int ret;
 
-	down_write(&current->mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (current->mm->pinned_vm + num_pages > lock_limit && 
+	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
 	    !capable(CAP_IPC_LOCK)) {
-		up_write(&current->mm->mmap_sem);
 		ret = -ENOMEM;
-		goto bail;
+		goto bail_sub;
 	}
-	current->mm->pinned_vm += num_pages;
-	up_write(&current->mm->mmap_sem);
 
 	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
 		   (unsigned long) num_pages, start_page);
 
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages_unlocked(current, current->mm,
+		ret = get_user_pages_unlocked(current, mm,
 					      start_page + got * PAGE_SIZE,
 					      num_pages - got, 1, 1,
 					      p + got);
@@ -101,9 +97,8 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
 
 bail_release:
 	__ipath_release_user_pages(p, got, 0);
-	down_write(&current->mm->mmap_sem);
-	current->mm->pinned_vm -= num_pages;
-	up_write(&current->mm->mmap_sem);
+bail_sub:
+	atomic_long_sub(num_pages, &mm->pinned_vm);
 bail:
 	return ret;
 }
@@ -166,56 +161,7 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
 
 void ipath_release_user_pages(struct page **p, size_t num_pages)
 {
-	down_write(&current->mm->mmap_sem);
-
-	__ipath_release_user_pages(p, num_pages, 1);
-
-	current->mm->pinned_vm -= num_pages;
-
-	up_write(&current->mm->mmap_sem);
-}
-
-struct ipath_user_pages_work {
-	struct work_struct work;
-	struct mm_struct *mm;
-	unsigned long num_pages;
-};
-
-static void user_pages_account(struct work_struct *_work)
-{
-	struct ipath_user_pages_work *work =
-		container_of(_work, struct ipath_user_pages_work, work);
-
-	down_write(&work->mm->mmap_sem);
-	work->mm->pinned_vm -= work->num_pages;
-	up_write(&work->mm->mmap_sem);
-	mmput(work->mm);
-	kfree(work);
-}
-
-void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
-{
-	struct ipath_user_pages_work *work;
-	struct mm_struct *mm;
-
 	__ipath_release_user_pages(p, num_pages, 1);
-
-	mm = get_task_mm(current);
-	if (!mm)
-		return;
-
-	work = kmalloc(sizeof(*work), GFP_KERNEL);
-	if (!work)
-		goto bail_mm;
-
-	INIT_WORK(&work->work, user_pages_account);
-	work->mm = mm;
-	work->num_pages = num_pages;
-
-	queue_work(ib_wq, &work->work);
-	return;
-
-bail_mm:
-	mmput(mm);
-	return;
+	if (current->mm)
+		atomic_long_sub(num_pages, &current->mm->pinned_vm);
 }
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 57ce83c2d1d9..049ab8db5f32 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -68,16 +68,13 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 	int ret;
 	struct mm_struct *mm = current->mm;
 
-	down_write(&mm->mmap_sem);
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
-		up_write(&mm->mmap_sem);
+	if (atomic_long_add_return(num_pages, &mm->pinned_vm) > lock_limit &&
+	    !capable(CAP_IPC_LOCK)) {
 		ret = -ENOMEM;
 		goto bail;
 	}
-	mm->pinned_vm += num_pages;
-	up_write(&mm->mmap_sem);
 
 	for (got = 0; got < num_pages; got += ret) {
 		ret = get_user_pages_unlocked(current, mm,
@@ -94,9 +91,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 
 bail_release:
 	__qib_release_user_pages(p, got, 0);
-	down_write(&mm->mmap_sem);
-	mm->pinned_vm -= num_pages;
-	up_write(&mm->mmap_sem);
+	atomic_long_sub(num_pages, &mm->pinned_vm);
 bail:
 	return ret;
 }
@@ -135,13 +130,8 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
 
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
-	if (current->mm) /* during close after signal, mm can be NULL */
-		down_write(&current->mm->mmap_sem);
-
 	__qib_release_user_pages(p, num_pages, 1);
 
-	if (current->mm) {
-		current->mm->pinned_vm -= num_pages;
-		up_write(&current->mm->mmap_sem);
-	}
+	if (current->mm)
+		atomic_long_sub(num_pages, &current->mm->pinned_vm);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7366e9d63cee..9123bfef1dea 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -57,7 +57,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		hiwater_vm << (PAGE_SHIFT-10),
 		total_vm << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
-		mm->pinned_vm << (PAGE_SHIFT-10),
+		atomic_long_read(&mm->pinned_vm) << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d9851eeb6e1d..f2bf72a86b70 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -355,7 +355,7 @@ struct mm_struct {
 
 	unsigned long total_vm;		/* Total pages mapped */
 	unsigned long locked_vm;	/* Pages that have PG_mlocked set */
-	unsigned long pinned_vm;	/* Refcount permanently increased */
+	atomic_long_t pinned_vm;	/* Refcount permanently increased */
 	unsigned long shared_vm;	/* Shared pages (files) */
 	unsigned long exec_vm;		/* VM_EXEC & ~VM_WRITE */
 	unsigned long stack_vm;		/* VM_GROWSUP/DOWN */
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9ee0d2e51b16..2dbbd2c56074 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -47,9 +47,6 @@ struct ib_umem {
 	int                     writable;
 	int                     hugetlb;
 	struct list_head	chunk_list;
-	struct work_struct	work;
-	struct mm_struct       *mm;
-	unsigned long		diff;
 };
 
 struct ib_umem_chunk {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e393171e2fac..bce6d2b91ec7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -961,7 +961,6 @@ struct ib_ucontext {
 	struct list_head	ah_list;
 	struct list_head	xrcd_list;
 	struct list_head	rule_list;
-	int			closing;
 };
 
 struct ib_uobject {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dd236b66ca3a..80d2ba7bd51c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3922,7 +3922,7 @@ again:
 	 */
 
 	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
-	vma->vm_mm->pinned_vm -= mmap_locked;
+	atomic_long_sub(mmap_locked, &vma->vm_mm->pinned_vm);
 	free_uid(mmap_user);
 
 	ring_buffer_put(rb); /* could be last */
@@ -3944,7 +3944,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	struct ring_buffer *rb;
 	unsigned long vma_size;
 	unsigned long nr_pages;
-	long user_extra, extra;
+	long user_extra, extra = 0;
 	int ret = 0, flags = 0;
 
 	/*
@@ -4006,16 +4006,14 @@ again:
 
 	user_locked = atomic_long_read(&user->locked_vm) + user_extra;
 
-	extra = 0;
 	if (user_locked > user_lock_limit)
 		extra = user_locked - user_lock_limit;
 
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
-	locked = vma->vm_mm->pinned_vm + extra;
 
-	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
-		!capable(CAP_IPC_LOCK)) {
+	if (atomic_long_add_return(extra, &vma->vm_mm->pinned_vm) > lock_limit &&
+	    perf_paranoid_tracepoint_raw() && !capable(CAP_IPC_LOCK)) {
 		ret = -EPERM;
 		goto unlock;
 	}
@@ -4039,7 +4037,6 @@ again:
 	rb->mmap_user = get_current_user();
 
 	atomic_long_add(user_extra, &user->locked_vm);
-	vma->vm_mm->pinned_vm += extra;
 
 	ring_buffer_attach(event, rb);
 	rcu_assign_pointer(event->rb, rb);
@@ -4049,6 +4046,8 @@ again:
 unlock:
 	if (!ret)
 		atomic_inc(&event->mmap_count);
+	else if (extra)
+		atomic_long_sub(extra, &vma->vm_mm->pinned_vm);
 	mutex_unlock(&event->mmap_mutex);
 
 	/*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked()
  2013-10-08 19:06           ` Jan Kara
@ 2013-10-16 21:39             ` Jan Kara
  0 siblings, 0 replies; 55+ messages in thread
From: Jan Kara @ 2013-10-16 21:39 UTC (permalink / raw)
  To: Marciniszyn, Mike
  Cc: Jan Kara, LKML, linux-mm, infinipath, Roland Dreier, linux-rdma

On Tue 08-10-13 21:06:04, Jan Kara wrote:
> On Mon 07-10-13 19:26:04, Jan Kara wrote:
> > On Mon 07-10-13 15:38:24, Marciniszyn, Mike wrote:
> > > > > This patch and the sibling ipath patch will nominally take the mmap_sem
> > > > > twice where the old routine only took it once.   This is a performance
> > > > > issue.
> > > >   It will take mmap_sem only once during normal operation. Only if
> > > > get_user_pages_unlocked() fails do we have to take mmap_sem again to undo
> > > > the change to mm->pinned_vm.
> > > > 
> > > > > Is the intent here to deprecate get_user_pages()?
> > > 
> > > The old code looked like:
> > > __qib_get_user_pages()
> > > 	(broken) ulimit test
> > >              for (...)
> > > 		get_user_pages()
> > > 
> > > qib_get_user_pages()
> > > 	mmap_sem lock
> > > 	__qib_get_user_pages()
> > >              mmap_sem unlock
> > > 
> > > The new code is:
> > > 
> > > get_user_pages_unlocked()
> > > 	mmap_sem  lock
> > > 	get_user_pages()
> > > 	mmap_sem unlock
> > > 
> > > qib_get_user_pages()
> > > 	mmap_sem lock
> > >              ulimit test and locked pages maintenance
> > >              mmap_sem unlock
> > > 	for (...)
> > > 		get_user_pages_unlocked()
> > > 
> > > I count an additional pair of mmap_sem transactions in the normal case.
> >   Ah, sorry, you are right.
> > 
> > > > > Could the lock limit test be pushed into another version of the
> > > > > wrapper so that there is only one set of mmap_sem transactions?
> > > >   I'm sorry, I don't understand what you mean here...
> > > > 
> > > 
> > > This is what I had in mind:
> > > 
> > > get_user_pages_ulimit_unlocked()
> > > 	mmap_sem  lock
> > > 	ulimit test and locked pages maintenance (from qib/ipath)
> > >              for (...)
> > > 	       get_user_pages_unlocked()	
> > > 	mmap_sem unlock
> > > 	
> > > qib_get_user_pages()
> > > 	get_user_pages_ulimit_unlocked()
> > > 
> > > This really pushes the code into a new wrapper common to ipath/qib and
> > > any others that might want to combine locking with ulimit enforcement.
> >   We could do that but frankly, I'd rather change ulimit enforcement to not
> > require mmap_sem and use an atomic counter instead. I'll see what I can do.
>   OK, so something like the attached patch (compile tested only). What do
> you think? I'm just not 100% sure removing mmap_sem around stuff like
> __ipath_release_user_pages() is safe. I don't see a reason why it shouldn't
> be - we have references to the pages and we only mark them dirty and put the
> reference - but maybe I'm missing something subtle...
  Ping? Any opinion on this?

								Honza

> From 44fe90e8303b293370f077ae665d6e43846cf277 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 8 Oct 2013 14:16:24 +0200
> Subject: [PATCH] mm: Switch mm->pinned_vm to atomic_long_t
> 
> Currently updates to mm->pinned_vm were protected by mmap_sem.
> kernel/events/core.c actually held it only for reading so that may have
> been racy. The only other user - Infiniband - used mmap_sem for writing
> but it caused quite some complications to it.
> 
> So switch mm->pinned_vm and convert all the places using it. This allows
> quite some simplifications to Infiniband code and it now doesn't have to
> care about mmap_sem at all.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/infiniband/core/umem.c                 | 56 +++------------------
>  drivers/infiniband/core/uverbs_cmd.c           |  1 -
>  drivers/infiniband/core/uverbs_main.c          |  2 -
>  drivers/infiniband/hw/ipath/ipath_file_ops.c   |  2 +-
>  drivers/infiniband/hw/ipath/ipath_kernel.h     |  1 -
>  drivers/infiniband/hw/ipath/ipath_user_pages.c | 70 +++-----------------------
>  drivers/infiniband/hw/qib/qib_user_pages.c     | 20 ++------
>  fs/proc/task_mmu.c                             |  2 +-
>  include/linux/mm_types.h                       |  2 +-
>  include/rdma/ib_umem.h                         |  3 --
>  include/rdma/ib_verbs.h                        |  1 -
>  kernel/events/core.c                           | 13 +++--
>  12 files changed, 29 insertions(+), 144 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 0640a89021a9..294d2e468177 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -80,6 +80,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  {
>  	struct ib_umem *umem;
>  	struct page **page_list;
> +	struct mm_struct *mm = current->mm;
>  	struct ib_umem_chunk *chunk;
>  	unsigned long locked;
>  	unsigned long lock_limit;
> @@ -126,20 +127,13 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>  
>  	npages = PAGE_ALIGN(size + umem->offset) >> PAGE_SHIFT;
>  
> -	down_write(&current->mm->mmap_sem);
> -
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  	locked = npages;
> -	if (npages + current->mm->pinned_vm > lock_limit &&
> +	if (atomic_long_add_return(&mm->pinned_vm, npages) > lock_limit &&
>  	    !capable(CAP_IPC_LOCK)) {
> -		up_write(&current->mm->mmap_sem);
> -		kfree(umem);
> -		free_page((unsigned long) page_list);
> -		return ERR_PTR(-ENOMEM);
> +		ret = -ENOMEM;
> +		goto out;
>  	}
> -	current->mm->pinned_vm += npages;
> -
> -	up_write(&current->mm->mmap_sem);
>  
>  	cur_base = addr & PAGE_MASK;
>  
> @@ -201,9 +195,7 @@ out:
>  	if (ret < 0) {
>  		__ib_umem_release(context->device, umem, 0);
>  		kfree(umem);
> -		down_write(&current->mm->mmap_sem);
> -		current->mm->pinned_vm -= locked;
> -		up_write(&current->mm->mmap_sem);
> +		atomic_long_sub(&mm->pinned_vm, locked);
>  	}
>  	free_page((unsigned long) page_list);
>  
> @@ -211,17 +203,6 @@ out:
>  }
>  EXPORT_SYMBOL(ib_umem_get);
>  
> -static void ib_umem_account(struct work_struct *work)
> -{
> -	struct ib_umem *umem = container_of(work, struct ib_umem, work);
> -
> -	down_write(&umem->mm->mmap_sem);
> -	umem->mm->pinned_vm -= umem->diff;
> -	up_write(&umem->mm->mmap_sem);
> -	mmput(umem->mm);
> -	kfree(umem);
> -}
> -
>  /**
>   * ib_umem_release - release memory pinned with ib_umem_get
>   * @umem: umem struct to release
> @@ -234,37 +215,14 @@ void ib_umem_release(struct ib_umem *umem)
>  
>  	__ib_umem_release(umem->context->device, umem, 1);
>  
> -	mm = get_task_mm(current);
> +	mm = current->mm;
>  	if (!mm) {
>  		kfree(umem);
>  		return;
>  	}
>  
>  	diff = PAGE_ALIGN(umem->length + umem->offset) >> PAGE_SHIFT;
> -
> -	/*
> -	 * We may be called with the mm's mmap_sem already held.  This
> -	 * can happen when a userspace munmap() is the call that drops
> -	 * the last reference to our file and calls our release
> -	 * method.  If there are memory regions to destroy, we'll end
> -	 * up here and not be able to take the mmap_sem.  In that case
> -	 * we defer the vm_locked accounting to the system workqueue.
> -	 */
> -	if (context->closing) {
> -		if (!down_write_trylock(&mm->mmap_sem)) {
> -			INIT_WORK(&umem->work, ib_umem_account);
> -			umem->mm   = mm;
> -			umem->diff = diff;
> -
> -			queue_work(ib_wq, &umem->work);
> -			return;
> -		}
> -	} else
> -		down_write(&mm->mmap_sem);
> -
> -	current->mm->pinned_vm -= diff;
> -	up_write(&mm->mmap_sem);
> -	mmput(mm);
> +	atomic_long_sub(&mm->pinned_vm, diff);
>  	kfree(umem);
>  }
>  EXPORT_SYMBOL(ib_umem_release);
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index f2b81b9ee0d6..16381c6eae77 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -332,7 +332,6 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
>  	INIT_LIST_HEAD(&ucontext->ah_list);
>  	INIT_LIST_HEAD(&ucontext->xrcd_list);
>  	INIT_LIST_HEAD(&ucontext->rule_list);
> -	ucontext->closing = 0;
>  
>  	resp.num_comp_vectors = file->device->num_comp_vectors;
>  
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 75ad86c4abf8..8fdc9ca62c27 100644
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -196,8 +196,6 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
>  	if (!context)
>  		return 0;
>  
> -	context->closing = 1;
> -
>  	list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) {
>  		struct ib_ah *ah = uobj->object;
>  
> diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> index 6d7f453b4d05..f219f15ad1cf 100644
> --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
> +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> @@ -2025,7 +2025,7 @@ static void unlock_expected_tids(struct ipath_portdata *pd)
>  		dd->ipath_pageshadow[i] = NULL;
>  		pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i],
>  			PAGE_SIZE, PCI_DMA_FROMDEVICE);
> -		ipath_release_user_pages_on_close(&ps, 1);
> +		ipath_release_user_pages(&ps, 1);
>  		cnt++;
>  		ipath_stats.sps_pageunlocks++;
>  	}
> diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
> index 6559af60bffd..afc2f868541b 100644
> --- a/drivers/infiniband/hw/ipath/ipath_kernel.h
> +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
> @@ -1083,7 +1083,6 @@ static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt)
>  
>  int ipath_get_user_pages(unsigned long, size_t, struct page **);
>  void ipath_release_user_pages(struct page **, size_t);
> -void ipath_release_user_pages_on_close(struct page **, size_t);
>  int ipath_eeprom_read(struct ipath_devdata *, u8, void *, int);
>  int ipath_eeprom_write(struct ipath_devdata *, u8, const void *, int);
>  int ipath_tempsense_read(struct ipath_devdata *, u8 regnum);
> diff --git a/drivers/infiniband/hw/ipath/ipath_user_pages.c b/drivers/infiniband/hw/ipath/ipath_user_pages.c
> index a89af9654112..8081e76fa72c 100644
> --- a/drivers/infiniband/hw/ipath/ipath_user_pages.c
> +++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c
> @@ -68,26 +68,22 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
>  			 struct page **p)
>  {
>  	unsigned long lock_limit;
> +	struct mm_struct *mm = current->mm;
>  	size_t got;
>  	int ret;
>  
> -	down_write(&current->mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -	if (current->mm->pinned_vm + num_pages > lock_limit && 
> +	if (atomic_long_add_return(&mm->pinned_vm, num_pages) > lock_limit &&
>  	    !capable(CAP_IPC_LOCK)) {
> -		up_write(&current->mm->mmap_sem);
>  		ret = -ENOMEM;
> -		goto bail;
> +		goto bail_sub;
>  	}
> -	current->mm->pinned_vm += num_pages;
> -	up_write(&current->mm->mmap_sem);
>  
>  	ipath_cdbg(VERBOSE, "pin %lx pages from vaddr %lx\n",
>  		   (unsigned long) num_pages, start_page);
>  
>  	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages_unlocked(current, current->mm,
> +		ret = get_user_pages_unlocked(current, mm,
>  					      start_page + got * PAGE_SIZE,
>  					      num_pages - got, 1, 1,
>  					      p + got);
> @@ -101,9 +97,8 @@ int ipath_get_user_pages(unsigned long start_page, size_t num_pages,
>  
>  bail_release:
>  	__ipath_release_user_pages(p, got, 0);
> -	down_write(&current->mm->mmap_sem);
> -	current->mm->pinned_vm -= num_pages;
> -	up_write(&current->mm->mmap_sem);
> +bail_sub:
> +	atomic_long_sub(&mm->pinned_vm, num_pages);
>  bail:
>  	return ret;
>  }
> @@ -166,56 +161,7 @@ dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
>  
>  void ipath_release_user_pages(struct page **p, size_t num_pages)
>  {
> -	down_write(&current->mm->mmap_sem);
> -
> -	__ipath_release_user_pages(p, num_pages, 1);
> -
> -	current->mm->pinned_vm -= num_pages;
> -
> -	up_write(&current->mm->mmap_sem);
> -}
> -
> -struct ipath_user_pages_work {
> -	struct work_struct work;
> -	struct mm_struct *mm;
> -	unsigned long num_pages;
> -};
> -
> -static void user_pages_account(struct work_struct *_work)
> -{
> -	struct ipath_user_pages_work *work =
> -		container_of(_work, struct ipath_user_pages_work, work);
> -
> -	down_write(&work->mm->mmap_sem);
> -	work->mm->pinned_vm -= work->num_pages;
> -	up_write(&work->mm->mmap_sem);
> -	mmput(work->mm);
> -	kfree(work);
> -}
> -
> -void ipath_release_user_pages_on_close(struct page **p, size_t num_pages)
> -{
> -	struct ipath_user_pages_work *work;
> -	struct mm_struct *mm;
> -
>  	__ipath_release_user_pages(p, num_pages, 1);
> -
> -	mm = get_task_mm(current);
> -	if (!mm)
> -		return;
> -
> -	work = kmalloc(sizeof(*work), GFP_KERNEL);
> -	if (!work)
> -		goto bail_mm;
> -
> -	INIT_WORK(&work->work, user_pages_account);
> -	work->mm = mm;
> -	work->num_pages = num_pages;
> -
> -	queue_work(ib_wq, &work->work);
> -	return;
> -
> -bail_mm:
> -	mmput(mm);
> -	return;
> +	if (current->mm)
> +		atomic_long_sub(&current->mm->pinned_vm, num_pages);
>  }
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index 57ce83c2d1d9..049ab8db5f32 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -68,16 +68,13 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
>  	int ret;
>  	struct mm_struct *mm = current->mm;
>  
> -	down_write(&mm->mmap_sem);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>  
> -	if (mm->pinned_vm + num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> -		up_write(&mm->mmap_sem);
> +	if (atomic_long_add_return(&mm->pinned_vm, num_pages) > lock_limit &&
> +	    !capable(CAP_IPC_LOCK)) {
>  		ret = -ENOMEM;
>  		goto bail;
>  	}
> -	mm->pinned_vm += num_pages;
> -	up_write(&mm->mmap_sem);
>  
>  	for (got = 0; got < num_pages; got += ret) {
>  		ret = get_user_pages_unlocked(current, mm,
> @@ -94,9 +91,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
>  
>  bail_release:
>  	__qib_release_user_pages(p, got, 0);
> -	down_write(&mm->mmap_sem);
> -	mm->pinned_vm -= num_pages;
> -	up_write(&mm->mmap_sem);
> +	atomic_long_sub(&mm->pinned_vm, num_pages);
>  bail:
>  	return ret;
>  }
> @@ -135,13 +130,8 @@ dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
>  
>  void qib_release_user_pages(struct page **p, size_t num_pages)
>  {
> -	if (current->mm) /* during close after signal, mm can be NULL */
> -		down_write(&current->mm->mmap_sem);
> -
>  	__qib_release_user_pages(p, num_pages, 1);
>  
> -	if (current->mm) {
> -		current->mm->pinned_vm -= num_pages;
> -		up_write(&current->mm->mmap_sem);
> -	}
> +	if (current->mm)
> +		atomic_long_sub(num_pages, &current->mm->pinned_vm);
>  }
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 7366e9d63cee..9123bfef1dea 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -57,7 +57,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		hiwater_vm << (PAGE_SHIFT-10),
>  		total_vm << (PAGE_SHIFT-10),
>  		mm->locked_vm << (PAGE_SHIFT-10),
> -		mm->pinned_vm << (PAGE_SHIFT-10),
> +		atomic_long_read(&mm->pinned_vm) << (PAGE_SHIFT-10),
>  		hiwater_rss << (PAGE_SHIFT-10),
>  		total_rss << (PAGE_SHIFT-10),
>  		data << (PAGE_SHIFT-10),
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index d9851eeb6e1d..f2bf72a86b70 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -355,7 +355,7 @@ struct mm_struct {
>  
>  	unsigned long total_vm;		/* Total pages mapped */
>  	unsigned long locked_vm;	/* Pages that have PG_mlocked set */
> -	unsigned long pinned_vm;	/* Refcount permanently increased */
> +	atomic_long_t pinned_vm;	/* Refcount permanently increased */
>  	unsigned long shared_vm;	/* Shared pages (files) */
>  	unsigned long exec_vm;		/* VM_EXEC & ~VM_WRITE */
>  	unsigned long stack_vm;		/* VM_GROWSUP/DOWN */
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 9ee0d2e51b16..2dbbd2c56074 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -47,9 +47,6 @@ struct ib_umem {
>  	int                     writable;
>  	int                     hugetlb;
>  	struct list_head	chunk_list;
> -	struct work_struct	work;
> -	struct mm_struct       *mm;
> -	unsigned long		diff;
>  };
>  
>  struct ib_umem_chunk {
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index e393171e2fac..bce6d2b91ec7 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -961,7 +961,6 @@ struct ib_ucontext {
>  	struct list_head	ah_list;
>  	struct list_head	xrcd_list;
>  	struct list_head	rule_list;
> -	int			closing;
>  };
>  
>  struct ib_uobject {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dd236b66ca3a..80d2ba7bd51c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3922,7 +3922,7 @@ again:
>  	 */
>  
>  	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
> -	vma->vm_mm->pinned_vm -= mmap_locked;
> +	atomic_long_sub(mmap_locked, &vma->vm_mm->pinned_vm);
>  	free_uid(mmap_user);
>  
>  	ring_buffer_put(rb); /* could be last */
> @@ -3944,7 +3944,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
>  	struct ring_buffer *rb;
>  	unsigned long vma_size;
>  	unsigned long nr_pages;
> -	long user_extra, extra;
> +	long user_extra, extra = 0;
>  	int ret = 0, flags = 0;
>  
>  	/*
> @@ -4006,16 +4006,14 @@ again:
>  
>  	user_locked = atomic_long_read(&user->locked_vm) + user_extra;
>  
> -	extra = 0;
>  	if (user_locked > user_lock_limit)
>  		extra = user_locked - user_lock_limit;
>  
>  	lock_limit = rlimit(RLIMIT_MEMLOCK);
>  	lock_limit >>= PAGE_SHIFT;
> -	locked = vma->vm_mm->pinned_vm + extra;
>  
> -	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
> -		!capable(CAP_IPC_LOCK)) {
> +	if (atomic_long_add_return(extra, &vma->vm_mm->pinned_vm) > lock_limit &&
> +	    perf_paranoid_tracepoint_raw() && !capable(CAP_IPC_LOCK)) {
>  		ret = -EPERM;
>  		goto unlock;
>  	}
> @@ -4039,7 +4037,6 @@ again:
>  	rb->mmap_user = get_current_user();
>  
>  	atomic_long_add(user_extra, &user->locked_vm);
> -	vma->vm_mm->pinned_vm += extra;
>  
>  	ring_buffer_attach(event, rb);
>  	rcu_assign_pointer(event->rb, rb);
> @@ -4049,6 +4046,8 @@ again:
>  unlock:
>  	if (!ret)
>  		atomic_inc(&event->mmap_count);
> +	else if (extra)
> +		atomic_long_sub(&vma->vm_mm->pinned_vm, extra);
>  	mutex_unlock(&event->mmap_mutex);
>  
>  	/*
> -- 
> 1.8.1.4
> 

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


end of thread, other threads:[~2013-10-16 21:40 UTC | newest]

Thread overview: 55+ messages
     [not found] <1380724087-13927-1-git-send-email-jack@suse.cz>
2013-10-02 14:27 ` [PATCH 01/26] cris: Convert cryptocop to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` [PATCH 02/26] ia64: Use get_user_pages_fast() in err_inject.c Jan Kara
2013-10-02 14:27 ` [PATCH 03/26] dma: Use get_user_pages_fast() in dma_pin_iovec_pages() Jan Kara
2013-10-02 14:27 ` [PATCH 04/26] drm: Convert via driver to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` [PATCH 05/26] omap3isp: Make isp_video_buffer_prepare_user() " Jan Kara
2013-10-02 19:41   ` Laurent Pinchart
2013-10-02 20:18     ` Jan Kara
2013-10-02 14:27 ` [PATCH 06/26] vmw_vmci: Convert driver to " Jan Kara
2013-10-02 14:27 ` [PATCH 07/26] st: Convert sgl_map_user_pages() " Jan Kara
2013-10-02 14:27 ` [PATCH 08/26] ced1401: Convert driver " Jan Kara
2013-10-02 14:27 ` [PATCH 09/26] crystalhd: Convert crystalhd_map_dio() " Jan Kara
2013-10-02 14:27 ` [PATCH 10/26] lustre: Convert ll_get_user_pages() " Jan Kara
2013-10-05  6:27   ` Dilger, Andreas
2013-10-02 14:27 ` [PATCH 11/26] sep: Convert sep_lock_user_pages() to get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` [PATCH 12/26] pvr2fb: Convert pvr2fb_write() to use get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` [PATCH 13/26] fsl_hypervisor: Convert ioctl_memcpy() " Jan Kara
2013-10-04  2:38   ` Timur Tabi
2013-10-02 14:27 ` [PATCH 14/26] nfs: Convert direct IO " Jan Kara
2013-10-02 14:27 ` [PATCH 15/26] ceph: Convert ceph_get_direct_page_vector() to get_user_pages_fast() Jan Kara
2013-10-02 14:27 ` [PATCH 16/26] mm: Provide get_user_pages_unlocked() Jan Kara
2013-10-02 16:25   ` Christoph Hellwig
2013-10-02 16:28   ` KOSAKI Motohiro
2013-10-02 19:39     ` Jan Kara
2013-10-02 14:27 ` [PATCH 17/26] kvm: Use get_user_pages_unlocked() in async_pf_execute() Jan Kara
2013-10-02 14:59   ` Gleb Natapov
2013-10-02 14:27 ` [PATCH 18/26] mm: Convert process_vm_rw_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 16:32   ` KOSAKI Motohiro
2013-10-02 19:36     ` Jan Kara
2013-10-03 22:40       ` KOSAKI Motohiro
2013-10-07 20:55         ` Jan Kara
2013-10-08  0:10           ` KOSAKI Motohiro
2013-10-02 14:28 ` [PATCH 19/26] ivtv: Convert driver " Jan Kara
2013-10-05 12:02   ` Andy Walls
2013-10-07 17:22     ` Jan Kara
2013-10-02 14:28 ` [PATCH 20/26] ib: Convert ib_umem_get() to get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` [PATCH 21/26] ib: Convert ipath_get_user_pages() " Jan Kara
2013-10-02 14:28 ` [PATCH 22/26] ib: Convert ipath_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` [PATCH 23/26] ib: Convert qib_get_user_pages() to get_user_pages_unlocked() Jan Kara
2013-10-02 14:54   ` Marciniszyn, Mike
2013-10-02 15:28     ` Jan Kara
2013-10-02 15:32       ` Marciniszyn, Mike
2013-10-02 15:38         ` Jan Kara
2013-10-04 13:39           ` Marciniszyn, Mike
2013-10-04 13:46             ` Marciniszyn, Mike
2013-10-04 13:44         ` Marciniszyn, Mike
2013-10-04 13:52   ` Marciniszyn, Mike
2013-10-04 18:33     ` Jan Kara
2013-10-07 15:20       ` Marciniszyn, Mike
2013-10-07 15:38       ` Marciniszyn, Mike
2013-10-07 17:26         ` Jan Kara
2013-10-08 19:06           ` Jan Kara
2013-10-16 21:39             ` Jan Kara
2013-10-02 14:28 ` [PATCH 24/26] ib: Convert qib_user_sdma_pin_pages() to use get_user_pages_unlocked() Jan Kara
2013-10-02 14:28 ` [PATCH 25/26] ib: Convert mthca_map_user_db() to use get_user_pages_fast() Jan Kara
2013-10-02 14:28 ` [PATCH 26/26] aio: Remove useless get_user_pages() call Jan Kara
