All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/7] preparation for block layer simplification
@ 2015-01-12 11:43 Dongsu Park
  2015-01-12 11:43 ` [PATCH v2 1/7] block: replace sg_iovec with iov_iter Dongsu Park
  0 siblings, 1 reply; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Dongsu Park

This is a preparation series for simplifying block layer based on immutable
biovecs, a spin off of the v1 of simplifying patchset. [1] The original
goal of simplifying block layer was actually making generic_make_request()
accept arbitrarily sized bios, and pushing the splitting down to the
underlying drivers.

This patchset aims at cleaning up several parts that are independent of
core changes in the block layer. Doing that, it could be possible to
change block layer relatively easily without keeping up with many patches.

- Patches 01-04/07 do refactoring to make the block layer use the new
  iov_iter interface.
- Patch 05/07 does refactoring and cleanup in MD-RAID10.
- Patch 06/07 changes a way of refcounting in fs/buffer.c, to be consistent
  with immutable biovecs API.
- Patch 07/07 makes bio submission in kernel/power in a sane way.

Patches are against 3.19-rc4. These are also available in my git repo at:

  https://github.com/dongsupark/linux.git block-prep-simplify

This patchset should be first applied prior to upcoming patchsets such as
"simplify block layer based on immutable biovecs." This patchset itself
should not bring any regression to end-users.

Comments are welcome.
Dongsu

Changes in v2:
- split up preparation patches from v1 into this separate series.
- In the patch 02, pass iov_iter by value to __bio_copy_iov(), and split
  into read/write variants, as suggested by Christoph Hellwig.
- minor changes like writing more commit messages etc.

[1] https://lkml.org/lkml/2014/12/22/128


Dongsu Park (1):
  block: rewrite __bio_copy_iov()

Kent Overstreet (6):
  block: replace sg_iovec with iov_iter
  block: refactor iov_count_pages() from bio_{copy,map}_user_iov()
  block: refactor bio_get_user_pages() from __bio_map_user_iov()
  md/raid10: make sync_request_write() call bio_copy_data()
  fs: make _submit_bh consistent with generic bio chaining
  PM: submit bio in a sane way in cases without bio_chain

 block/bio.c             | 330 +++++++++++++++++++++++++-----------------------
 block/blk-map.c         |  27 ++--
 block/scsi_ioctl.c      |  19 +--
 drivers/md/raid10.c     |  20 +--
 drivers/scsi/sg.c       |  15 +--
 fs/buffer.c             |   4 +-
 include/linux/bio.h     |  10 +-
 include/linux/blkdev.h  |   4 +-
 include/linux/uio.h     |   2 +
 kernel/power/block_io.c |  23 +++-
 lib/iovec.c             |  30 +++++
 11 files changed, 263 insertions(+), 221 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/7] block: replace sg_iovec with iov_iter
  2015-01-12 11:43 [RFC PATCH v2 0/7] preparation for block layer simplification Dongsu Park
@ 2015-01-12 11:43 ` Dongsu Park
  2015-01-12 11:43   ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Dongsu Park
  0 siblings, 1 reply; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Doug Gilbert,
	James E.J. Bottomley, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Make use of a new interface provided by iov_iter, backed by
scatter-gather list of iovec, instead of the old interface based on
sg_iovec. Also use iov_iter_advance() instead of manual iteration.

This commit should contain only literal replacements, without
functional changes.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 block/bio.c            | 114 +++++++++++++++++++++++++------------------------
 block/blk-map.c        |  27 ++++++------
 block/scsi_ioctl.c     |  19 +++------
 drivers/scsi/sg.c      |  15 +++----
 include/linux/bio.h    |   8 ++--
 include/linux/blkdev.h |   4 +-
 6 files changed, 90 insertions(+), 97 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 471d738..8267676 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1022,18 +1022,17 @@ void bio_copy_data(struct bio *dst, struct bio *src)
 EXPORT_SYMBOL(bio_copy_data);
 
 struct bio_map_data {
-	int nr_sgvecs;
 	int is_our_pages;
-	struct sg_iovec sgvecs[];
+	struct iov_iter iter;
+	struct iovec sgvecs[];
 };
 
 static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
-			     const struct sg_iovec *iov, int iov_count,
-			     int is_our_pages)
+			     const struct iov_iter *iter, int is_our_pages)
 {
-	memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
-	bmd->nr_sgvecs = iov_count;
 	bmd->is_our_pages = is_our_pages;
+	bmd->iter = *iter;
+	memcpy(bmd->sgvecs, iter->iov, sizeof(struct iovec) * iter->nr_segs);
 	bio->bi_private = bmd;
 }
 
@@ -1047,33 +1046,30 @@ static struct bio_map_data *bio_alloc_map_data(unsigned int iov_count,
 		       sizeof(struct sg_iovec) * iov_count, gfp_mask);
 }
 
-static int __bio_copy_iov(struct bio *bio, const struct sg_iovec *iov, int iov_count,
+static int __bio_copy_iov(struct bio *bio, const struct iov_iter *iter,
 			  int to_user, int from_user, int do_free_page)
 {
 	int ret = 0, i;
 	struct bio_vec *bvec;
-	int iov_idx = 0;
-	unsigned int iov_off = 0;
+	struct iov_iter iov_iter = *iter;
 
 	bio_for_each_segment_all(bvec, bio, i) {
 		char *bv_addr = page_address(bvec->bv_page);
 		unsigned int bv_len = bvec->bv_len;
 
-		while (bv_len && iov_idx < iov_count) {
-			unsigned int bytes;
-			char __user *iov_addr;
-
-			bytes = min_t(unsigned int,
-				      iov[iov_idx].iov_len - iov_off, bv_len);
-			iov_addr = iov[iov_idx].iov_base + iov_off;
+		while (bv_len && iov_iter.count) {
+			struct iovec iov = iov_iter_iovec(&iov_iter);
+			unsigned int bytes = min_t(unsigned int, bv_len,
+						   iov.iov_len);
 
 			if (!ret) {
 				if (to_user)
-					ret = copy_to_user(iov_addr, bv_addr,
-							   bytes);
+					ret = copy_to_user(iov.iov_base,
+							   bv_addr, bytes);
 
 				if (from_user)
-					ret = copy_from_user(bv_addr, iov_addr,
+					ret = copy_from_user(bv_addr,
+							     iov.iov_base,
 							     bytes);
 
 				if (ret)
@@ -1082,13 +1078,7 @@ static int __bio_copy_iov(struct bio *bio, const struct sg_iovec *iov, int iov_c
 
 			bv_len -= bytes;
 			bv_addr += bytes;
-			iov_addr += bytes;
-			iov_off += bytes;
-
-			if (iov[iov_idx].iov_len == iov_off) {
-				iov_idx++;
-				iov_off = 0;
-			}
+			iov_iter_advance(&iov_iter, bytes);
 		}
 
 		if (do_free_page)
@@ -1117,7 +1107,7 @@ int bio_uncopy_user(struct bio *bio)
 		 * don't copy into a random user address space, just free.
 		 */
 		if (current->mm)
-			ret = __bio_copy_iov(bio, bmd->sgvecs, bmd->nr_sgvecs,
+			ret = __bio_copy_iov(bio, &bmd->iter,
 					     bio_data_dir(bio) == READ,
 					     0, bmd->is_our_pages);
 		else if (bmd->is_our_pages)
@@ -1145,7 +1135,7 @@ EXPORT_SYMBOL(bio_uncopy_user);
  */
 struct bio *bio_copy_user_iov(struct request_queue *q,
 			      struct rq_map_data *map_data,
-			      const struct sg_iovec *iov, int iov_count,
+			      const struct iov_iter *iter,
 			      int write_to_vm, gfp_t gfp_mask)
 {
 	struct bio_map_data *bmd;
@@ -1154,16 +1144,17 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	struct bio *bio;
 	int i, ret;
 	int nr_pages = 0;
-	unsigned int len = 0;
+	unsigned int len;
 	unsigned int offset = map_data ? map_data->offset & ~PAGE_MASK : 0;
 
-	for (i = 0; i < iov_count; i++) {
+	for (i = 0; i < iter->nr_segs; i++) {
 		unsigned long uaddr;
 		unsigned long end;
 		unsigned long start;
 
-		uaddr = (unsigned long)iov[i].iov_base;
-		end = (uaddr + iov[i].iov_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		uaddr = (unsigned long) iter->iov[i].iov_base;
+		end = (uaddr + iter->iov[i].iov_len + PAGE_SIZE - 1)
+			>> PAGE_SHIFT;
 		start = uaddr >> PAGE_SHIFT;
 
 		/*
@@ -1173,13 +1164,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 			return ERR_PTR(-EINVAL);
 
 		nr_pages += end - start;
-		len += iov[i].iov_len;
 	}
 
 	if (offset)
 		nr_pages++;
 
-	bmd = bio_alloc_map_data(iov_count, gfp_mask);
+	bmd = bio_alloc_map_data(iter->nr_segs, gfp_mask);
 	if (!bmd)
 		return ERR_PTR(-ENOMEM);
 
@@ -1196,7 +1186,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	if (map_data) {
 		nr_pages = 1 << map_data->page_order;
 		i = map_data->offset / PAGE_SIZE;
+	} else {
+		i = 0;
 	}
+
+	len = iter->count;
+
 	while (len) {
 		unsigned int bytes = PAGE_SIZE;
 
@@ -1238,12 +1233,12 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	 */
 	if ((!write_to_vm && (!map_data || !map_data->null_mapped)) ||
 	    (map_data && map_data->from_user)) {
-		ret = __bio_copy_iov(bio, iov, iov_count, 0, 1, 0);
+		ret = __bio_copy_iov(bio, iter, 0, 1, 0);
 		if (ret)
 			goto cleanup;
 	}
 
-	bio_set_map_data(bmd, bio, iov, iov_count, map_data ? 0 : 1);
+	bio_set_map_data(bmd, bio, iter, map_data ? 0 : 1);
 	return bio;
 cleanup:
 	if (!map_data)
@@ -1273,30 +1268,35 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
 			  unsigned long uaddr, unsigned int len,
 			  int write_to_vm, gfp_t gfp_mask)
 {
-	struct sg_iovec iov;
+	struct iovec iov;
+	struct iov_iter i;
 
-	iov.iov_base = (void __user *)uaddr;
+	iov.iov_base = (void __user *) uaddr;
 	iov.iov_len = len;
 
-	return bio_copy_user_iov(q, map_data, &iov, 1, write_to_vm, gfp_mask);
+	iov_iter_init(&i, write_to_vm ? WRITE : READ, &iov, 1, len);
+
+	return bio_copy_user_iov(q, map_data, &i, write_to_vm, gfp_mask);
 }
 EXPORT_SYMBOL(bio_copy_user);
 
 static struct bio *__bio_map_user_iov(struct request_queue *q,
 				      struct block_device *bdev,
-				      const struct sg_iovec *iov, int iov_count,
+				      const struct iov_iter *iter,
 				      int write_to_vm, gfp_t gfp_mask)
 {
-	int i, j;
+	int j;
 	int nr_pages = 0;
 	struct page **pages;
 	struct bio *bio;
 	int cur_page = 0;
 	int ret, offset;
+	struct iov_iter i;
+	struct iovec iov;
 
-	for (i = 0; i < iov_count; i++) {
-		unsigned long uaddr = (unsigned long)iov[i].iov_base;
-		unsigned long len = iov[i].iov_len;
+	iov_for_each(iov, i, *iter) {
+		unsigned long uaddr = (unsigned long) iov.iov_base;
+		unsigned long len = iov.iov_len;
 		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		unsigned long start = uaddr >> PAGE_SHIFT;
 
@@ -1326,9 +1326,9 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!pages)
 		goto out;
 
-	for (i = 0; i < iov_count; i++) {
-		unsigned long uaddr = (unsigned long)iov[i].iov_base;
-		unsigned long len = iov[i].iov_len;
+	iov_for_each(iov, i, *iter) {
+		unsigned long uaddr = (unsigned long) iov.iov_base;
+		unsigned long len = iov.iov_len;
 		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		unsigned long start = uaddr >> PAGE_SHIFT;
 		const int local_nr_pages = end - start;
@@ -1383,10 +1383,10 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	return bio;
 
  out_unmap:
-	for (i = 0; i < nr_pages; i++) {
-		if(!pages[i])
+	for (j = 0; j < nr_pages; j++) {
+		if (!pages[j])
 			break;
-		page_cache_release(pages[i]);
+		page_cache_release(pages[j]);
 	}
  out:
 	kfree(pages);
@@ -1410,12 +1410,15 @@ struct bio *bio_map_user(struct request_queue *q, struct block_device *bdev,
 			 unsigned long uaddr, unsigned int len, int write_to_vm,
 			 gfp_t gfp_mask)
 {
-	struct sg_iovec iov;
+	struct iovec iov;
+	struct iov_iter i;
 
-	iov.iov_base = (void __user *)uaddr;
+	iov.iov_base = (void __user *) uaddr;
 	iov.iov_len = len;
 
-	return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm, gfp_mask);
+	iov_iter_init(&i, write_to_vm ? WRITE : READ, &iov, 1, len);
+
+	return bio_map_user_iov(q, bdev, &i, write_to_vm, gfp_mask);
 }
 EXPORT_SYMBOL(bio_map_user);
 
@@ -1432,13 +1435,12 @@ EXPORT_SYMBOL(bio_map_user);
  *	device. Returns an error pointer in case of error.
  */
 struct bio *bio_map_user_iov(struct request_queue *q, struct block_device *bdev,
-			     const struct sg_iovec *iov, int iov_count,
+			     const struct iov_iter *iter,
 			     int write_to_vm, gfp_t gfp_mask)
 {
 	struct bio *bio;
 
-	bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm,
-				 gfp_mask);
+	bio = __bio_map_user_iov(q, bdev, iter, write_to_vm, gfp_mask);
 	if (IS_ERR(bio))
 		return bio;
 
diff --git a/block/blk-map.c b/block/blk-map.c
index f890d43..496af28 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -5,7 +5,7 @@
 #include <linux/module.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
-#include <scsi/sg.h>		/* for struct sg_iovec */
+#include <linux/uio.h>
 
 #include "blk.h"
 
@@ -187,20 +187,22 @@ EXPORT_SYMBOL(blk_rq_map_user);
  *    unmapping.
  */
 int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
-			struct rq_map_data *map_data, const struct sg_iovec *iov,
-			int iov_count, unsigned int len, gfp_t gfp_mask)
+			struct rq_map_data *map_data,
+			const struct iov_iter *iter, gfp_t gfp_mask)
 {
 	struct bio *bio;
-	int i, read = rq_data_dir(rq) == READ;
+	int read = rq_data_dir(rq) == READ;
 	int unaligned = 0;
+	struct iov_iter i;
+	struct iovec iov;
 
-	if (!iov || iov_count <= 0)
+	if (!iter || !iter->count)
 		return -EINVAL;
 
-	for (i = 0; i < iov_count; i++) {
-		unsigned long uaddr = (unsigned long)iov[i].iov_base;
+	iov_for_each(iov, i, *iter) {
+		unsigned long uaddr = (unsigned long) iov.iov_base;
 
-		if (!iov[i].iov_len)
+		if (!iov.iov_len)
 			return -EINVAL;
 
 		/*
@@ -210,16 +212,15 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 			unaligned = 1;
 	}
 
-	if (unaligned || (q->dma_pad_mask & len) || map_data)
-		bio = bio_copy_user_iov(q, map_data, iov, iov_count, read,
-					gfp_mask);
+	if (unaligned || (q->dma_pad_mask & iter->count) || map_data)
+		bio = bio_copy_user_iov(q, map_data, iter, read, gfp_mask);
 	else
-		bio = bio_map_user_iov(q, NULL, iov, iov_count, read, gfp_mask);
+		bio = bio_map_user_iov(q, NULL, iter, read, gfp_mask);
 
 	if (IS_ERR(bio))
 		return PTR_ERR(bio);
 
-	if (bio->bi_iter.bi_size != len) {
+	if (bio->bi_iter.bi_size != iter->count) {
 		/*
 		 * Grab an extra reference to this bio, as bio_unmap_user()
 		 * expects to be able to drop it twice as it happens on the
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 28163fa..8c6652b 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -332,7 +332,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 
 	ret = 0;
 	if (hdr->iovec_count) {
-		size_t iov_data_len;
+		struct iov_iter i;
 		struct iovec *iov = NULL;
 
 		ret = rw_copy_check_uvector(-1, hdr->dxferp, hdr->iovec_count,
@@ -342,20 +342,13 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
 			goto out_free_cdb;
 		}
 
-		iov_data_len = ret;
-		ret = 0;
-
 		/* SG_IO howto says that the shorter of the two wins */
-		if (hdr->dxfer_len < iov_data_len) {
-			hdr->iovec_count = iov_shorten(iov,
-						       hdr->iovec_count,
-						       hdr->dxfer_len);
-			iov_data_len = hdr->dxfer_len;
-		}
+		iov_iter_init(&i,
+			      rq_data_dir(rq) == READ ? WRITE : READ,
+			      iov, hdr->iovec_count,
+			      min_t(unsigned, ret, hdr->dxfer_len));
 
-		ret = blk_rq_map_user_iov(q, rq, NULL, (struct sg_iovec *) iov,
-					  hdr->iovec_count,
-					  iov_data_len, GFP_KERNEL);
+		ret = blk_rq_map_user_iov(q, rq, NULL, &i, GFP_KERNEL);
 		kfree(iov);
 	} else if (hdr->dxfer_len)
 		ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b14f64c..3ce5cad 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1734,22 +1734,19 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 	}
 
 	if (iov_count) {
-		int len, size = sizeof(struct sg_iovec) * iov_count;
+		int size = sizeof(struct sg_iovec) * iov_count;
 		struct iovec *iov;
+		struct iov_iter i;
 
 		iov = memdup_user(hp->dxferp, size);
 		if (IS_ERR(iov))
 			return PTR_ERR(iov);
 
-		len = iov_length(iov, iov_count);
-		if (hp->dxfer_len < len) {
-			iov_count = iov_shorten(iov, iov_count, hp->dxfer_len);
-			len = hp->dxfer_len;
-		}
+		iov_iter_init(&i, rw, iov, iov_count,
+			      min_t(size_t, hp->dxfer_len,
+				    iov_length(iov, iov_count)));
 
-		res = blk_rq_map_user_iov(q, rq, md, (struct sg_iovec *)iov,
-					  iov_count,
-					  len, GFP_ATOMIC);
+		res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC);
 		kfree(iov);
 	} else
 		res = blk_rq_map_user(q, rq, md, hp->dxferp,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index efead0b..a69f7b1 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -430,11 +430,11 @@ extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 extern int bio_get_nr_vecs(struct block_device *);
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, unsigned int, int, gfp_t);
-struct sg_iovec;
+struct iov_iter;
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
 				    struct block_device *,
-				    const struct sg_iovec *, int, int, gfp_t);
+				    const struct iov_iter *, int, gfp_t);
 extern void bio_unmap_user(struct bio *);
 extern struct bio *bio_map_kern(struct request_queue *, void *, unsigned int,
 				gfp_t);
@@ -466,8 +466,8 @@ extern struct bio *bio_copy_user(struct request_queue *, struct rq_map_data *,
 				 unsigned long, unsigned int, int, gfp_t);
 extern struct bio *bio_copy_user_iov(struct request_queue *,
 				     struct rq_map_data *,
-				     const struct sg_iovec *,
-				     int, int, gfp_t);
+				     const struct iov_iter *,
+				     int, gfp_t);
 extern int bio_uncopy_user(struct bio *);
 void zero_fill_bio(struct bio *bio);
 extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 92f4b4b..9621b73 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -850,8 +850,8 @@ extern int blk_rq_map_user(struct request_queue *, struct request *,
 extern int blk_rq_unmap_user(struct bio *);
 extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t);
 extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
-			       struct rq_map_data *, const struct sg_iovec *,
-			       int, unsigned int, gfp_t);
+			       struct rq_map_data *, const struct iov_iter *,
+			       gfp_t);
 extern int blk_execute_rq(struct request_queue *, struct gendisk *,
 			  struct request *, int);
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/7] block: rewrite __bio_copy_iov()
  2015-01-12 11:43 ` [PATCH v2 1/7] block: replace sg_iovec with iov_iter Dongsu Park
@ 2015-01-12 11:43   ` Dongsu Park
  2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Dongsu Park, Al Viro

Rewrite __bio_copy_iov() so that it can call either _read() or _write()
variant, which is determined by direction to_iov, given as either READ
or WRITE. Moreover, make __bio_copy_iov() take its parameter iov_iter
by value, to avoid awkward situations like ref-/dereferencing pointer
and value repeatedly.

This commit should contain only literal replacements, without
functional changes.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 block/bio.c | 113 ++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 75 insertions(+), 38 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8267676..7b1aed3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1046,46 +1046,84 @@ static struct bio_map_data *bio_alloc_map_data(unsigned int iov_count,
 		       sizeof(struct sg_iovec) * iov_count, gfp_mask);
 }
 
-static int __bio_copy_iov(struct bio *bio, const struct iov_iter *iter,
-			  int to_user, int from_user, int do_free_page)
+/**
+ * __bio_copy_iov_read - copy all pages from iov_iter to bio
+ * @bio: The &struct bio which describes the I/O as destination
+ * @iter: iov_iter as source
+ *
+ * Copy all pages from iov_iter to bio.
+ * Returns 0 on success, or error on failure.
+ */
+static int __bio_copy_iov_read(struct bio *bio, struct iov_iter iter)
 {
-	int ret = 0, i;
+	int i;
 	struct bio_vec *bvec;
-	struct iov_iter iov_iter = *iter;
 
 	bio_for_each_segment_all(bvec, bio, i) {
-		char *bv_addr = page_address(bvec->bv_page);
-		unsigned int bv_len = bvec->bv_len;
-
-		while (bv_len && iov_iter.count) {
-			struct iovec iov = iov_iter_iovec(&iov_iter);
-			unsigned int bytes = min_t(unsigned int, bv_len,
-						   iov.iov_len);
-
-			if (!ret) {
-				if (to_user)
-					ret = copy_to_user(iov.iov_base,
-							   bv_addr, bytes);
-
-				if (from_user)
-					ret = copy_from_user(bv_addr,
-							     iov.iov_base,
-							     bytes);
-
-				if (ret)
-					ret = -EFAULT;
-			}
+		ssize_t ret;
 
-			bv_len -= bytes;
-			bv_addr += bytes;
-			iov_iter_advance(&iov_iter, bytes);
-		}
+		ret = copy_page_from_iter(bvec->bv_page,
+					  bvec->bv_offset,
+					  bvec->bv_len,
+					  &iter);
 
-		if (do_free_page)
-			__free_page(bvec->bv_page);
+		if (!iov_iter_count(&iter))
+			break;
+
+		if (ret < bvec->bv_len)
+			return -EFAULT;
 	}
 
-	return ret;
+	return 0;
+}
+
+/**
+ * __bio_copy_iov_write - copy all pages from bio to iov_iter
+ * @bio: The &struct bio which describes the I/O as source
+ * @iter: iov_iter as destination
+ *
+ * Copy all pages from bio to iov_iter.
+ * Returns 0 on success, or error on failure.
+ */
+static int __bio_copy_iov_write(struct bio *bio, struct iov_iter iter)
+{
+	int i;
+	struct bio_vec *bvec;
+
+	bio_for_each_segment_all(bvec, bio, i) {
+		ssize_t ret;
+
+		ret = copy_page_to_iter(bvec->bv_page,
+					bvec->bv_offset,
+					bvec->bv_len,
+					&iter);
+
+		if (!iov_iter_count(&iter))
+			break;
+
+		if (ret < bvec->bv_len)
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+/**
+ * __bio_copy_iov - copy all pages between bio and iov_iter
+ * @bio: The &struct bio which describes the I/O
+ * @iter: iov_iter either as source or destination
+ * @to_iov: whether to %READ (0) or %WRITE (1)
+ *
+ * Simple wrapper around __bio_copy_iov_{write,read}().
+ * Returns 0 on success, or the error returned as-is on failure.
+ */
+static inline int __bio_copy_iov(struct bio *bio, struct iov_iter iter,
+				 int to_iov)
+{
+	if (to_iov == WRITE)
+		return __bio_copy_iov_write(bio, iter);
+	else
+		return __bio_copy_iov_read(bio, iter);
 }
 
 /**
@@ -1106,11 +1144,10 @@ int bio_uncopy_user(struct bio *bio)
 		 * if we're in a workqueue, the request is orphaned, so
 		 * don't copy into a random user address space, just free.
 		 */
-		if (current->mm)
-			ret = __bio_copy_iov(bio, &bmd->iter,
-					     bio_data_dir(bio) == READ,
-					     0, bmd->is_our_pages);
-		else if (bmd->is_our_pages)
+		if (current->mm && bio_data_dir(bio) == READ)
+			ret = __bio_copy_iov(bio, bmd->iter, WRITE);
+
+		if (bmd->is_our_pages)
 			bio_for_each_segment_all(bvec, bio, i)
 				__free_page(bvec->bv_page);
 	}
@@ -1233,7 +1270,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	 */
 	if ((!write_to_vm && (!map_data || !map_data->null_mapped)) ||
 	    (map_data && map_data->from_user)) {
-		ret = __bio_copy_iov(bio, iter, 0, 1, 0);
+		ret = __bio_copy_iov(bio, *iter, READ);
 		if (ret)
 			goto cleanup;
 	}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov()
  2015-01-12 11:43   ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Dongsu Park
@ 2015-01-12 11:44     ` Dongsu Park
  2015-01-12 11:44       ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Dongsu Park
  2015-01-16 11:41       ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Christoph Hellwig
  2015-01-15 17:10     ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Christoph Hellwig
  2015-01-15 18:18     ` Christoph Hellwig
  2 siblings, 2 replies; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Hans J. Koch,
	Greg Kroah-Hartman, Al Viro, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Refactor the common part in bio_copy_user_iov() and
__bio_map_user_iov() to separate out iov_count_pages() into the general
iov_iter API, instead of open coding iov iterations as done previously.

This commit should contain only literal replacements, without
functional changes.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Hans J. Koch" <hjk@hansjkoch.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 block/bio.c         | 43 ++++++-------------------------------------
 include/linux/uio.h |  2 ++
 lib/iovec.c         | 30 ++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 7b1aed3..9ad76ed 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1184,24 +1184,9 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
 	unsigned int len;
 	unsigned int offset = map_data ? map_data->offset & ~PAGE_MASK : 0;
 
-	for (i = 0; i < iter->nr_segs; i++) {
-		unsigned long uaddr;
-		unsigned long end;
-		unsigned long start;
-
-		uaddr = (unsigned long) iter->iov[i].iov_base;
-		end = (uaddr + iter->iov[i].iov_len + PAGE_SIZE - 1)
-			>> PAGE_SHIFT;
-		start = uaddr >> PAGE_SHIFT;
-
-		/*
-		 * Overflow, abort
-		 */
-		if (end < start)
-			return ERR_PTR(-EINVAL);
-
-		nr_pages += end - start;
-	}
+	nr_pages = iov_count_pages(iter, 0);
+	if (nr_pages < 0)
+		return ERR_PTR(nr_pages);
 
 	if (offset)
 		nr_pages++;
@@ -1331,25 +1316,9 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	struct iov_iter i;
 	struct iovec iov;
 
-	iov_for_each(iov, i, *iter) {
-		unsigned long uaddr = (unsigned long) iov.iov_base;
-		unsigned long len = iov.iov_len;
-		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		unsigned long start = uaddr >> PAGE_SHIFT;
-
-		/*
-		 * Overflow, abort
-		 */
-		if (end < start)
-			return ERR_PTR(-EINVAL);
-
-		nr_pages += end - start;
-		/*
-		 * buffer must be aligned to at least hardsector size for now
-		 */
-		if (uaddr & queue_dma_alignment(q))
-			return ERR_PTR(-EINVAL);
-	}
+	nr_pages = iov_count_pages(iter, queue_dma_alignment(q));
+	if (nr_pages < 0)
+		return ERR_PTR(nr_pages);
 
 	if (!nr_pages)
 		return ERR_PTR(-EINVAL);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1c5e453..142ff1b 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -72,6 +72,8 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
 
 unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to);
 
+int iov_count_pages(const struct iov_iter *iter, unsigned align);
+
 size_t iov_iter_copy_from_user_atomic(struct page *page,
 		struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
diff --git a/lib/iovec.c b/lib/iovec.c
index 2d99cb4..2e75086 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -1,5 +1,7 @@
 #include <linux/uaccess.h>
 #include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
 #include <linux/uio.h>
 
 /*
@@ -85,3 +87,31 @@ int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
 	return 0;
 }
 EXPORT_SYMBOL(memcpy_fromiovecend);
+
+int iov_count_pages(const struct iov_iter *iter, unsigned align)
+{
+	struct iov_iter i = *iter;
+	int nr_pages = 0;
+
+	while (iov_iter_count(&i)) {
+		unsigned long uaddr = (unsigned long) i.iov->iov_base +
+			i.iov_offset;
+		unsigned long len = i.iov->iov_len - i.iov_offset;
+
+		if ((uaddr & align) || (len & align))
+			return -EINVAL;
+
+		/*
+		 * Overflow, abort
+		 */
+		if (uaddr + len < uaddr)
+			return -EINVAL;
+
+		nr_pages += DIV_ROUND_UP(len + offset_in_page(uaddr),
+					 PAGE_SIZE);
+		iov_iter_advance(&i, len);
+	}
+
+	return nr_pages;
+}
+EXPORT_SYMBOL(iov_count_pages);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov()
  2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
@ 2015-01-12 11:44       ` Dongsu Park
  2015-01-12 11:44         ` [PATCH v2 5/7] md/raid10: make sync_request_write() call bio_copy_data() Dongsu Park
  2015-01-16 11:44         ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Christoph Hellwig
  2015-01-16 11:41       ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Christoph Hellwig
  1 sibling, 2 replies; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Split up a part of the code that was in __bio_map_user_iov() into
a new function bio_get_user_pages(). This helper is going to be used
by future block layer rewriting, especially from direct-IO part.

Note that this relies on the recent change to make
generic_make_request() take arbitrarily sized bios - we're not using
bio_add_page() here.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 block/bio.c         | 130 +++++++++++++++++++++++++++-------------------------
 include/linux/bio.h |   2 +
 2 files changed, 70 insertions(+), 62 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 9ad76ed..7ff846d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1302,19 +1302,79 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
 }
 EXPORT_SYMBOL(bio_copy_user);
 
+/**
+ * bio_get_user_pages - pin user pages and add them to a biovec
+ * @bio: bio to add pages to
+ * @uaddr: start of user address
+ * @len: length in bytes
+ * @write_to_vm: bool indicating writing to pages or not
+ *
+ * Pins pages for up to @len bytes and appends them to @bio's bvec array. May
+ * pin only part of the requested pages - @bio need not have room for all the
+ * pages and can already have had pages added to it.
+ *
+ * Returns the number of bytes from @len added to @bio.
+ */
+ssize_t bio_get_user_pages(struct bio *bio, struct iov_iter *i, int write_to_vm)
+{
+	while (bio->bi_vcnt < bio->bi_max_vecs && iov_iter_count(i)) {
+		struct iovec iov = iov_iter_iovec(i);
+		int ret;
+		unsigned nr_pages, bytes;
+		unsigned offset = offset_in_page(iov.iov_base);
+		struct bio_vec *bv;
+		struct page **pages;
+
+		nr_pages = min_t(size_t,
+				 DIV_ROUND_UP(iov.iov_len + offset, PAGE_SIZE),
+				 bio->bi_max_vecs - bio->bi_vcnt);
+
+		bv = &bio->bi_io_vec[bio->bi_vcnt];
+		pages = (void *) bv;
+
+		ret = get_user_pages_fast((unsigned long) iov.iov_base,
+					  nr_pages, write_to_vm, pages);
+		if (ret < 0) {
+			if (bio->bi_vcnt)
+				return 0;
+
+			return ret;
+		}
+
+		bio->bi_vcnt += ret;
+		bytes = ret * PAGE_SIZE - offset;
+
+		while (ret--) {
+			bv[ret].bv_page = pages[ret];
+			bv[ret].bv_len = PAGE_SIZE;
+			bv[ret].bv_offset = 0;
+		}
+
+		bv[0].bv_offset += offset;
+		bv[0].bv_len -= offset;
+
+		if (bytes > iov.iov_len) {
+			bio->bi_io_vec[bio->bi_vcnt - 1].bv_len -=
+				bytes - iov.iov_len;
+			bytes = iov.iov_len;
+		}
+
+		bio->bi_iter.bi_size += bytes;
+		iov_iter_advance(i, bytes);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_get_user_pages);
+
 static struct bio *__bio_map_user_iov(struct request_queue *q,
 				      struct block_device *bdev,
 				      const struct iov_iter *iter,
 				      int write_to_vm, gfp_t gfp_mask)
 {
-	int j;
+	ssize_t ret;
 	int nr_pages = 0;
-	struct page **pages;
 	struct bio *bio;
-	int cur_page = 0;
-	int ret, offset;
-	struct iov_iter i;
-	struct iovec iov;
 
 	nr_pages = iov_count_pages(iter, queue_dma_alignment(q));
 	if (nr_pages < 0)
@@ -1327,57 +1387,10 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
-	ret = -ENOMEM;
-	pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
-	if (!pages)
+	ret = bio_get_user_pages(bio, (struct iov_iter *)iter, write_to_vm);
+	if (ret < 0)
 		goto out;
 
-	iov_for_each(iov, i, *iter) {
-		unsigned long uaddr = (unsigned long) iov.iov_base;
-		unsigned long len = iov.iov_len;
-		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		unsigned long start = uaddr >> PAGE_SHIFT;
-		const int local_nr_pages = end - start;
-		const int page_limit = cur_page + local_nr_pages;
-
-		ret = get_user_pages_fast(uaddr, local_nr_pages,
-				write_to_vm, &pages[cur_page]);
-		if (ret < local_nr_pages) {
-			ret = -EFAULT;
-			goto out_unmap;
-		}
-
-		offset = uaddr & ~PAGE_MASK;
-		for (j = cur_page; j < page_limit; j++) {
-			unsigned int bytes = PAGE_SIZE - offset;
-
-			if (len <= 0)
-				break;
-			
-			if (bytes > len)
-				bytes = len;
-
-			/*
-			 * sorry...
-			 */
-			if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
-					    bytes)
-				break;
-
-			len -= bytes;
-			offset = 0;
-		}
-
-		cur_page = j;
-		/*
-		 * release the pages we didn't map into the bio, if any
-		 */
-		while (j < page_limit)
-			page_cache_release(pages[j++]);
-	}
-
-	kfree(pages);
-
 	/*
 	 * set data direction, and check if mapped pages need bouncing
 	 */
@@ -1388,14 +1401,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	bio->bi_flags |= (1 << BIO_USER_MAPPED);
 	return bio;
 
- out_unmap:
-	for (j = 0; j < nr_pages; j++) {
-		if (!pages[j])
-			break;
-		page_cache_release(pages[j]);
-	}
  out:
-	kfree(pages);
 	bio_put(bio);
 	return ERR_PTR(ret);
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a69f7b1..c80131a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -428,6 +428,8 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
+struct iov_iter;
+extern ssize_t bio_get_user_pages(struct bio *, struct iov_iter *, int);
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, unsigned int, int, gfp_t);
 struct iov_iter;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/7] md/raid10: make sync_request_write() call bio_copy_data()
  2015-01-12 11:44       ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Dongsu Park
@ 2015-01-12 11:44         ` Dongsu Park
  2015-01-12 11:44           ` [PATCH v2 6/7] fs: make _submit_bh consistent with generic bio chaining Dongsu Park
  2015-01-16 11:44         ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Neil Brown,
	linux-raid, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Refactor sync_request_write() of md/raid10 to use bio_copy_data()
instead of open coding bio_vec iterations.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 drivers/md/raid10.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 32e282f..4a40354 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2107,18 +2107,11 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 		tbio->bi_vcnt = vcnt;
 		tbio->bi_iter.bi_size = r10_bio->sectors << 9;
 		tbio->bi_rw = WRITE;
-		tbio->bi_private = r10_bio;
 		tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
-
-		for (j=0; j < vcnt ; j++) {
-			tbio->bi_io_vec[j].bv_offset = 0;
-			tbio->bi_io_vec[j].bv_len = PAGE_SIZE;
-
-			memcpy(page_address(tbio->bi_io_vec[j].bv_page),
-			       page_address(fbio->bi_io_vec[j].bv_page),
-			       PAGE_SIZE);
-		}
 		tbio->bi_end_io = end_sync_write;
+		tbio->bi_private = r10_bio;
+
+		bio_copy_data(tbio, fbio);
 
 		d = r10_bio->devs[i].devnum;
 		atomic_inc(&conf->mirrors[d].rdev->nr_pending);
@@ -2134,17 +2127,14 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	 * that are active
 	 */
 	for (i = 0; i < conf->copies; i++) {
-		int j, d;
+		int d;
 
 		tbio = r10_bio->devs[i].repl_bio;
 		if (!tbio || !tbio->bi_end_io)
 			continue;
 		if (r10_bio->devs[i].bio->bi_end_io != end_sync_write
 		    && r10_bio->devs[i].bio != fbio)
-			for (j = 0; j < vcnt; j++)
-				memcpy(page_address(tbio->bi_io_vec[j].bv_page),
-				       page_address(fbio->bi_io_vec[j].bv_page),
-				       PAGE_SIZE);
+			bio_copy_data(tbio, fbio);
 		d = r10_bio->devs[i].devnum;
 		atomic_inc(&r10_bio->remaining);
 		md_sync_acct(conf->mirrors[d].replacement->bdev,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/7] fs: make _submit_bh consistent with generic bio chaining
  2015-01-12 11:44         ` [PATCH v2 5/7] md/raid10: make sync_request_write() call bio_copy_data() Dongsu Park
@ 2015-01-12 11:44           ` Dongsu Park
  2015-01-12 11:44             ` [PATCH v2 7/7] PM: submit bio in a sane way in cases without bio_chain Dongsu Park
  0 siblings, 1 reply; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, Al Viro,
	linux-fsdevel, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Make _submit_bh() handle refcounting by increasing bio->bi_remaining,
followed by bio_endio(). Since bio chaining was introduced with
196d38bccfcf ("block: Generic bio chaining"), refcounting should be
done on bi_remaining instead of ancient bio_cnt. Doing that, calling
convention can be consistent with the immutable biovecs API.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 fs/buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 20805db..dbe5699 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3041,13 +3041,13 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
 	if (buffer_prio(bh))
 		rw |= REQ_PRIO;
 
-	bio_get(bio);
+	atomic_inc(&bio->bi_remaining);
 	submit_bio(rw, bio);
 
 	if (bio_flagged(bio, BIO_EOPNOTSUPP))
 		ret = -EOPNOTSUPP;
 
-	bio_put(bio);
+	bio_endio(bio, 0);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(_submit_bh);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 7/7] PM: submit bio in a sane way in cases without bio_chain
  2015-01-12 11:44           ` [PATCH v2 6/7] fs: make _submit_bh consistent with generic bio chaining Dongsu Park
@ 2015-01-12 11:44             ` Dongsu Park
  2015-01-30  0:49               ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Dongsu Park @ 2015-01-12 11:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Hellwig, Kent Overstreet, Jens Axboe, linux-pm, Dongsu Park

From: Kent Overstreet <kmo@daterainc.com>

Make bio submission in kernel/power/block_io.c to properly submit
bios also when bio_chain is not available. In that case, it's not
necessary to handle refcount with bio_get(), but it's saner to simply
call a predefined helper submit_bio_wait(). So call bio_get() only
when bio_chain is given.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
---
 kernel/power/block_io.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index 9a58bc2..7206408 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -34,7 +34,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
 	bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
 	bio->bi_iter.bi_sector = sector;
 	bio->bi_bdev = bdev;
-	bio->bi_end_io = end_swap_bio_read;
 
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
 		printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
@@ -44,15 +43,29 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
 	}
 
 	lock_page(page);
-	bio_get(bio);
 
 	if (bio_chain == NULL) {
-		submit_bio(bio_rw, bio);
-		wait_on_page_locked(page);
+		int err = submit_bio_wait(bio_rw, bio);
+
+		if (err) {
+			SetPageError(page);
+			ClearPageUptodate(page);
+			pr_alert("Read-error on swap-device (%u:%u:%llu)\n",
+				 imajor(bio->bi_bdev->bd_inode),
+				 iminor(bio->bi_bdev->bd_inode),
+				 (unsigned long long)bio->bi_iter.bi_sector);
+		} else {
+			SetPageUptodate(page);
+		}
+
 		if (rw == READ)
-			bio_set_pages_dirty(bio);
+			set_page_dirty_lock(page);
+		unlock_page(page);
 		bio_put(bio);
 	} else {
+		bio->bi_end_io = end_swap_bio_read;
+		bio_get(bio);
+
 		if (rw == READ)
 			get_page(page);	/* These pages are freed later */
 		bio->bi_private = *bio_chain;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()
  2015-01-12 11:43   ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Dongsu Park
  2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
@ 2015-01-15 17:10     ` Christoph Hellwig
  2015-01-15 18:18     ` Christoph Hellwig
  2 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-01-15 17:10 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe, Al Viro

> +/**
> + * __bio_copy_iov - copy all pages between bio and iov_iter
> + * @bio: The &struct bio which describes the I/O
> + * @iter: iov_iter either as source or destination
> + * @to_iov: whether to %READ (0) or %WRITE (1)
> + *
> + * Simple wrapper around __bio_copy_iov_{write,read}().
> + * Returns 0 on success, or the error returned as-is on failure.
> + */
> +static inline int __bio_copy_iov(struct bio *bio, struct iov_iter iter,
> +				 int to_iov)
> +{
> +	if (to_iov == WRITE)
> +		return __bio_copy_iov_write(bio, iter);
> +	else
> +		return __bio_copy_iov_read(bio, iter);
>  }

No need to keep this wrapper..


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()
  2015-01-12 11:43   ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Dongsu Park
  2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
  2015-01-15 17:10     ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Christoph Hellwig
@ 2015-01-15 18:18     ` Christoph Hellwig
  2015-01-16 11:31       ` Christoph Hellwig
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-01-15 18:18 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe, Al Viro

On Mon, Jan 12, 2015 at 12:43:59PM +0100, Dongsu Park wrote:
> Rewrite __bio_copy_iov() so that it can call either _read() or _write()
> variant, which is determined by direction to_iov, given as either READ
> or WRITE. Moreover, make __bio_copy_iov() take its parameter iov_iter
> by value, to avoid awkward situations like ref-/dereferencing pointer
> and value repeatedly.
> 
> This commit should contain only literal replacements, without
> functional changes.

This breaks booting a simple KVM VM for me:

[    2.692732] general protection fault: 0000 [#1] SMP 
[    2.696041] Modules linked in:
[    2.696041] CPU: 2 PID: 1819 Comm: cdrom_id Not tainted 3.19.0-rc4+ #47
[    2.696041] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[    2.696041] task: ffff88007b318b90 ti: ffff88007a0b0000 task.ti: ffff88007a0b0000
[    2.696041] RIP: 0010:[<ffffffff81742460>]  [<ffffffff81742460>] bio_uncopy_user+0x60/0x160
[    2.701775] RSP: 0018:ffff88007a0b3a88  EFLAGS: 00010246
[    2.701775] RAX: 0000000000000024 RBX: 20202020554d4551 RCX: 0000000000000000
[    2.701775] RDX: 0000000000000024 RSI: ffff88007a6c7024 RDI: ffff88007cc9e304
[    2.705548] RBP: ffff88007a0b3b08 R08: 0000000000000024 R09: 0000000000000000
[    2.705548] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[    2.705548] R13: ffff88007cc9e280 R14: ffff880079cdd200 R15: 0000000000000000
[    2.705548] FS:  00007fdeb0282700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[    2.705548] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    2.705548] CR2: 0000000001ebd008 CR3: 000000007aca6000 CR4: 00000000000006e0
[    2.705548] Stack:
[    2.715017]  0000000000000001 0000000000000000 0000000000000024 ffff88007a0b3a70
[    2.716562]  0000000000000001 0000000000000001 0000000000000024 0000000000000000
[    2.717630]  ffff88007a0b3a70 0000000000000001 ffff88007a0b3b18 ffff88007cc9e280
[    2.717630] Call Trace:
[    2.717630]  [<ffffffff8174fbb4>] __blk_rq_unmap_user+0x14/0x40
[    2.717630]  [<ffffffff8174fc11>] blk_rq_unmap_user+0x31/0x60
[    2.717630]  [<ffffffff8175cb33>] sg_io+0x2c3/0x4a0
[    2.724739]  [<ffffffff8175d1d5>] scsi_cmd_ioctl+0x425/0x4a0
[    2.724739]  [<ffffffff8175d29a>] scsi_cmd_blk_ioctl+0x4a/0x60
[    2.726432]  [<ffffffff81b4ae1b>] cdrom_ioctl+0x3b/0xc10
[    2.726432]  [<ffffffff810fdecd>] ? trace_hardirqs_on+0xd/0x10
[    2.726432]  [<ffffffff81a50638>] ? sr_block_ioctl+0x48/0xd0
[    2.726432]  [<ffffffff810fddfd>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[    2.726432]  [<ffffffff810fdecd>] ? trace_hardirqs_on+0xd/0x10
[    2.726432]  [<ffffffff81a50674>] sr_block_ioctl+0x84/0xd0
[    2.726432]  [<ffffffff81759782>] blkdev_ioctl+0x232/0x7f0
[    2.726432]  [<ffffffff811fdb6c>] block_ioctl+0x3c/0x40
[    2.726432]  [<ffffffff811d8d93>] do_vfs_ioctl+0x83/0x5b0
[    2.726432]  [<ffffffff811d6021>] ? final_putname+0x21/0x50
[    2.726432]  [<ffffffff81e04095>] ? sysret_check+0x22/0x5d
[    2.726432]  [<ffffffff811d9307>] SyS_ioctl+0x47/0x90
[    2.726432]  [<ffffffff81e04069>] system_call_fastpath+0x12/0x17
[    2.726432] Code: 48 83 b8 48 03 00 00 00 74 06 f6 47 18 01 74 63 41 8b 1e 85 db 74 30 66 41 83 7d 60 00 49 8b 5d 68 74 24 45 31 e4 0f 1f 44 00 00 <48> 8b 3b 31 f6 41 83 c4 01 48 83 c3 10 e8 7e d4 a3 ff 41 0f b7 
[    2.726432] RIP  [<ffffffff81742460>] bio_uncopy_user+0x60/0x160
[    2.750102]  RSP <ffff88007a0b3a88>
[    2.751775] ---[ end trace 577bd821e65932ad ]---



(gdb) l *(bio_uncopy_user+0x60/0x160)
0xffffffff81742400 is in bio_uncopy_user (../block/bio.c:1137).
1132	 *
1133	 *	Free pages allocated from bio_copy_user() and write back
data
1134	 *	to user space in case of a read.
1135	 */
1136	int bio_uncopy_user(struct bio *bio)
1137	{
1138		struct bio_map_data *bmd = bio->bi_private;
1139		struct bio_vec *bvec;
1140		int ret = 0, i;
1141	


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()
  2015-01-15 18:18     ` Christoph Hellwig
@ 2015-01-16 11:31       ` Christoph Hellwig
  2015-01-16 11:58         ` Dongsu Park
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-01-16 11:31 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe, Al Viro

On Thu, Jan 15, 2015 at 10:18:17AM -0800, Christoph Hellwig wrote:
> This breaks booting a simple KVM VM for me:

Seems like the issue actually is in the patch before this one, but
only shows up with this one applied.

The root cause is that we only copy the iov_iter, but not the
actual iovecs into the bio_map_data.

I have a fixed series, which I'll send out together with various
related cleanups ASAP.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov()
  2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
  2015-01-12 11:44       ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Dongsu Park
@ 2015-01-16 11:41       ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-01-16 11:41 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe,
	Hans J. Koch, Greg Kroah-Hartman, Al Viro

mm/iov_iter.c already has a iov_iter_npages, although that
one doesn't do an alignment check yet.  But it might be worth
to just the iov_iter_alignment similar to how iov_iter_alignment
does it instead of this new function.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov()
  2015-01-12 11:44       ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Dongsu Park
  2015-01-12 11:44         ` [PATCH v2 5/7] md/raid10: make sync_request_write() call bio_copy_data() Dongsu Park
@ 2015-01-16 11:44         ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-01-16 11:44 UTC (permalink / raw)
  To: Dongsu Park; +Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe

This switches to opencoding bio_add_pc_page, and thus losing the
max_hw_sectors check.  Additionally it exports a function that's
not used outside the file without a user showing up in the
series.

For now I'd suggest defeing this to the series that uses it, but
I suspect we can't really use common code between SG_IO and direct I/O
in the end due to the different nature of filesystem vs BLOCK_PC I/O.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/7] block: rewrite __bio_copy_iov()
  2015-01-16 11:31       ` Christoph Hellwig
@ 2015-01-16 11:58         ` Dongsu Park
  0 siblings, 0 replies; 15+ messages in thread
From: Dongsu Park @ 2015-01-16 11:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, Kent Overstreet, Jens Axboe, Al Viro

Hi Christoph,

On 16.01.2015 03:31, Christoph Hellwig wrote:
> On Thu, Jan 15, 2015 at 10:18:17AM -0800, Christoph Hellwig wrote:
> > This breaks booting a simple KVM VM for me:
> Seems like the issue actually is in the patch before this one, but
> only shows up with this one applied.
> The root cause is that we only copy the iov_iter, but not the
> actual iovecs into the bio_map_data.
> I have a fixed series, which I'll send out together with various
> related cleanups ASAP.

Thanks for testing it and finding out the root cause.
Strange, I haven't never seen the bug.
Maybe I'd have to test it also with virtio-scsi, which I don't do usually.

Dongsu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 7/7] PM: submit bio in a sane way in cases without bio_chain
  2015-01-12 11:44             ` [PATCH v2 7/7] PM: submit bio in a sane way in cases without bio_chain Dongsu Park
@ 2015-01-30  0:49               ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2015-01-30  0:49 UTC (permalink / raw)
  To: Dongsu Park
  Cc: linux-kernel, Christoph Hellwig, Kent Overstreet, Jens Axboe, linux-pm

On Monday, January 12, 2015 12:44:04 PM Dongsu Park wrote:
> From: Kent Overstreet <kmo@daterainc.com>
> 
> Make bio submission in kernel/power/block_io.c to properly submit
> bios also when bio_chain is not available. In that case, it's not
> necessary to handle refcount with bio_get(), but it's saner to simply
> call a predefined helper submit_bio_wait(). So call bio_get() only
> when bio_chain is given.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
> [dpark: add more description in commit message]
> Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>

This is fine by me, but I'd like to have an ACK from a block layer
expert here if I'm to apply it.

If you want to push this through the block layer subsystem tree,
please feel free to add my ACK to it.

> ---
>  kernel/power/block_io.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
> index 9a58bc2..7206408 100644
> --- a/kernel/power/block_io.c
> +++ b/kernel/power/block_io.c
> @@ -34,7 +34,6 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
>  	bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
>  	bio->bi_iter.bi_sector = sector;
>  	bio->bi_bdev = bdev;
> -	bio->bi_end_io = end_swap_bio_read;
>  
>  	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
>  		printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
> @@ -44,15 +43,29 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
>  	}
>  
>  	lock_page(page);
> -	bio_get(bio);
>  
>  	if (bio_chain == NULL) {
> -		submit_bio(bio_rw, bio);
> -		wait_on_page_locked(page);
> +		int err = submit_bio_wait(bio_rw, bio);
> +
> +		if (err) {
> +			SetPageError(page);
> +			ClearPageUptodate(page);
> +			pr_alert("Read-error on swap-device (%u:%u:%llu)\n",
> +				 imajor(bio->bi_bdev->bd_inode),
> +				 iminor(bio->bi_bdev->bd_inode),
> +				 (unsigned long long)bio->bi_iter.bi_sector);
> +		} else {
> +			SetPageUptodate(page);
> +		}
> +
>  		if (rw == READ)
> -			bio_set_pages_dirty(bio);
> +			set_page_dirty_lock(page);
> +		unlock_page(page);
>  		bio_put(bio);
>  	} else {
> +		bio->bi_end_io = end_swap_bio_read;
> +		bio_get(bio);
> +
>  		if (rw == READ)
>  			get_page(page);	/* These pages are freed later */
>  		bio->bi_private = *bio_chain;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-01-30  0:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-12 11:43 [RFC PATCH v2 0/7] preparation for block layer simplification Dongsu Park
2015-01-12 11:43 ` [PATCH v2 1/7] block: replace sg_iovec with iov_iter Dongsu Park
2015-01-12 11:43   ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Dongsu Park
2015-01-12 11:44     ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Dongsu Park
2015-01-12 11:44       ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Dongsu Park
2015-01-12 11:44         ` [PATCH v2 5/7] md/raid10: make sync_request_write() call bio_copy_data() Dongsu Park
2015-01-12 11:44           ` [PATCH v2 6/7] fs: make _submit_bh consistent with generic bio chaining Dongsu Park
2015-01-12 11:44             ` [PATCH v2 7/7] PM: submit bio in a sane way in cases without bio_chain Dongsu Park
2015-01-30  0:49               ` Rafael J. Wysocki
2015-01-16 11:44         ` [PATCH v2 4/7] block: refactor bio_get_user_pages() from __bio_map_user_iov() Christoph Hellwig
2015-01-16 11:41       ` [PATCH v2 3/7] block: refactor iov_count_pages() from bio_{copy,map}_user_iov() Christoph Hellwig
2015-01-15 17:10     ` [PATCH v2 2/7] block: rewrite __bio_copy_iov() Christoph Hellwig
2015-01-15 18:18     ` Christoph Hellwig
2015-01-16 11:31       ` Christoph Hellwig
2015-01-16 11:58         ` Dongsu Park

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.