From: Ming Lei <ming.lei@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com, Matias Bjorling <mb@lightnvm.io>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 01/10] block: don't decrement nr_phys_segments for physically contiguous segments
Date: Tue, 14 May 2019 17:05:45 +0800
Message-ID: <20190514090544.GC20468@ming.t460p>
In-Reply-To: <20190514051441.GA6294@lst.de>

On Tue, May 14, 2019 at 07:14:41AM +0200, Christoph Hellwig wrote:
> On Tue, May 14, 2019 at 12:36:43PM +0800, Ming Lei wrote:
> > > > Some workloads need this optimization, please see 729204ef49ec00b
> > > > ("block: relax check on sg gap"):
> > > 
> > > And we still allow merging the segments with this patch.  The only
> > > difference is that these merges do get accounted as extra segments.
> > 
> > It is easy for .nr_phys_segments to reach the max segment limit this
> > way, and then no new bio can be merged any more.
> 
> As I said in my other mail, we only decremented it for request merges
> in the non-gap case before, and no one complained.

However, we can still do better.  For example, the attached patch saves
10~20% of the time when running 'mkfs.ntfs -s 512 /dev/vda', since lots
of small requests (63KB) can be merged into big I/Os (1MB).
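
To make the numbers concrete, here is a rough sketch of the accounting
(not the exact kernel code): mkfs.ntfs -s 512 ends up submitting
512-byte sequential bios here, and once each bio is accounted as its
own physical segment, the merge path gives up as soon as the queue's
segment limit is reached:

	/*
	 * With per-bio segment accounting, a request built from 512-byte
	 * bios stops growing at queue_max_segments() segments (126
	 * segments * 512B = 63KB here), even though the data is
	 * physically contiguous and could be accounted as one segment.
	 */
	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
		goto no_merge;

The attached patch instead checks blk_phys_contig_segment() in
ll_new_hw_segment(), so a physically contiguous bio folds into the
request's tail segment and the request can keep growing to ~1MB.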

> 
> > We don't consider segment merging between two bios in ll_new_hw_segment().
> > In my mkfs test over virtio-blk, the request size can easily be
> > increased to ~1M (several segments) from 63k (126 bios/segments) if
> > segment merging between two bios is considered.
> 
> With the gap devices we have an unlimited segment size; see my next
> patch, which actually enforces that.  This is much more efficient than using

But this patch does affect non-gap devices, and actually most devices
are the non-gap type.

> multiple segments.  Also, instead of hacking up the merge path even
> more, we can fix the block device buffered I/O path to submit large
> I/Os instead of relying on merging, as we already do in the direct I/O
> code and every major file system.  I have that on my plate as a todo
> list item.
> 
> > > We do that in a couple of places.  For one, the nvme single segment
> > > optimization that triggered this bug.  Also the range discard support
> > > in nvme and virtio.  Then we have loop, which iterates over the
> > > segments but doesn't use the nr_phys_segments count, and plenty of
> > > others that iterate over pages at the moment but should be iterating
> > > over bvecs, e.g. ubd or aoe.
> > 
> > It seems the discard code doesn't consider bio merging for nvme and
> > virtio, so it should be fine in this way.  I will take a closer look
> > at the nvme/virtio discard segment merging later.
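
For reference, blk_rq_nr_discard_segments() just reports the merge
accounting, not the actual number of bios in the request.  The helper
in include/linux/blkdev.h (quoted from memory) is:

	static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
	{
		return max_t(unsigned short, rq->nr_phys_segments, 1);
	}

so any driver that sizes a per-bio array with it silently depends on
nr_phys_segments never dropping below the bio count.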
> 
> I found the bio case by implementing the proper accounting in the bio
> merge path and then hitting a KASAN warning due to the range kmalloc.
> So that issue is real as well.

I guess the following patch may fix it:

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f1d90cd3dc47..a913110de389 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -175,7 +175,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
 
 static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 {
-	unsigned short segments = blk_rq_nr_discard_segments(req);
+	unsigned short segments = 0;
 	unsigned short n = 0;
 	struct virtio_blk_discard_write_zeroes *range;
 	struct bio *bio;
@@ -184,6 +184,9 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 	if (unmap)
 		flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
 
+	__rq_for_each_bio(bio, req)
+		segments++;
+
 	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
 	if (!range)
 		return -ENOMEM;
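
The per-bio count is the right size here because the rest of the
function writes exactly one range entry per bio, roughly:

	__rq_for_each_bio(bio, req) {
		range[n].flags = cpu_to_le32(flags);
		range[n].num_sectors = cpu_to_le32(bio->bi_iter.bi_size >> SECTOR_SHIFT);
		range[n].sector = cpu_to_le64(bio->bi_iter.bi_sector);
		n++;
	}

Once the proper accounting lets nr_phys_segments drop below the number
of bios, sizing the array via blk_rq_nr_discard_segments() under-allocates
it and this loop writes past the end, which should be the KASAN warning
you hit.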


Thanks,
Ming

[-- Attachment #2: 0001-block-merge-bios-segment.patch --]

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 21e87a714a73..af122efeb28d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -622,6 +622,8 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 				    struct bio *bio)
 {
 	int nr_phys_segs = bio_phys_segments(q, bio);
+	unsigned int seg_size =
+		req->biotail->bi_seg_back_size + bio->bi_seg_front_size;
 
 	if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
 		goto no_merge;
@@ -629,11 +631,15 @@ static inline int ll_new_hw_segment(struct request_queue *q,
 	if (blk_integrity_merge_bio(q, req, bio) == false)
 		goto no_merge;
 
-	/*
-	 * This will form the start of a new hw segment.  Bump both
-	 * counters.
-	 */
-	req->nr_phys_segments += nr_phys_segs;
+	if (blk_phys_contig_segment(q, req->biotail, bio)) {
+		if (nr_phys_segs)
+			req->nr_phys_segments += nr_phys_segs - 1;
+		if (nr_phys_segs == 1)
+			bio->bi_seg_back_size = seg_size;
+		if (req->nr_phys_segments == 1)
+			req->bio->bi_seg_front_size = seg_size;
+	} else
+		req->nr_phys_segments += nr_phys_segs;
 	return 1;
 
 no_merge:
-- 
2.20.1


Thread overview: 27+ messages
2019-05-13  6:37 fix nr_phys_segments vs iterators accounting Christoph Hellwig
2019-05-13  6:37 ` [PATCH 01/10] block: don't decrement nr_phys_segments for physically contiguous segments Christoph Hellwig
2019-05-13  9:45   ` Ming Lei
2019-05-13 12:03     ` Christoph Hellwig
2019-05-13 12:37       ` Christoph Hellwig
2019-05-14  4:36       ` Ming Lei
2019-05-14  5:14         ` Christoph Hellwig
2019-05-14  9:05           ` Ming Lei [this message]
2019-05-14 13:51             ` Christoph Hellwig
2019-05-14 13:57               ` Hannes Reinecke
2019-05-14 14:27               ` Ming Lei
2019-05-14 14:31                 ` Christoph Hellwig
2019-05-14 14:32                   ` Christoph Hellwig
2019-05-13  6:37 ` [PATCH 02/10] block: force an unlimited segment size on queues with a virt boundary Christoph Hellwig
2019-05-15  8:19   ` Ming Lei
2019-05-13  6:37 ` [PATCH 03/10] block: remove the segment size check in bio_will_gap Christoph Hellwig
2019-05-15  8:34   ` Ming Lei
2019-05-13  6:37 ` [PATCH 04/10] block: remove the bi_seg_{front,back}_size fields in struct bio Christoph Hellwig
2019-05-13  6:37 ` [PATCH 05/10] block: initialize the write priority in blk_rq_bio_prep Christoph Hellwig
2019-05-13 15:04   ` Chaitanya Kulkarni
2019-05-13  6:37 ` [PATCH 06/10] block: remove blk_init_request_from_bio Christoph Hellwig
2019-05-13  6:37 ` [PATCH 07/10] block: remove the bi_phys_segments field in struct bio Christoph Hellwig
2019-05-13  6:37 ` [PATCH 08/10] block: simplify blk_recalc_rq_segments Christoph Hellwig
2019-05-13  6:37 ` [PATCH 09/10] block: untangle the end of blk_bio_segment_split Christoph Hellwig
2019-05-13  6:37 ` [PATCH 10/10] block: mark blk_rq_bio_prep as inline Christoph Hellwig
2019-05-13 14:57   ` Chaitanya Kulkarni
     [not found] ` <CGME20190513063855epcas5p33ef8c4c0a0055bd0b66eadc859796f0f@epcms2p6>
2019-05-13  7:34   ` [PATCH 05/10] block: initialize the write priority in blk_rq_bio_prep Minwoo Im
