On Tue, May 14, 2019 at 07:14:41AM +0200, Christoph Hellwig wrote:
> On Tue, May 14, 2019 at 12:36:43PM +0800, Ming Lei wrote:
> > > > Some workloads need this optimization, please see 729204ef49ec00b
> > > > ("block: relax check on sg gap"):
> > >
> > > And we still allow to merge the segments with this patch. The only
> > > difference is that these merges do get accounted as extra segments.
> >
> > It is easy for .nr_phys_segments to reach the max segment limit by this
> > way, then no new bio can be merged any more.
>
> As said in my other mail we only decremented it for request merges
> in the non-gap case before and no one complained.

However, we may still do better. For example, the attached patch saves
10~20% of the time when running 'mkfs.ntfs -s 512 /dev/vda', because lots
of small requests (63KB) can be merged into big IOs (1MB).

>
> > We don't consider segment merge between two bios in ll_new_hw_segment(),
> > in my mkfs test over virtio-blk, request size can be increased to ~1M(several
> > segments) from 63k(126 bios/segments) easily if the segment merge between
> > two bios is considered.
>
> With the gap devices we have unlimited segment size, see my next patch
> to actually enforce that. Which is much more efficient than using

But this patch does affect non-gap devices too, and most devices are in
fact of the non-gap type.

> multiple segments. Also instead of hacking up the merge path even more
> we can fix the block device buffered I/O path to submit large I/Os
> instead of relying on merging like we do in the direct I/O code and
> every major file system. I have that on my plate as a todo list item.
>
> > > We do that in a couple of places. For one the nvme single segment
> > > optimization that triggered this bug. Also for range discard support
> > > in nvme and virtio. Then we have loop that iterate the segments, but
> > > doesn't use the nr_phys_segments count, and plenty of others that
> > > iterate over pages at the moment but should be iterating bvecs,
> > > e.g. ubd or aoe.
> >
> > Seems discard segment doesn't consider bios merge for nvme and virtio,
> > so it should be fine in this way. Will take a close look at nvme/virtio
> > discard segment merge later.
>
> I found the bio case by looking at doing the proper accounting in the
> bio merge path and hitting KASAN warning due to the range kmalloc.
> So that issue is real as well.

I guess the following patch may fix it: each bio in a discard request
contributes one range, so the allocation should be sized by counting the
bios directly instead of using blk_rq_nr_discard_segments(), which may
become smaller than the bio count once merges are accounted properly:

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index f1d90cd3dc47..a913110de389 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -175,7 +175,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
 
 static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 {
-	unsigned short segments = blk_rq_nr_discard_segments(req);
+	unsigned short segments = 0;
 	unsigned short n = 0;
 	struct virtio_blk_discard_write_zeroes *range;
 	struct bio *bio;
@@ -184,6 +184,9 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
 	if (unmap)
 		flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
 
+	__rq_for_each_bio(bio, req)
+		segments++;
+
 	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
 	if (!range)
 		return -ENOMEM;

Thanks,
Ming
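
P.S. To illustrate the segment-merge accounting discussed above, here is a
minimal user-space sketch of the check that decides whether two adjacent
data chunks can be folded into a single physical segment. The struct and
helper names are illustrative only (in the kernel this operates on struct
bio_vec, with the physical address coming from page_to_phys()); it is not
the actual block-layer API:

#include <stdbool.h>

/*
 * Illustrative stand-in for one chunk of a physical segment: in the
 * kernel the data lives in a struct bio_vec; here a flat physical
 * address stands in for page_to_phys() + bv_offset.
 */
struct seg {
        unsigned long phys;     /* physical start address */
        unsigned int len;       /* length in bytes */
};

/*
 * Two chunks can be accounted as one physical segment when the first
 * ends exactly where the second begins and the combined length still
 * fits within the queue's max segment size.  This is what lets many
 * contiguous 512-byte bios count as far fewer physical segments.
 */
static bool segs_mergeable(const struct seg *prev, const struct seg *next,
                           unsigned int max_segment_size)
{
        if (prev->phys + prev->len != next->phys)
                return false;   /* not physically contiguous */
        return (unsigned long)prev->len + next->len <= max_segment_size;
}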