From: <ed.tsai@mediatek.com>
To: <ming.lei@redhat.com>, <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	AngeloGioacchino Del Regno
	<angelogioacchino.delregno@collabora.com>
Cc: <wsd_upstream@mediatek.com>, <chun-hung.wu@mediatek.com>,
	<casper.li@mediatek.com>, <will.shiu@mediatek.com>,
	<light.hsieh@mediatek.com>, Ed Tsai <ed.tsai@mediatek.com>,
	<linux-block@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-mediatek@lists.infradead.org>
Subject: [PATCH v2] block: limit the extract size to align queue limit
Date: Fri, 10 Nov 2023 13:19:49 +0800
Message-ID: <20231110051950.21972-1-ed.tsai@mediatek.com>

From: Ed Tsai <ed.tsai@mediatek.com>

When an application performs a large I/O, the kernel fills and submits
multiple full bios to the block layer. Since commit 07173c3ec276
("block: enable multipage bvecs"), the size of a full bio is no longer
fixed at 1MB (256 single-page bvecs with 4KB pages); it now varies with
the physical memory layout of the buffer, because one bvec can cover
several physically contiguous pages.
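
To illustrate the paragraph above, here is a simplified user-space model
(not kernel code) of how many bytes fit into one full bio. It assumes 4KB
pages and 256 bvec slots (the value of BIO_MAX_VECS), and it describes the
buffer's physical layout as a hypothetical repeating pattern of contiguous
run lengths; `full_bio_bytes()` and the SKETCH_* names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4KB pages */
#define SKETCH_MAX_VECS  256u  /* same value as the kernel's BIO_MAX_VECS */

/*
 * Simplified user-space model, not kernel code: the user buffer is backed
 * by physically contiguous runs whose lengths (in pages) repeat 'pattern'
 * indefinitely. Returns how many bytes fit into one full bio.
 *   single-page bvecs: every page consumes one bvec slot
 *   multipage bvecs:   a whole contiguous run consumes one bvec slot
 */
size_t full_bio_bytes(const unsigned *pattern, size_t n, int multipage)
{
	size_t bytes = 0;
	unsigned slots = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		assert(pattern[i] > 0); /* every run holds at least one page */

	for (size_t i = 0; slots < SKETCH_MAX_VECS; i = (i + 1) % n) {
		if (multipage) {
			/* the whole contiguous run fits in a single slot */
			bytes += (size_t)pattern[i] * SKETCH_PAGE_SIZE;
			slots++;
			continue;
		}
		/* one slot per page until the bio runs out of slots */
		for (unsigned p = 0; p < pattern[i] && slots < SKETCH_MAX_VECS; p++) {
			bytes += SKETCH_PAGE_SIZE;
			slots++;
		}
	}
	return bytes;
}
```

In this model a full bio is always 1MB with single-page bvecs, whatever the
layout. With multipage bvecs, a layout repeating runs of {1, 1, 1, 2} pages
yields 1.25MB, which is not a multiple of 1MB.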

As a result, the size of a full bio is no longer aligned to the queue's
maximum I/O size, so a 64MB read may submit many unaligned bios.

For example, run the following command to perform a 64MB direct read:

	dd if=/data/test_file of=/dev/null bs=64m count=1 iflag=direct

The resulting block_bio_queue trace shows numerous unaligned bios being
submitted:

	block_bio_queue: 254,52 R 2933336 + 2136
	block_bio_queue: 254,52 R 2935472 + 2152
	block_bio_queue: 254,52 R 2937624 + 2128
	block_bio_queue: 254,52 R 2939752 + 2160
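
Each trace line reads "start sector + number of sectors" in 512-byte
sectors, so the first bio is 2136 sectors (1068KB). The sketch below shows
what a size-only split of such a bio would produce. The 2048-sector (1MB)
limit is an assumption for illustration, since the device's actual limit is
not given above, and the hypothetical `split_sizes()` helper ignores the
block layer's other split criteria (segment counts, boundaries, alignment).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split an I/O of 'nr_sectors' (512-byte units) into
 * pieces of at most 'max_sectors', as a size-only split would. Stores the
 * piece sizes in 'out' (capacity 'cap') and returns how many were stored.
 */
size_t split_sizes(unsigned nr_sectors, unsigned max_sectors,
		   unsigned *out, size_t cap)
{
	size_t n = 0;

	assert(max_sectors > 0); /* otherwise the loop would never finish */
	while (nr_sectors > 0 && n < cap) {
		unsigned piece = nr_sectors < max_sectors ? nr_sectors : max_sectors;

		out[n++] = piece;
		nr_sectors -= piece;
	}
	return n;
}
```

Under that assumed limit, the four bios in the trace would each produce a
full 2048-sector piece followed by a small tail of 88, 104, 80 and 112
sectors (44KB, 52KB, 40KB and 56KB).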

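This patch limits the number of bytes extracted per call so that a bio
stops growing once its size reaches a multiple of the queue's maximum I/O
size. The bio is then submitted as soon as it has been filled to that
boundary, which prevents the block layer from splitting off small I/Os in
between.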
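
A user-space sketch of the extraction limit computed in the patch, where
`max` stands for the value the patch obtains from queue_max_bytes(). In C,
`-` binds more tightly than `&`, so the patch's
`max - bio->bi_iter.bi_size & (max - 1)` evaluates as
`(max - bi_size) & (max - 1)`: the bytes left until bi_size reaches the
next multiple of `max`, or 0 when it already sits on such a boundary. The
mask only acts as a modulo when `max` is a power of two, which the sketch
asserts. `extract_limit()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Mirrors the patch's expression
 *   bi_size ? max - bi_size & (max - 1) : max
 * with the implicit C precedence written out as explicit parentheses.
 */
unsigned extract_limit(unsigned bi_size, unsigned max)
{
	/* the '& (max - 1)' trick is only a modulo for powers of two */
	assert(max != 0 && (max & (max - 1)) == 0);

	if (bi_size == 0)
		return max;
	/* bytes left until bi_size reaches the next multiple of max (0 if aligned) */
	return (max - bi_size) & (max - 1);
}
```

Assuming a 1MB limit, a bio already holding 4KB may grow by at most 1020KB,
a bio already holding 2136 sectors (1068KB) may grow by 980KB to reach 2MB,
and a bio sitting exactly on a 1MB boundary gets a limit of 0.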

I ran the AnTuTu V10 storage test on a UFS 4.0 device, and the
sequential tests improved significantly:

Sequential Read (average of 5 rounds):
Original: 3033.7 MB/sec
Patched: 3520.9 MB/sec

Sequential Write (average of 5 rounds):
Original: 2225.4 MB/sec
Patched: 2800.3 MB/sec
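
The relative improvement implied by the averages above is plain
arithmetic on the reported numbers; `improvement_pct()` is a hypothetical
helper used only to show the calculation.

```c
#include <assert.h>

/* Relative change, in percent, from 'before' to 'after'. */
double improvement_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Applied to the figures above, this gives roughly +16.1% for sequential
read (3033.7 to 3520.9 MB/sec) and roughly +25.8% for sequential write
(2225.4 to 2800.3 MB/sec).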

Link: https://lore.kernel.org/linux-arm-kernel/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ed Tsai <ed.tsai@mediatek.com>

---
 block/bio.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..8d3a112e68da 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1227,8 +1227,10 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_extraction_t extraction_flags = 0;
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
+	struct block_device *bdev = bio->bi_bdev;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
+	ssize_t max_extract = UINT_MAX - bio->bi_iter.bi_size;
 	ssize_t size, left;
 	unsigned len, i = 0;
 	size_t offset;
@@ -1242,7 +1244,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
+	if (bdev && blk_queue_pci_p2pdma(bdev->bd_disk->queue))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
 	/*
@@ -1252,16 +1254,21 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_extract_pages(iter, &pages,
-				      UINT_MAX - bio->bi_iter.bi_size,
+	if (bdev && bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned int max = queue_max_bytes(bdev_get_queue(bdev));
+
+		max_extract = bio->bi_iter.bi_size ?
+			max - bio->bi_iter.bi_size & (max - 1) : max;
+	}
+	size = iov_iter_extract_pages(iter, &pages, max_extract,
 				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
 	nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE);
 
-	if (bio->bi_bdev) {
-		size_t trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
+	if (bdev) {
+		size_t trim = size & (bdev_logical_block_size(bdev) - 1);
 		iov_iter_revert(iter, trim);
 		size -= trim;
 	}
-- 
2.18.0

