From: Ming Lei <ming.lei@redhat.com>
To: Maxim Mikityanskiy <maxtram95@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
Jens Axboe <axboe@kernel.dk>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@lst.de>,
linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Revert "block: simplify set_init_blocksize" to regain lost performance
Date: Wed, 27 Jan 2021 23:18:38 +0800
Message-ID: <20210127151838.GA1325688@T590>
In-Reply-To: <CAKErNvpCdTvg-Bx-U+k3jYiazoz-Pr0LwruaSh+LszH9yP5c8A@mail.gmail.com>
On Wed, Jan 27, 2021 at 09:44:50AM +0200, Maxim Mikityanskiy wrote:
> On Wed, Jan 27, 2021 at 6:23 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 1/26/21 11:59 AM, Maxim Mikityanskiy wrote:
> > > The cited commit introduced a serious regression with SATA write speed,
> > > as found by bisecting. This patch reverts this commit, which restores
> > > write speed back to the values observed before this commit.
> > >
> > > The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
> > > (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
> > > single HDD, the rest are different RAID levels built over the first
> > > partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
> > >
> > >                 | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 9011495c9466    | R:256  | R:313 | R:276     | R:313     | R:323
> > > (before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5ff9f19231a0    | R:257  | R:398 | R:312     | R:344     | R:391
> > > (faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
> > > unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
> > > ----------------+--------+-------+-----------+-----------+--------
> > > 5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
> > > patched         | W:247  | W:274 | W:220     | W:225     | W:121
> > >
> > > Applying this patch doesn't hurt read performance, while it improves the
> > > write speed by 1.5x - 3.5x (more impact on RAID tests). The write speed
> > > is restored back to the state before the faulty commit, and even a bit
> > > higher in RAID tests (which aren't HDD-bound on this device) - that is
> > > likely related to other optimizations done between the faulty commit and
> > > 5.10.10 which also improved the read speed.
> > >
> > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
> > > Fixes: 5ff9f19231a0 ("block: simplify set_init_blocksize")
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Jens Axboe <axboe@kernel.dk>
> > > ---
> > > fs/block_dev.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > index 3b8963e228a1..235b5042672e 100644
> > > --- a/fs/block_dev.c
> > > +++ b/fs/block_dev.c
> > > @@ -130,7 +130,15 @@ EXPORT_SYMBOL(truncate_bdev_range);
> > >
> > >  static void set_init_blocksize(struct block_device *bdev)
> > >  {
> > > -	bdev->bd_inode->i_blkbits = blksize_bits(bdev_logical_block_size(bdev));
> > > +	unsigned int bsize = bdev_logical_block_size(bdev);
> > > +	loff_t size = i_size_read(bdev->bd_inode);
> > > +
> > > +	while (bsize < PAGE_SIZE) {
> > > +		if (size & bsize)
> > > +			break;
> > > +		bsize <<= 1;
> > > +	}
> > > +	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
> > >  }
> > >
> > >  int set_blocksize(struct block_device *bdev, int size)
> >
> > How can this patch affect write speed? I haven't found any calls of
> > set_init_blocksize() in the I/O path. Did I perhaps overlook something?
>
> I don't know the exact mechanism by which this change affects the speed;
> I'm not an expert in the block device subsystem (I'm a networking
> guy). This commit was found by git bisect, and my performance test
> confirmed that reverting it fixes the bug.
>
> It looks to me as if this function sets the block size as part of control
> flow, and this size is used later in the fast path, and the commit
> that removed the loop decreased this block size.
Right, the issue is the stupid __block_write_full_page(), which submits a single
bio for each buffer head. I have tried to improve the situation by merging
BHs into a single bio; see the patch below:
https://lore.kernel.org/linux-block/20201230000815.3448707-1-ming.lei@redhat.com/
The above patch should improve perf for your test case.
--
Ming
Thread overview: 6+ messages
2021-01-26 19:59 [PATCH] Revert "block: simplify set_init_blocksize" to regain lost performance Maxim Mikityanskiy
2021-01-27 4:23 ` Bart Van Assche
2021-01-27 7:44 ` Maxim Mikityanskiy
2021-01-27 15:18 ` Ming Lei [this message]
2021-01-27 16:12 ` Christoph Hellwig
2021-01-27 16:15 ` Jens Axboe