From: Logan Gunthorpe <logang@deltatee.com>
To: linux-raid@vger.kernel.org
Cc: Shaohua Li <shli@kernel.org>, Shaohua Li <shli@fb.com>,
	Song Liu <song@kernel.org>
Subject: Re: Raid5 Batching Question
Date: Wed, 2 Mar 2022 16:24:55 -0700	[thread overview]
Message-ID: <34a8b64a-37d3-9966-1fe8-d57c432600d7@deltatee.com> (raw)
In-Reply-To: <11cfe3aa-b778-b3e5-a152-50abc6c054ac@deltatee.com>



On 2022-02-24 5:17 p.m., Logan Gunthorpe wrote:
> Hello,
> 
> We've been looking at trying to improve sequential write performance out
> of Raid5 on modern hardware. Our profiling so far seems to indicate that
> one of the issues is high CPU usage due to handling all the stripe
> heads, one for each page. Some investigation shows that Shaohua already
> added a batching feature back in 2015, which seems to be exactly what
> we need.
> 
> However, after adding some additional debug prints we're not seeing any
> batching occurring in our basic testing and I find myself rather
> confused by the code.
> 
> I see that batches are supposed to be created at the end of
> add_stripe_bio() with a call to stripe_add_to_batch_list(). But in our
> testing stripe_can_batch() never returns true.
> 
> stripe_can_batch() calls is_full_stripe_write(), which evaluates the
> following condition:
>   overwrite_disks == (disks - max_degraded)
> 
> In our simple tests, disks is 3 and this is raid5 so max_degraded is 1.
> However, overwrite_disks is also always 1. So, 1 != (3-1) and
> is_full_stripe_write() always seems to return false.
> 
> overwrite_disks appears to be incremented on the stripe only once,
> earlier in add_stripe_bio(), after what looks like a check that all
> sectors in the page are being written. But I don't see how
> overwrite_disks could ever reach 2 for a single stripe.
> 
> What am I missing? How can I ensure batches are being used with large
> sequential writes?
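(As an aside, the check in question boils down to a condition that can be sketched as below; the struct is a simplified stand-in for the kernel's stripe_head and r5conf, not the real definitions.)

```c
#include <stdbool.h>

/*
 * Simplified model of the full-stripe-write check discussed above.
 * Field names illustrate the idea only; they do not match the kernel's
 * actual struct layouts.
 */
struct stripe_model {
	int disks;           /* total member disks (3 in the test above) */
	int max_degraded;    /* parity disks that may fail: 1 for raid5 */
	int overwrite_disks; /* data disks whose pages are fully overwritten */
};

static bool is_full_stripe_write_model(const struct stripe_model *sh)
{
	/* batching is only possible when every data disk is overwritten */
	return sh->overwrite_disks == (sh->disks - sh->max_degraded);
}
```

With disks = 3 and max_degraded = 1, the write must fully cover both data disks (overwrite_disks == 2) before the condition holds, which matches the behaviour described above.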

Replying to myself:

It looks like batching is happening for me after all. When I first
tried to trace it, the chunk size was too large and the amount of data
too small, so there was never enough I/O to fully overwrite the other
disks in the chunk.
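Concretely, the amount of aligned sequential I/O needed before a stripe becomes a full-stripe write is roughly the chunk size times the number of data disks. A quick sketch (the 512 KiB chunk is just an example value, not what my test array used):

```c
/*
 * Rough arithmetic only: bytes of aligned sequential I/O needed to
 * produce one full-stripe write.
 */
static unsigned long full_stripe_bytes(unsigned long chunk_bytes,
				       int disks, int max_degraded)
{
	return chunk_bytes * (unsigned long)(disks - max_degraded);
}

/* e.g. 3-disk raid5 with a 512 KiB chunk -> 1 MiB per full stripe */
```

So a small test write can easily fall short of one full stripe and never trigger the batching path.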

When running larger jobs, batching seems to work correctly, but with
larger chunks the stripe heads might be handled before the rest of the
batch is added.

So I'm back to square one trying to find some performance improvements
on the write path.

Thanks,

Logan



Thread overview: 5+ messages
2022-02-25  0:17 Raid5 Batching Question Logan Gunthorpe
2022-03-02 23:24 ` Logan Gunthorpe [this message]
2022-03-03  1:40   ` Guoqing Jiang
2022-03-03 16:20     ` Logan Gunthorpe
2022-03-04 14:21       ` o1bigtenor
