* Raid5 Batching Question
From: Logan Gunthorpe @ 2022-02-25  0:17 UTC
  To: linux-raid; +Cc: Shaohua Li, Shaohua Li, Song Liu

Hello,

We've been looking at improving sequential write performance of Raid5
on modern hardware. Our profiling so far seems to indicate that one of
the issues is high CPU usage due to handling all the stripe heads, one
for each page. Some investigation shows that Shaohua already added a
batching feature back in 2015, which seems to be exactly what we need.

However, after adding some additional debug prints we're not seeing any
batching occurring in our basic testing and I find myself rather
confused by the code.

I see that batches are supposed to be created at the end of
add_stripe_bio() with a call to stripe_add_to_batch_list(). But in our
testing stripe_can_batch() never returns true.

stripe_can_batch() calls is_full_stripe_write(), which returns the
result of the following comparison:

  overwrite_disks == (disks - max_degraded)

In our simple tests, disks is 3 and this is raid5 so max_degraded is 1.
However, overwrite_disks is also always 1. So, 1 != (3-1) and
is_full_stripe_write() always seems to return false.

overwrite_disks appears to be incremented on the stripe only once
earlier in add_stripe_bio() after seeming to check if all sectors in the
page are being written. But I don't see how overwrite_disks would ever
be 2 for a single stripe.
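
For reference, the check described above can be modelled in userspace
roughly as follows. This is a minimal Python sketch of the comparison
only, not the kernel code: the function name mirrors
is_full_stripe_write(), but passing the three values as plain integer
arguments is an assumption made for illustration.

```python
def is_full_stripe_write(overwrite_disks: int, disks: int, max_degraded: int) -> bool:
    """Model of the comparison quoted above.

    A stripe head counts as a full-stripe write only when every data
    disk (disks - max_degraded) has been completely overwritten.
    """
    return overwrite_disks == (disks - max_degraded)


if __name__ == "__main__":
    # Values observed in the 3-disk raid5 test described above.
    print(is_full_stripe_write(overwrite_disks=1, disks=3, max_degraded=1))  # False
    # What the check would need to see in order to allow batching.
    print(is_full_stripe_write(overwrite_disks=2, disks=3, max_degraded=1))  # True
```

With disks = 3 and max_degraded = 1, the comparison only succeeds once
overwrite_disks reaches 2.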

What am I missing? How can I ensure batches are being used with large
sequential writes?

Thanks,

Logan




* Re: Raid5 Batching Question
From: Logan Gunthorpe @ 2022-03-02 23:24 UTC
  To: linux-raid; +Cc: Shaohua Li, Shaohua Li, Song Liu



On 2022-02-24 5:17 p.m., Logan Gunthorpe wrote:
> What am I missing? How can I ensure batches are being used with large
> sequential writes?

Replying to myself:

Looks like batching is happening for me after all. When I tried to
test it and trace it through, the chunk size was too large and the
amount of data too small, so there was not enough data to reach the
chunks on the other disks in the stripe.

When running larger jobs batching seems to work correctly, but with
larger chunks stripe heads might be handled before the rest of the batch
is added.
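
The effect of chunk size can be illustrated with a small userspace
model. This is a hedged sketch, not the kernel's logic: it assumes one
4 KiB page per member disk in each stripe head, a sequential write
starting at offset 0, and plain striping of logical chunks across the
data disks (parity rotation is ignored, since it does not change the
count). The function name full_stripe_heads() and the 512 KiB chunk
size are illustrative choices, not values taken from the tests above.

```python
from collections import Counter

PAGE = 4096  # assumed stripe-head granularity: one page per member disk


def full_stripe_heads(write_bytes: int, chunk_bytes: int, data_disks: int) -> int:
    """Count stripe heads fully covered by a sequential write from offset 0."""
    overwrite_disks = Counter()
    for page in range(write_bytes // PAGE):  # only whole pages count as overwrites
        offset = page * PAGE
        chunk = offset // chunk_bytes        # logical chunk index
        row = chunk // data_disks            # stripe row across the array
        within = offset % chunk_bytes        # page position inside its chunk
        # Pages at the same (row, within) on different disks share a stripe head.
        overwrite_disks[(row, within)] += 1
    return sum(1 for n in overwrite_disks.values() if n == data_disks)


if __name__ == "__main__":
    KIB = 1024
    # 256 KiB written, 512 KiB chunks, 2 data disks: the second disk is never reached.
    print(full_stripe_heads(256 * KIB, 512 * KIB, 2))   # 0
    # 1 MiB written: both chunks of the first stripe row are covered.
    print(full_stripe_heads(1024 * KIB, 512 * KIB, 2))  # 128
```

Under these assumptions, the first stripe head only becomes a
full-stripe write once the write extends one page past
(data_disks - 1) * chunk_bytes.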

So I'm back to square one trying to find some performance improvements
on the write path.

Thanks,

Logan


* Re: Raid5 Batching Question
From: Guoqing Jiang @ 2022-03-03  1:40 UTC
  To: Logan Gunthorpe, linux-raid; +Cc: Shaohua Li, Shaohua Li, Song Liu



On 3/3/22 7:24 AM, Logan Gunthorpe wrote:
> So I'm back to square one trying to find some performance improvements
> on the write path.

Did you try with group_thread_cnt?

/sys/block/md0/md/group_thread_cnt

Thanks,
Guoqing

* Re: Raid5 Batching Question
From: Logan Gunthorpe @ 2022-03-03 16:20 UTC
  To: Guoqing Jiang, linux-raid; +Cc: Shaohua Li, Shaohua Li, Song Liu



On 2022-03-02 6:40 p.m., Guoqing Jiang wrote:
> 
> 
> On 3/3/22 7:24 AM, Logan Gunthorpe wrote:
>> So I'm back to square one trying to find some performance improvements
>> on the write path.
> 
> Did you try with group_thread_cnt?
> 
> /sys/block/md0/md/group_thread_cnt

Yes, I've tried group_thread_cnt and raised stripe_cache_size and a few
other parameters. Still get limited write performance.
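
For anyone repeating the experiment, the two tunables named above can
be inspected and changed through sysfs. The following is a minimal
sketch under stated assumptions: the helper names read_md_attr() and
write_md_attr() are hypothetical, the array is assumed to be md0 as in
the path quoted above, and any values written are arbitrary examples
rather than recommendations.

```python
from pathlib import Path


def read_md_attr(md_dir: str, name: str) -> str:
    """Return the current value of an md sysfs attribute as a string."""
    return (Path(md_dir) / name).read_text().strip()


def write_md_attr(md_dir: str, name: str, value: int) -> None:
    """Write an integer value to an md sysfs attribute (needs root on real sysfs)."""
    (Path(md_dir) / name).write_text(str(value))


if __name__ == "__main__":
    md_dir = "/sys/block/md0/md"  # assumed array name, as in the path above
    if Path(md_dir).is_dir():
        # Read-only inspection; no expected values are assumed here.
        for attr in ("group_thread_cnt", "stripe_cache_size"):
            print(attr, "=", read_md_attr(md_dir, attr))
```

A call such as write_md_attr("/sys/block/md0/md", "group_thread_cnt", 4)
would change the setting; the value 4 is only an example.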

Thanks,

Logan

* Re: Raid5 Batching Question
From: o1bigtenor @ 2022-03-04 14:21 UTC
  To: Logan Gunthorpe
  Cc: Guoqing Jiang, Linux-RAID, Shaohua Li, Shaohua Li, Song Liu

On Thu, Mar 3, 2022 at 6:27 PM Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 2022-03-02 6:40 p.m., Guoqing Jiang wrote:
> >
> >
> > On 3/3/22 7:24 AM, Logan Gunthorpe wrote:
> >> So I'm back to square one trying to find some performance improvements
> >> on the write path.
> >
> > Did you try with group_thread_cnt?
> >
> > /sys/block/md0/md/group_thread_cnt
>
> Yes, I've tried group_thread_cnt and raised stripe_cache_size and a few
> other parameters. Still get limited write performance.
>

Ermm, not speaking as any kind of expert, but is this not where Raid 10
shines: fastest writes thanks to the striping, and so on? This may NOT
be the case; I'm just tossing the idea into the ring (hoping the real
'experts' will chime in!).

HTH
