* Raid5 Batching Question
@ 2022-02-25  0:17 Logan Gunthorpe
  2022-03-02 23:24 ` Logan Gunthorpe
  0 siblings, 1 reply; 5+ messages in thread
From: Logan Gunthorpe @ 2022-02-25  0:17 UTC (permalink / raw)
  To: linux-raid; +Cc: Shaohua Li, Shaohua Li, Song Liu

Hello,

We've been looking at improving sequential write performance of Raid5
on modern hardware. Our profiling so far seems to indicate that one of
the issues is high CPU usage due to handling all the stripe heads, one
for each page. Some investigation shows that Shaohua already added a
batching feature back in 2015, which seems to be exactly what we need.

However, after adding some additional debug prints, we're not seeing any
batching occur in our basic testing, and I find myself rather confused
by the code.

I see that batches are supposed to be created at the end of
add_stripe_bio() with a call to stripe_add_to_batch_list(). But in our
testing stripe_can_batch() never returns true.

stripe_can_batch() calls is_full_stripe_write(), which returns the
result of the following comparison:

  overwrite_disks == (disks - max_degraded)

In our simple tests, disks is 3 and this is raid5 so max_degraded is 1.
However, overwrite_disks is also always 1. So, 1 != (3-1) and
is_full_stripe_write() always seems to return false.
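
The arithmetic above can be modelled in a small userspace C sketch. This
is not the kernel source: struct stripe_model and
model_is_full_stripe_write() are hypothetical names, and only the three
values discussed above are represented.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the fields the check uses.
 * Hypothetical: this is not the kernel's struct stripe_head. */
struct stripe_model {
	int disks;           /* total member devices in the array */
	int max_degraded;    /* 1 for raid5 */
	int overwrite_disks; /* devices whose page is fully overwritten */
};

/* Models the comparison overwrite_disks == (disks - max_degraded). */
static bool model_is_full_stripe_write(const struct stripe_model *sh)
{
	int data_disks = sh->disks - sh->max_degraded;

	/* In this model, counting more overwritten devices than data
	 * devices is treated as a programming error. */
	assert(sh->overwrite_disks <= data_disks);
	return sh->overwrite_disks == data_disks;
}
```

With disks = 3, max_degraded = 1 and overwrite_disks = 1, the model
returns false; it returns true only once overwrite_disks reaches 2.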

overwrite_disks appears to be incremented on the stripe only once,
earlier in add_stripe_bio(), after what seems to be a check that all
sectors in the page are being written. But I don't see how
overwrite_disks would ever be 2 for a single stripe.
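
The "are all sectors in the page being written" check described above
can be sketched roughly as follows. This is a userspace model under
stated assumptions, not the code in add_stripe_bio(): the names are
hypothetical, the pending writes are assumed to be sorted by start
sector, and a page is assumed to be 8 sectors (4 KiB with 512-byte
sectors).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STRIPE_SECTORS 8ULL /* assumption: 4 KiB page, 512-byte sectors */

/* Hypothetical description of one pending write to a device. */
struct write_range {
	unsigned long long start; /* first sector written */
	unsigned long long end;   /* one past the last sector written */
};

/* Return true if the ranges (sorted by start) cover the whole page
 * [dev_sector, dev_sector + MODEL_STRIPE_SECTORS) without a gap. */
static bool model_page_covered(unsigned long long dev_sector,
			       const struct write_range *w, size_t n)
{
	unsigned long long covered = dev_sector;
	unsigned long long page_end = dev_sector + MODEL_STRIPE_SECTORS;

	assert(w != NULL || n == 0);

	for (size_t i = 0; i < n && covered < page_end; i++) {
		if (w[i].start > covered)
			break; /* gap before this range: page not covered */
		if (w[i].end > covered)
			covered = w[i].end; /* extend the covered prefix */
	}
	return covered >= page_end;
}
```

In this model, a device's page counts as overwritten only when its
pending writes cover the page from the first sector to the last; a
partial write or a write with a hole leaves it uncounted.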

What am I missing? How can I ensure batches are being used with large
sequential writes?

Thanks,

Logan





end of thread, other threads:[~2022-03-04 14:22 UTC | newest]

Thread overview: 5+ messages
2022-02-25  0:17 Raid5 Batching Question Logan Gunthorpe
2022-03-02 23:24 ` Logan Gunthorpe
2022-03-03  1:40   ` Guoqing Jiang
2022-03-03 16:20     ` Logan Gunthorpe
2022-03-04 14:21       ` o1bigtenor
