All of lore.kernel.org
* Performance Testing MD-RAID10 with 1 failed drive
@ 2022-10-19 19:30 Umang Agarwalla
  2022-10-19 21:00 ` Reindl Harald
  0 siblings, 1 reply; 14+ messages in thread
From: Umang Agarwalla @ 2022-10-19 19:30 UTC (permalink / raw)
  To: linux-raid

Hello all,

We run Linux MD-RAID10 in production on 8 SAS HDDs (7200 RPM).
We recently learned from the application owners that writes on these
machines are affected when one drive in the RAID10 array has failed,
but unfortunately we do not have much data to prove this or to
replicate it exactly in production.

I wanted to ask the people on this mailing list whether they have
ever come across such an issue.
My understanding is that, in theory, a RAID10 array with a failed
drive should still be able to handle all production traffic without
any issues. Please let me know whether that understanding is correct.

Any links, guides, or how-tos on this topic would also be greatly
appreciated.

Thanks,
Umang Agarwalla


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 19:30 Performance Testing MD-RAID10 with 1 failed drive Umang Agarwalla
@ 2022-10-19 21:00 ` Reindl Harald
  2022-10-19 21:12   ` Umang Agarwalla
  2022-10-19 21:25   ` Wols Lists
  0 siblings, 2 replies; 14+ messages in thread
From: Reindl Harald @ 2022-10-19 21:00 UTC (permalink / raw)
  To: Umang Agarwalla, linux-raid



On 19.10.22 at 21:30, Umang Agarwalla wrote:
> Hello all,
> 
> We run Linux MD-RAID10 in production on 8 SAS HDDs (7200 RPM).
> We recently learned from the application owners that writes on these
> machines are affected when one drive in the RAID10 array has failed,
> but unfortunately we do not have much data to prove this or to
> replicate it exactly in production.
> 
> I wanted to ask the people on this mailing list whether they have
> ever come across such an issue.
> My understanding is that, in theory, a RAID10 array with a failed
> drive should still be able to handle all production traffic without
> any issues. Please let me know whether that understanding is correct.

"without any issue" is nonsense by common sense


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 21:00 ` Reindl Harald
@ 2022-10-19 21:12   ` Umang Agarwalla
  2022-10-19 21:25   ` Wols Lists
  1 sibling, 0 replies; 14+ messages in thread
From: Umang Agarwalla @ 2022-10-19 21:12 UTC (permalink / raw)
  To: Reindl Harald; +Cc: linux-raid

Hello Reindl, All

Thanks for your reply. I do understand that. Could you please help me
understand how much of a hit writes can take in such a scenario?
Any resources on how to benchmark this?




* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 21:00 ` Reindl Harald
  2022-10-19 21:12   ` Umang Agarwalla
@ 2022-10-19 21:25   ` Wols Lists
  2022-10-19 22:56     ` Reindl Harald
  2022-10-19 23:23     ` Roger Heflin
  1 sibling, 2 replies; 14+ messages in thread
From: Wols Lists @ 2022-10-19 21:25 UTC (permalink / raw)
  To: Reindl Harald, Umang Agarwalla, linux-raid

On 19/10/2022 22:00, Reindl Harald wrote:
> 
> 
> On 19.10.22 at 21:30, Umang Agarwalla wrote:
>> Hello all,
>>
>> We run Linux MD-RAID10 in production on 8 SAS HDDs (7200 RPM).
>> We recently learned from the application owners that writes on these
>> machines are affected when one drive in the RAID10 array has failed,
>> but unfortunately we do not have much data to prove this or to
>> replicate it exactly in production.
>>
>> I wanted to ask the people on this mailing list whether they have
>> ever come across such an issue.
>> My understanding is that, in theory, a RAID10 array with a failed
>> drive should still be able to handle all production traffic without
>> any issues. Please let me know whether that understanding is correct.
> 
> "Without any issues" is nonsense; common sense alone tells you that.

No need for the snark. And why shouldn't it be "without any issues"?
Common sense is usually mistaken, and my common sense says the exact 
opposite - with a drive missing, that's one fewer write, so if anything 
it should be quicker.

Given that - on the system my brother was using - the ops guys didn't 
notice their RAID-6 was missing TWO drives, it seems that lost drives 
aren't particularly conspicuous by their absence ...

Okay, with a drive missing it's DANGEROUS, but it should not have any 
noticeable impact on a production system until you replace the drive and 
it's rebuilding.

Unfortunately, I don't know enough to say whether a missing drive would, 
or should, impact performance.

Cheers,
Wol


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 21:25   ` Wols Lists
@ 2022-10-19 22:56     ` Reindl Harald
  2022-10-19 23:23     ` Roger Heflin
  1 sibling, 0 replies; 14+ messages in thread
From: Reindl Harald @ 2022-10-19 22:56 UTC (permalink / raw)
  To: Wols Lists, Umang Agarwalla, linux-raid



On 19.10.22 at 23:25, Wols Lists wrote:
> On 19/10/2022 22:00, Reindl Harald wrote:
>>
>>
>> On 19.10.22 at 21:30, Umang Agarwalla wrote:
>>> Hello all,
>>>
>>> We run Linux MD-RAID10 in production on 8 SAS HDDs (7200 RPM).
>>> We recently learned from the application owners that writes on these
>>> machines are affected when one drive in the RAID10 array has failed,
>>> but unfortunately we do not have much data to prove this or to
>>> replicate it exactly in production.
>>>
>>> I wanted to ask the people on this mailing list whether they have
>>> ever come across such an issue.
>>> My understanding is that, in theory, a RAID10 array with a failed
>>> drive should still be able to handle all production traffic without
>>> any issues. Please let me know whether that understanding is correct.
>>
>> "Without any issues" is nonsense; common sense alone tells you that.
> 
> No need for the snark. And why shouldn't it be "without any issues"?
> Common sense is usually mistaken, and my common sense says the exact 
> opposite 

Not long ago your common sense told me you can change RAID10 to RAID1 
because it's only a metadata change.

Years ago the same common sense told me that "write-mostly" on RAID10 
doesn't work as it does on RAID1, because RAID10 on mdraid is not the 
same as a mirrored RAID0; a few weeks ago the same common sense claimed 
the opposite.

So your common sense is incompatible with my common sense; yours is 
based on assumptions, while mine doesn't need to rely on other people's 
assumptions and guesswork.

In other words: keep your assumptions to yourself and respond only to 
things you *know* and have done in real life.


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 21:25   ` Wols Lists
  2022-10-19 22:56     ` Reindl Harald
@ 2022-10-19 23:23     ` Roger Heflin
  2022-10-20  6:43       ` Umang Agarwalla
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Heflin @ 2022-10-19 23:23 UTC (permalink / raw)
  To: Wols Lists; +Cc: Reindl Harald, Umang Agarwalla, linux-raid

Is the drive completely failed out of the RAID10?

With a drive missing I would only expect read issues, but if the read
load is high enough that it really needs both disks of each mirror
pair, then writes will get slower too, because the total I/O
(read + write load) is overloading the disks.

With 7200 RPM disks you can do a maximum of about 100-150 seeks and/or
IOPS per disk; any more than that and all I/O on the disks will start
to back up. It will be worse if the application is writing
synchronously to the disks (app guys love sync but fail to understand
how it interacts with spinning-disk hardware).

sar -d will show the per-disk tps (IOPS) and wait times (a 7200 RPM
disk has a seek time of around 5-8 ms). It will also show similar
stats for the md device itself. If the device is getting backed up,
that means the app guys failed to understand what the hardware can do
and what their application needs.
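
For example (illustrative only; adjust the interval and count to taste,
and note that exact column names vary with the sysstat version):

# sar -d -p 5 12

shows per-device tps, await and %util in 5-second samples; compare the
per-disk tps against the 100-150 IOPS ceiling above.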



* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-19 23:23     ` Roger Heflin
@ 2022-10-20  6:43       ` Umang Agarwalla
  2022-10-21  0:14         ` Andy Smith
  0 siblings, 1 reply; 14+ messages in thread
From: Umang Agarwalla @ 2022-10-20  6:43 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Wols Lists, Reindl Harald, linux-raid

Hello Roger, All,

Thanks for your response.

Yes, the scenario is when the drive has completely failed out of the
RAID10. I know it's not right to keep running an array with a failed
drive; what I am trying to understand is how to benchmark the
performance hit in such a condition.
It's always a priority for us to get the failed drive replaced.

To be specific about the type of workload these machines handle, we
run Kafka brokers on them.



* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-20  6:43       ` Umang Agarwalla
@ 2022-10-21  0:14         ` Andy Smith
  2022-10-21  8:15           ` Pascal Hambourg
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Smith @ 2022-10-21  0:14 UTC (permalink / raw)
  To: linux-raid

Hello,

On Thu, Oct 20, 2022 at 12:13:19PM +0530, Umang Agarwalla wrote:
> But what I am trying to understand is how to benchmark the
> performance hit in such a condition.

Perhaps you could use dm-dust to make an unreliable block device
from a real device?

    https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-dust.html

1. Create dust device

2. Create an array that includes the dust device

3. Do some work on it while it's in "bypass" mode and benchmark this
   to account for overhead of dm-dust

4. Add some bad sectors, maybe the whole device

5. Enable "fail read on bad block" mode

6. Do more work and watch device get kicked out of RAID

7. See if the benchmark shows any performance change beyond what you'd
   expect for the reduced number of devices
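
A rough sketch of steps 1, 4 and 5 with dmsetup, assuming /dev/sdh as
the backing device (sector count and block numbers here are made up;
check the dm-dust doc above for the exact message syntax):

# dmsetup create dust1 --table '0 1953525168 dust /dev/sdh 0 512'
# ... build the array on /dev/mapper/dust1, benchmark in bypass mode ...
# dmsetup message dust1 0 addbadblock 60
# dmsetup message dust1 0 enable
# dmsetup message dust1 0 disable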

If you have real hardware disks though, can you not just:

# echo offline > /sys/block/$DISK/device/state
# echo 1 > /sys/block/$DISK/device/delete

to power it off mid-operation? (Might need to reboot to get it back after that)
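
(A SCSI host rescan might bring it back without a reboot, something
along the lines of:

# echo '- - -' > /sys/class/scsi_host/host0/scan

where host0 is a guess; use whichever HBA the disk hangs off.)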

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21  0:14         ` Andy Smith
@ 2022-10-21  8:15           ` Pascal Hambourg
  2022-10-21 10:51             ` Andy Smith
  0 siblings, 1 reply; 14+ messages in thread
From: Pascal Hambourg @ 2022-10-21  8:15 UTC (permalink / raw)
  To: linux-raid

On 21/10/2022 at 02:14, Andy Smith wrote:
> 
> On Thu, Oct 20, 2022 at 12:13:19PM +0530, Umang Agarwalla wrote:
>> But what I am trying to understand is how to benchmark the
>> performance hit in such a condition.
> 
> Perhaps you could use dm-dust to make an unreliable block device
> from a real device?

That seems needlessly complicated to me. What about this?

- benchmark the array in the clean state
- fail and remove a drive
- benchmark the array in the degraded state
- add a new drive and start the resync
- benchmark the array in the resync state
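
A minimal sketch of that with mdadm and fio (untested; /dev/md0,
/dev/sdh and the fio parameters are placeholders, and the benchmark
should of course run against test data, not production data):

# fio --name=bench --filename=/mnt/test/fio.dat --size=4g --rw=randrw \
      --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
      --runtime=120 --time_based
# mdadm /dev/md0 --fail /dev/sdh --remove /dev/sdh
# ... repeat the fio run on the degraded array ...
# mdadm /dev/md0 --add /dev/sdh
# ... repeat the fio run while /proc/mdstat shows the resync ...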


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21  8:15           ` Pascal Hambourg
@ 2022-10-21 10:51             ` Andy Smith
  2022-10-21 11:51               ` Roger Heflin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Smith @ 2022-10-21 10:51 UTC (permalink / raw)
  To: linux-raid

Hello,

On Fri, Oct 21, 2022 at 10:15:42AM +0200, Pascal Hambourg wrote:
> On 21/10/2022 at 02:14, Andy Smith wrote:
> > Perhaps you could use dm-dust to make an unreliable block device
> > from a real device?
> 
> That seems needlessly complicated to me.

Well, I too do not understand why the OP can't just fail one existing
device, but it seemed important to them to experience actual errors
and have the device kicked out because of them. A halfway measure might
be the offline/delete poking in /sys/block that I mentioned.

*shrug*

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21 10:51             ` Andy Smith
@ 2022-10-21 11:51               ` Roger Heflin
  2022-10-21 15:24                 ` Andy Smith
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Heflin @ 2022-10-21 11:51 UTC (permalink / raw)
  To: linux-raid

It is likely much simpler than that.

Take a 2-disk RAID1 array with 100 read IOPS and 100 write IOPS going
to the filesystem. With both disks present, each disk sees about
150 IOPS (100 writes + 50 reads), but with only one disk left in the
array that disk sees 200 IOPS (100 reads + 100 writes), and at that
point a 7200 RPM spinning disk is over capacity. With 8 disks the
numbers scale up, but the general idea is the same: once a disk fails,
all of the reads it was handling have to go to its single remaining
mirror partner, and that read load can leave the remaining disk unable
to keep up.

The original poster needs sar or iostat data to see what the actual
I/O rates are, but if they don't know what the spinning-disk array can
do when fully redundant versus with a disk failed, it is quite possible
that the I/O load is higher than can be sustained with a single disk
missing.
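
To put rough, purely illustrative numbers on the 8-disk near=2 case:
with 4 mirror pairs sharing, say, 400 write and 400 read IOPS evenly,
each disk sees about 100 writes + 50 reads = 150 IOPS, right at the
7200 RPM ceiling. Fail one disk and its partner has to absorb all ~100
reads of that pair on top of its ~100 writes, roughly 200 IOPS on one
spindle while the other six stay near 150.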



* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21 11:51               ` Roger Heflin
@ 2022-10-21 15:24                 ` Andy Smith
  2022-10-21 16:01                   ` Umang Agarwalla
  2022-10-21 16:53                   ` Roger Heflin
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Smith @ 2022-10-21 15:24 UTC (permalink / raw)
  To: linux-raid

Hello,

On Fri, Oct 21, 2022 at 06:51:41AM -0500, Roger Heflin wrote:
> The original poster needs sar or iostat data to see what the actual
> I/O rates are, but if they don't know what the spinning-disk array can
> do when fully redundant versus with a disk failed, it is quite possible
> that the I/O load is higher than can be sustained with a single disk
> missing.

Though the OP is using RAID-10, not RAID-1, and with more than 2
devices IIRC. The OP wants to check the performance and I agree they
should do that for both the normal case and the degraded case, but
what are we expecting *in theory*? For RAID-10 on 4 devices we wouldn't
expect much of a performance hit, would we? A read is striped across 2
devices and each has a mirror, so every read IO can still be served
from the good half of the affected mirror.
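
(If it helps, the layout is easy to confirm, e.g. with /dev/md0 being
whatever the OP's array is called:

# mdadm --detail /dev/md0 | grep -i -e layout -e state
# cat /proc/mdstat

With near=2 on 8 devices that's 4 mirror pairs, so only the failed
disk's partner should see the extra read load.)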

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting


* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21 15:24                 ` Andy Smith
@ 2022-10-21 16:01                   ` Umang Agarwalla
  2022-10-21 16:53                   ` Roger Heflin
  1 sibling, 0 replies; 14+ messages in thread
From: Umang Agarwalla @ 2022-10-21 16:01 UTC (permalink / raw)
  To: linux-raid; +Cc: Roger Heflin

Hello Andy, Roger, Pascal, all,

Thanks a lot for your suggestions. Yes, these are indeed 8 physical
HDDs in a Dell server assembled into a near=2 layout RAID10 array.

I will try out all the options you mentioned. My main concern is how
to benchmark this over a longer period of time.
I am not very experienced with performance testing, so I was hoping
for some resources on how to benchmark this correctly, with good data
points that I can use to present a case to the application owners.
Will continuous capture of sar and iostat be enough to give us
detailed data on this?
I will try the approaches you all suggested, starting with manually
marking a drive as failed to put the array into a degraded state.
I will also read more about dm-dust.
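
(For the continuous capture I was thinking of something simple like

# sar -d -p -o /var/tmp/raid-bench.sa 60 >/dev/null 2>&1 &
# iostat -x 60 >> /var/tmp/raid-bench.iostat &

left running across the healthy, degraded and rebuilding phases; the
file paths and intervals are just placeholders. Please correct me if
there is a better way.)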

Thanks,
Umang




* Re: Performance Testing MD-RAID10 with 1 failed drive
  2022-10-21 15:24                 ` Andy Smith
  2022-10-21 16:01                   ` Umang Agarwalla
@ 2022-10-21 16:53                   ` Roger Heflin
  1 sibling, 0 replies; 14+ messages in thread
From: Roger Heflin @ 2022-10-21 16:53 UTC (permalink / raw)
  To: linux-raid

Whether there is a performance hit depends on exactly how high the I/O
load is. If the fully redundant array is already running at the IOPS
limit of its devices, then any reads that suddenly have to be serviced
by a single device will overload the array: every read that could
previously go to either disk of a 2-disk mirror now has to be handled
by the one remaining disk, and that single disk will be overloaded if
the load is too much for it.

For the most part, the number of devices just increases the I/O
capacity (RAID-10 performs like a striped RAID-1).

Benchmarking it requires knowing the details of the I/O load. IOPS get
hard to reason about when you have, say, a write cache and 4k blocks
that get written and synced perhaps 100 bytes at a time (dozens of tiny
I/Os aimed at that single block, most of which the write cache will
merge). If your defined benchmark differs from your actual load, the
results will not be useful for predicting when the real load will break
things. If two I/Os land on the same disk track (sequential I/O) and
get merged properly, there is no expensive seek between them. And
nothing is write-only: a lot of reads of the underlying filesystem
metadata have to happen for a write to complete (allocating blocks,
bookkeeping, moving blocks from the free list to the file being
written), and on the mirror pair with the failed disk all of those
reads are now being handled by a single device.

If you had the total IOPS and/or sar data (for the LVs and the md* and
sd* devices) from a few minutes when it was overloading, you could
probably see it. Generally it is almost impossible to get a benchmark
"right" enough to tell you when the application will overload the disk
devices.

I troubleshoot a lot of DB I/O load "issues". Those DBs all run the
same application code, but each has a slightly different underlying
workload, so they can look significantly different and can overload
the underlying disk array in very different ways, depending on what
the DB is doing wrong or on how the clients run their queries and
define their workflows.

The giveaway is watching the await times and the %util numbers.
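
For instance (illustrative only; column names differ a bit between
sysstat versions):

# iostat -x 5

and watch await (or r_await/w_await) and %util for each member disk
and for the md device; await stuck well above the 5-8 ms seek time
with %util pinned near 100% on one half of a mirror is the classic
sign of an overloaded degraded pair.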



