* [linux-lvm] LVM commands extremely slow during raid check/resync
@ 2012-03-25  7:56 Larkin Lowrey
  2012-03-26 13:02 ` James Candelaria
  2012-03-26 20:55 ` Ray Morris
  0 siblings, 2 replies; 14+ messages in thread
From: Larkin Lowrey @ 2012-03-25  7:56 UTC (permalink / raw)
  To: linux-lvm

I've been suffering from an extreme slowdown of the various lvm commands
during high I/O load ever since updating from Fedora 15 to 16.

I notice this particularly on Sunday mornings when Fedora kicks off a
raid-check. Commands that are normally near-instantaneous, like lvs and
lvcreate --snapshot, literally take minutes to complete. This causes my
backup jobs to time out and fail.

While all this is going on, the various filesystems are reasonably
responsive (considering the raid-check is running) and I can read/write
to files without problems. It seems that this slow-down is unique to lvm.

I have three raid 5 arrays of 8, 6, and 6 drives. The root fs sits
entirely within the 8 disk array as does the spare area used for snapshots.

Interestingly, perhaps, if I can coax a backup into running, the lvs
command, for example, will complete in just 15-30 seconds instead of
120-180s. It would seem that the random I/O of the backup is able to
break things up enough for the lvm commands to squeeze in.

I'm at a loss for what to do about this or what data to scan for clues.
Any suggestions?

kernel 3.2.10-3.fc16.x86_64

lvm> version
  LVM version:     2.02.86(2) (2011-07-08)
  Library version: 1.02.65 (2011-07-08)
  Driver version:  4.22.0

--Larkin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-25  7:56 [linux-lvm] LVM commands extremely slow during raid check/resync Larkin Lowrey
@ 2012-03-26 13:02 ` James Candelaria
  2012-03-26 17:49   ` Larkin Lowrey
  2012-03-26 20:55 ` Ray Morris
  1 sibling, 1 reply; 14+ messages in thread
From: James Candelaria @ 2012-03-26 13:02 UTC (permalink / raw)
  To: LVM general discussion and development

Larkin,

This is likely due to the way the scheduler and MD driver are interacting.  The MD driver has likely filled the queue of the devices pretty deep with read requests, so when you attempt to run an LVM command such as lvcreate, your request gets inserted at the back of that queue.  You must "wait in line" for this command (likely only a few sectors' worth of IO) to get its turn on the media.  The MD driver realizes that there is another request to the media and slows itself briefly, but since there is no further IO after the LVM command, it then goes back to its job of resyncing your array in earnest.
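
If you want to see how deep those per-device queues can get and which elevator is servicing them, something like the following works (the device names are only examples - substitute the members of your md arrays):

for d in sda sdb sdc; do
    echo "== $d =="
    cat /sys/block/$d/queue/scheduler     # the active elevator is shown in [brackets]
    cat /sys/block/$d/queue/nr_requests   # maximum number of queued requests
done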

James


-----Original Message-----
From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Larkin Lowrey
Sent: Sunday, March 25, 2012 3:56 AM
To: linux-lvm@redhat.com
Subject: [linux-lvm] LVM commands extremely slow during raid check/resync

I've been suffering from an extreme slowdown of the various lvm commands
during high I/O load ever since updating from Fedora 15 to 16.

I notice this particularly on Sunday mornings when Fedora kicks off a
raid-check. Commands that are normally near-instantaneous, like lvs and
lvcreate --snapshot, literally take minutes to complete. This causes my
backup jobs to time out and fail.

While all this is going on, the various filesystems are reasonably
responsive (considering the raid-check is running) and I can read/write
to files without problems. It seems that this slow-down is unique to lvm.

I have three raid 5 arrays of 8, 6, and 6 drives. The root fs sits
entirely within the 8 disk array as does the spare area used for snapshots.

Interestingly, perhaps, if I can coax a backup into running, the lvs
command, for example, will complete in just 15-30 seconds instead of
120-180s. It would seem that the random I/O of the backup is able to
break things up enough for the lvm commands to squeeze in.

I'm at a loss for what to do about this or what data to scan for clues.
Any suggestions?

kernel 3.2.10-3.fc16.x86_64

lvm> version
  LVM version:     2.02.86(2) (2011-07-08)
  Library version: 1.02.65 (2011-07-08)
  Driver version:  4.22.0

--Larkin

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-26 13:02 ` James Candelaria
@ 2012-03-26 17:49   ` Larkin Lowrey
  0 siblings, 0 replies; 14+ messages in thread
From: Larkin Lowrey @ 2012-03-26 17:49 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: James Candelaria

Thank you for the reply.

I/O does slow down during the raid check, and that's to be expected, but
two minutes to create a snapshot seems excessive. I only experience a
few seconds' delay when modifying and copying files. It is hard to
imagine how those operations can complete in seconds while lvs and
lvcreate take minutes. I suppose if lvm were issuing a lot of fsyncs,
then a delay of that magnitude could be expected.

This phenomenon did not occur when I was still running Fedora 15 so
something changed.

--Larkin

On 3/26/2012 8:02 AM, James Candelaria wrote:
> Larkin,
>
> This is likely due to the way the scheduler and MD driver are interacting.  The MD driver has likely filled the queue of the devices pretty deep with read requests, so when you attempt to run an LVM command such as lvcreate, your request gets inserted at the back of that queue.  You must "wait in line" for this command (likely only a few sectors' worth of IO) to get its turn on the media.  The MD driver realizes that there is another request to the media and slows itself briefly, but since there is no further IO after the LVM command, it then goes back to its job of resyncing your array in earnest.
>
> James
>
>
> -----Original Message-----
> From: linux-lvm-bounces@redhat.com [mailto:linux-lvm-bounces@redhat.com] On Behalf Of Larkin Lowrey
> Sent: Sunday, March 25, 2012 3:56 AM
> To: linux-lvm@redhat.com
> Subject: [linux-lvm] LVM commands extremely slow during raid check/resync
>
> I've been suffering from an extreme slowdown of the various lvm commands
> during high I/O load ever since updating from Fedora 15 to 16.
>
> I notice this particularly on Sunday mornings when Fedora kicks off a
> raid-check. Commands that are normally near-instantaneous, like lvs and
> lvcreate --snapshot, literally take minutes to complete. This causes my
> backup jobs to time out and fail.
>
> While all this is going on, the various filesystems are reasonably
> responsive (considering the raid-check is running) and I can read/write
> to files without problems. It seems that this slow-down is unique to lvm.
>
> I have three raid 5 arrays of 8, 6, and 6 drives. The root fs sits
> entirely within the 8 disk array as does the spare area used for snapshots.
>
> Interestingly, perhaps, if I can coax a backup into running, the lvs
> command, for example, will complete in just 15-30 seconds instead of
> 120-180s. It would seem that the random I/O of the backup is able to
> break things up enough for the lvm commands to squeeze in.
>
> I'm at a loss for what to do about this or what data to scan for clues.
> Any suggestions?
>
> kernel 3.2.10-3.fc16.x86_64
>
> lvm> version
>   LVM version:     2.02.86(2) (2011-07-08)
>   Library version: 1.02.65 (2011-07-08)
>   Driver version:  4.22.0
>
> --Larkin
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-25  7:56 [linux-lvm] LVM commands extremely slow during raid check/resync Larkin Lowrey
  2012-03-26 13:02 ` James Candelaria
@ 2012-03-26 20:55 ` Ray Morris
  2012-03-26 23:51   ` Larkin Lowrey
  2012-03-27 14:31   ` Zdenek Kabelac
  1 sibling, 2 replies; 14+ messages in thread
From: Ray Morris @ 2012-03-26 20:55 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: llowrey

Put -vvvv on the command and see what takes so long. In our case, 
it was checking all of the devices to see if they were PVs.
"All devices" includes LVs, so it was checking LVs to see if they
were PVs, and activating an LV triggered a scan in case it was 
a PV, so activating a volume group was especially slow (hours).
The solution was to use "filter" in lvm.conf like this:

filter = [ "r|^/dev/dm.*|", "r|^/dev/vg-.*|","a|^/dev/sd*|", "a|^/dev/md*|", "r|.*|" ]

That checks only /dev/sd* and /dev/md*, to see if they are PVs, 
skipping the checks of LVs to see if they are also PVs. Since the
device list is cached, use vgscan -vvvv to check that it's checking 
the right things and maybe delete that cache first. My rule IS 
a bit redundant because I had trouble getting the simpler form 
to do what I wanted. I ended up using a belt and suspenders 
approach, specifying both "do not scan my LVs" and "scan only
/dev/sd*".
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php




On Sun, 25 Mar 2012 02:56:11 -0500
Larkin Lowrey <llowrey@nuclearwinter.com> wrote:

> I've been suffering from an extreme slowdown of the various lvm
> commands during high I/O load ever since updating from Fedora 15 to
> 16.
> 
> I notice this particularly on Sunday mornings when Fedora kicks off a
> raid-check. Commands that are normally near-instantaneous, like lvs
> and lvcreate --snapshot, literally take minutes to complete. This
> causes my backup jobs to time out and fail.
> 
> While all this is going on, the various filesystems are reasonably
> responsive (considering the raid-check is running) and I can
> read/write to files without problems. It seems that this slow-down is
> unique to lvm.
> 
> I have three raid 5 arrays of 8, 6, and 6 drives. The root fs sits
> entirely within the 8 disk array as does the spare area used for
> snapshots.
> 
> Interestingly, perhaps, if I can coax a backup into running, the lvs
> command, for example, will complete in just 15-30 seconds instead of
> 120-180s. It would seem that the random I/O of the backup is able to
> break things up enough for the lvm commands to squeeze in.
> 
> I'm at a loss for what to do about this or what data to scan for
> clues. Any suggestions?
> 
> kernel 3.2.10-3.fc16.x86_64
> 
> lvm> version
>   LVM version:     2.02.86(2) (2011-07-08)
>   Library version: 1.02.65 (2011-07-08)
>   Driver version:  4.22.0
> 
> --Larkin
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-26 20:55 ` Ray Morris
@ 2012-03-26 23:51   ` Larkin Lowrey
  2012-03-27 14:34     ` Zdenek Kabelac
  2012-03-27 14:31   ` Zdenek Kabelac
  1 sibling, 1 reply; 14+ messages in thread
From: Larkin Lowrey @ 2012-03-26 23:51 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Ray Morris

That helped bring the lvcreate time down from 2min to 1min so that's an
improvement.  Thank you.

The source of the remaining slowdown is the writing of metadata to my 4
PVs. The writes are small and the arrays are all raid5 so each metadata
write is also requiring a read. I'm still at a loss for why this was not
a problem when running F15 but the filter is a workable solution for me
so I'll leave it alone.
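
If you want to see that read-before-write happening, watching the arrays while
a snapshot is created should show it - something along these lines (iostat is
from sysstat, and the snapshot name/size are just examples):

iostat -x 1 > iostat.log &                # watch r/s vs w/s on the md* rows
lvcreate -s -L 20G -n snaptest Raid/Root
kill %1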

--Larkin

On 3/26/2012 3:55 PM, Ray Morris wrote:
> Put -vvvv on the command and see what takes so long. In our case, 
> it was checking all of the devices to see if they were PVs.
> "All devices" includes LVs, so it was checking LVs to see if they
> were PVs, and activating an LV triggered a scan in case it was 
> a PV, so activating a volume group was especially slow (hours).
> The solution was to use "filter" in lvm.conf like this:
>
> filter = [ "r|^/dev/dm.*|", "r|^/dev/vg-.*|","a|^/dev/sd*|", "a|^/dev/md*|", "r|.*|" ]
>
> That checks only /dev/sd* and /dev/md*, to see if they are PVs, 
> skipping the checks of LVs to see if they are also PVs. Since the
> device list is cached, use vgscan -vvvv to check that it's checking 
> the right things and maybe delete that cache first. My rule IS 
> a bit redundant because I had trouble getting the simpler form 
> to do what I wanted. I ended up using a belt and suspenders 
> approach, specifying both "do not scan my LVs" and "scan only
> /dev/sd*".

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-26 20:55 ` Ray Morris
  2012-03-26 23:51   ` Larkin Lowrey
@ 2012-03-27 14:31   ` Zdenek Kabelac
  2012-03-27 18:11     ` Ray Morris
  1 sibling, 1 reply; 14+ messages in thread
From: Zdenek Kabelac @ 2012-03-27 14:31 UTC (permalink / raw)
  To: linux-lvm

On 26.3.2012 22:55, Ray Morris wrote:
> Put -vvvv on the command and see what takes so long. In our case, 
> it was checking all of the devices to see if they were PVs.
> "All devices" includes LVs, so it was checking LVs to see if they
> were PVs, and activating an LV triggered a scan in case it was 
> a PV, so activating a volume group was especially slow (hours).
> The solution was to use "filter" in lvm.conf like this:
> 
> filter = [ "r|^/dev/dm.*|", "r|^/dev/vg-.*|","a|^/dev/sd*|", "a|^/dev/md*|", "r|.*|" ]
> 
> That checks only /dev/sd* and /dev/md*, to see if they are PVs, 
> skipping the checks of LVs to see if they are also PVs. Since the
> device list is cached, use vgscan -vvvv to check that it's checking 
> the right things and maybe delete that cache first. My rule IS 
> a bit redundant because I had trouble getting the simpler form 
> to do what I wanted. I ended up using a belt and suspenders 
> approach, specifying both "do not scan my LVs" and "scan only
> /dev/sd*".

Could you check the upstream CVS version of lvm2 with 2 extra patches
(not yet upstream):

https://www.redhat.com/archives/lvm-devel/2012-March/msg00171.html
https://www.redhat.com/archives/lvm-devel/2012-March/msg00172.html

and see whether your slow PV operations are solved?

Zdenek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-26 23:51   ` Larkin Lowrey
@ 2012-03-27 14:34     ` Zdenek Kabelac
  2012-03-27 21:24       ` Larkin Lowrey
  0 siblings, 1 reply; 14+ messages in thread
From: Zdenek Kabelac @ 2012-03-27 14:34 UTC (permalink / raw)
  To: linux-lvm

On 27.3.2012 01:51, Larkin Lowrey wrote:
> That helped bring the lvcreate time down from 2min to 1min so that's an
> improvement.  Thank you.
> 
> The source of the remaining slowdown is the writing of metadata to my 4
> PVs. The writes are small and the arrays are all raid5 so each metadata
> write is also requiring a read. I'm still at a loss for why this was not
> a problem when running F15 but the filter is a workable solution for me
> so I'll leave it alone.
> 


Could you please attach the output of 'lvs -a' and also your md array
setup.

Ideally, open a regular bugzilla for f16.

Zdenek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-27 14:31   ` Zdenek Kabelac
@ 2012-03-27 18:11     ` Ray Morris
  2012-03-27 20:36       ` Zdenek Kabelac
  0 siblings, 1 reply; 14+ messages in thread
From: Ray Morris @ 2012-03-27 18:11 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: zkabelac

> > so it was checking LVs to see if they
> > were PVs ...
> > The solution was to use "filter" in lvm.conf like this:

> Could you check the upstream CVS version of lvm2 with 2 extra patches
> (not yet upstream):
> 
> https://www.redhat.com/archives/lvm-devel/2012-March/msg00171.html
> https://www.redhat.com/archives/lvm-devel/2012-March/msg00172.html
> 
> and see whether your slow PV operations are solved?

Patch #2 seems to apply when there are a lot of PVs. In our case, we
have very few PVs and a lot of LVs, so I don't think it would affect
us. Patch #1 is a bit less clear to me. Is it applicable to an
environment with few PVs? 
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php




On Tue, 27 Mar 2012 16:31:49 +0200
Zdenek Kabelac <zkabelac@redhat.com> wrote:

> On 26.3.2012 22:55, Ray Morris wrote:
> > Put -vvvv on the command and see what takes so long. In our case, 
> > it was checking all of the devices to see if they were PVs.
> > "All devices" includes LVs, so it was checking LVs to see if they
> > were PVs, and activating an LV triggered a scan in case it was 
> > a PV, so activating a volume group was especially slow (hours).
> > The solution was to use "filter" in lvm.conf like this:
> > 
> > filter = [ "r|^/dev/dm.*|", "r|^/dev/vg-.*|","a|^/dev/sd*|",
> > "a|^/dev/md*|", "r|.*|" ]
> > 
> > That checks only /dev/sd* and /dev/md*, to see if they are PVs, 
> > skipping the checks of LVs to see if they are also PVs. Since the
> > device list is cached, use vgscan -vvvv to check that it's checking 
> > the right things and maybe delete that cache first. My rule IS 
> > a bit redundant because I had trouble getting the simpler form 
> > to do what I wanted. I ended up using a belt and suspenders 
> > approach, specifying both "do not scan my LVs" and "scan only
> > /dev/sd*".
> 
> Could you check the upstream CVS version of lvm2 with 2 extra patches
> (not yet upstream):
> 
> https://www.redhat.com/archives/lvm-devel/2012-March/msg00171.html
> https://www.redhat.com/archives/lvm-devel/2012-March/msg00172.html
> 
> and see whether your slow PV operations are solved?
> 
> Zdenek
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-27 18:11     ` Ray Morris
@ 2012-03-27 20:36       ` Zdenek Kabelac
  0 siblings, 0 replies; 14+ messages in thread
From: Zdenek Kabelac @ 2012-03-27 20:36 UTC (permalink / raw)
  To: Ray Morris; +Cc: LVM general discussion and development

On 27.3.2012 20:11, Ray Morris wrote:
>>> so it was checking LVs to see if they
>>> were PVs ...
>>> The solution was to use "filter" in lvm.conf like this:
> 
>> Could you check the upstream CVS version of lvm2 with 2 extra patches
>> (not yet upstream):
>>
>> https://www.redhat.com/archives/lvm-devel/2012-March/msg00171.html
>> https://www.redhat.com/archives/lvm-devel/2012-March/msg00172.html
>>
>> and see whether your slow PV operations are solved?
> 
> Patch #2 seems to apply when there are a lot of PVs. In our case, we
> have very few PVs and a lot of LVs, so I don't think it would affect
> us. Patch #1 is a bit less clear to me. Is it applicable to an
> environment with few PVs? 

I'm interested in the case which takes 'hours' according to your email. I'm
aware of some problems if you specify extra parameters on the command line
(i.e. a list of PVs on the command line), but generic commands with arguments
that select just some vg/[lv] should already work with decent speed (at least
with a recent enough version).

Which version of lvm is actually slow for you?
If you are able to reproduce your problems with the current upstream, could
you try to describe the exact workflow of your slow commands (how many PVs,
VGs, LVs)?

Zdenek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-27 14:34     ` Zdenek Kabelac
@ 2012-03-27 21:24       ` Larkin Lowrey
  2012-03-28  7:53         ` Zdenek Kabelac
  0 siblings, 1 reply; 14+ messages in thread
From: Larkin Lowrey @ 2012-03-27 21:24 UTC (permalink / raw)
  To: LVM general discussion and development; +Cc: Zdenek Kabelac

[-- Attachment #1: Type: text/plain, Size: 2407 bytes --]

I'll try the patches when I get a chance. In the meantime, I've
provided the info you requested as well as a "profiled" run of "lvcreate
-vvvv", attached as lvcreate.txt.gz. The file is pipe-delimited, with the
2nd field being the delta timestamp in ms between the current line and
the prior line. When that lvcreate was run, all arrays except md0 were
doing a check.
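
Something along these lines is roughly what I mean by a "profiled" run - the
snapshot name/size are only examples, and GNU date is assumed for the
millisecond timestamps:

n=0
prev=$(date +%s%3N)
lvcreate -vvvv -s -L 20G -n snaptest Raid/Root 2>&1 | while IFS= read -r line; do
    now=$(date +%s%3N)                                  # current time in ms
    n=$((n + 1))
    printf '%s|%s|%s\n' "$n" "$((now - prev))" "$line"  # line no. | delta ms | -vvvv line
    prev=$now
done > lvcreate.txt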

# pvs -a
  PV               VG   Fmt  Attr PSize   PFree
  /dev/Raid/Boot             ---       0       0
  /dev/Raid/Root             ---       0       0
  /dev/Raid/Swap             ---       0       0
  /dev/Raid/Videos           ---       0       0
  /dev/md0         Raid lvm2 a--  496.00m      0
  /dev/md1         Raid lvm2 a--    2.03t 100.00g
  /dev/md10        Raid lvm2 a--    1.46t      0
  /dev/md2         Raid lvm2 a--    9.10t      0

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md10 : active raid5 sdt1[6] sds1[5] sdm1[0] sdn1[1] sdl1[2] sdk1[4]
      1562845120 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6]
[UUUUUU]

md2 : active raid5 sdr1[5] sdo1[4] sdq1[0] sdp1[3] sdg1[2] sdh1[1]
      9767559680 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid6 sde1[4] sdc1[2] sdf1[5] sda1[1] sdb1[0] sdd1[3]
      509952 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6]
[UUUUUU]

md1 : active raid5 sdb2[10] sde2[1] sdc2[3] sda2[9] sdd2[0] sdi2[6]
sdf2[4] sdj2[8]
      2180641792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8]
[UUUUUUUU]

unused devices: <none>


--Larkin

On 3/27/2012 9:34 AM, Zdenek Kabelac wrote:
> On 27.3.2012 01:51, Larkin Lowrey wrote:
>> That helped bring the lvcreate time down from 2min to 1min so that's an
>> improvement.  Thank you.
>>
>> The source of the remaining slowdown is the writing of metadata to my 4
>> PVs. The writes are small and the arrays are all raid5 so each metadata
>> write is also requiring a read. I'm still at a loss for why this was not
>> a problem when running F15 but the filter is a workable solution for me
>> so I'll leave it alone.
>>
>
> Could you please attach the output of 'lvs -a' and also your md array
> setup.
>
> Ideally, open a regular bugzilla for f16.
>
> Zdenek
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>

[-- Attachment #2: lvcreate.txt.gz --]
[-- Type: application/gzip, Size: 17163 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-27 21:24       ` Larkin Lowrey
@ 2012-03-28  7:53         ` Zdenek Kabelac
  2012-03-28 18:26           ` Stuart D Gathman
  0 siblings, 1 reply; 14+ messages in thread
From: Zdenek Kabelac @ 2012-03-28  7:53 UTC (permalink / raw)
  To: Larkin Lowrey; +Cc: LVM general discussion and development

On 27.3.2012 23:24, Larkin Lowrey wrote:
> I'll try the patches when I get a chance. In the meantime, I've
> provided the info you requested as well as a "profiled" run of "lvcreate
> -vvvv", attached as lvcreate.txt.gz. The file is pipe-delimited, with the
> 2nd field being the delta timestamp in ms between the current line and
> the prior line. When that lvcreate was run, all arrays except md0 were
> doing a check.
> 
> # pvs -a
>   PV               VG   Fmt  Attr PSize   PFree
>   /dev/Raid/Boot             ---       0       0
>   /dev/Raid/Root             ---       0       0
>   /dev/Raid/Swap             ---       0       0
>   /dev/Raid/Videos           ---       0       0
>   /dev/md0         Raid lvm2 a--  496.00m      0
>   /dev/md1         Raid lvm2 a--    2.03t 100.00g
>   /dev/md10        Raid lvm2 a--    1.46t      0
>   /dev/md2         Raid lvm2 a--    9.10t      0
> 
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md10 : active raid5 sdt1[6] sds1[5] sdm1[0] sdn1[1] sdl1[2] sdk1[4]
>       1562845120 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6]
> [UUUUUU]
> 
> md2 : active raid5 sdr1[5] sdo1[4] sdq1[0] sdp1[3] sdg1[2] sdh1[1]
>       9767559680 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
> 
> md0 : active raid6 sde1[4] sdc1[2] sdf1[5] sda1[1] sdb1[0] sdd1[3]
>       509952 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6]
> [UUUUUU]
> 
> md1 : active raid5 sdb2[10] sde2[1] sdc2[3] sda2[9] sdd2[0] sdi2[6]
> sdf2[4] sdj2[8]
>       2180641792 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8]
> [UUUUUUUU]
> 
> unused devices: <none>
> 
> 


So I've just quickly checked the log - and it seems that in many cases it
takes up to 4 seconds to finish a single read/write operation.

All reads from block devices must be direct I/O (older versions had some bugs
there, where some reads came from the buffer cache - that's why your older F15
might have given you faster results - but it was a bug that gave inconsistent
results in some situations, mainly virtualization).

It seems that your cfq scheduler should be tuned better for raid arrays - I
assume you allow the system to create very large queues of buffers and your
mdraid isn't fast enough to store the dirty pages on disk - I'd probably
suggest significantly lowering the maximum amount of dirty pages - since
creating a snapshot requires an fs sync, it has to wait until all buffers
dirtied before the operation have reached the disk.

Check these sysctl options:

vm.dirty_ratio
vm.dirty_background_ratio
vm.swappiness

and try to do some experiments with those values - if you have huge RAM and a
large percentage of it may be dirtied, then you have a problem (personally I'd
try to keep the dirty size in the range of MB, not GB) - but it depends on the
workload....

Another thing that might help 'scan' performance a bit is the use of udev.
Check your setting of the lvm.conf devices/obtain_device_list_from_udev value.
Do you have it set to 1?
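
For example (the *_bytes variants override the corresponding *_ratio settings,
and the numbers below are only a starting point for experiments, not a
recommendation):

# current settings
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.swappiness
# try a hard cap on dirty data in bytes rather than a percentage of RAM
sysctl -w vm.dirty_background_bytes=$((64 * 1024 * 1024))
sysctl -w vm.dirty_bytes=$((256 * 1024 * 1024))
# and check whether lvm is taking its device list from udev
grep obtain_device_list_from_udev /etc/lvm/lvm.conf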


Zdenek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-28  7:53         ` Zdenek Kabelac
@ 2012-03-28 18:26           ` Stuart D Gathman
  2012-03-29  0:27             ` Ray Morris
  2012-03-29  9:48             ` Zdenek Kabelac
  0 siblings, 2 replies; 14+ messages in thread
From: Stuart D Gathman @ 2012-03-28 18:26 UTC (permalink / raw)
  To: linux-lvm

Long ago, Nostradamus foresaw that on 03/28/2012 03:53 AM, Zdenek 
Kabelac would write:
> It seems that your cfq scheduler should be tuned better for raid
> arrays - I assume you allow the system to create very large queues of
> buffers and your mdraid isn't fast enough to store the dirty pages on
> disk - I'd probably suggest significantly lowering the maximum amount
> of dirty pages - since creating a snapshot requires an fs sync, it has
> to wait until all buffers dirtied before the operation have reached
> the disk.
A question (or minor nit): how could lvm possibly require an fs sync to
create a snapshot?  I could see this for Xen, where the guest OS has to
support a com channel to the host.  But for full virtualization, LVM
doesn't know in general what OS is running, or how to suggest an FS
sync.  Or is this something an admin does: run a script that tells the
guest to sync before creating a snapshot through lvm (to maximize the
amount of useful data in the snapshot)?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-28 18:26           ` Stuart D Gathman
@ 2012-03-29  0:27             ` Ray Morris
  2012-03-29  9:48             ` Zdenek Kabelac
  1 sibling, 0 replies; 14+ messages in thread
From: Ray Morris @ 2012-03-29  0:27 UTC (permalink / raw)
  To: LVM general discussion and development

> A question (or minor nit): how could lvm possibly require an fs sync
> to create a snapshot?  I could see this for Xen, where the guest OS
> has to support a com channel to the host.  But for full
> virtualization, LVM doesn't know in general what OS is running,

LVM doesn't have anything whatever to do with virtualization. There are
some cool ways to use them together, but actually most systems using
LVM aren't using virtualization. Sure, you could put a disk image on 
a filesystem which is on a logical volume, but the snapshot and the
sync are in the OS where the logical volume is. Whether files in the FS
on the LV are used by Firefox or by qemu doesn't matter.
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php




On Wed, 28 Mar 2012 14:26:11 -0400
Stuart D Gathman <stuart@bmsi.com> wrote:

> Long ago, Nostradamus foresaw that on 03/28/2012 03:53 AM, Zdenek 
> Kabelac would write:
> > It seems that your cfq scheduler should be tuned better for raid
> > arrays - I assume you allow the system to create very large queues
> > of buffers and your mdraid isn't fast enough to store the dirty
> > pages on disk - I'd probably suggest significantly lowering the
> > maximum amount of dirty pages - since creating a snapshot requires
> > an fs sync, it has to wait until all buffers dirtied before the
> > operation have reached the disk.
> A question (or minor nit): how could lvm possibly require an fs sync
> to create a snapshot?  I could see this for Xen, where the guest OS
> has to support a com channel to the host.  But for full
> virtualization, LVM doesn't know in general what OS is running, or
> how to suggest an FS sync.  Or is this something an admin does: run a
> script that tells the guest to sync before creating a snapshot
> through lvm (to maximize the amount of useful data in the snapshot)?
> 
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-lvm] LVM commands extremely slow during raid check/resync
  2012-03-28 18:26           ` Stuart D Gathman
  2012-03-29  0:27             ` Ray Morris
@ 2012-03-29  9:48             ` Zdenek Kabelac
  1 sibling, 0 replies; 14+ messages in thread
From: Zdenek Kabelac @ 2012-03-29  9:48 UTC (permalink / raw)
  To: LVM general discussion and development

On 28.3.2012 20:26, Stuart D Gathman wrote:
> Long ago, Nostradamus foresaw that on 03/28/2012 03:53 AM, Zdenek Kabelac
> would write:
>> It seems that your cfq scheduler should be tuned better for raid arrays - I
>> assume you allow the system to create very large queues of buffers and your
>> mdraid isn't fast enough to store the dirty pages on disk - I'd probably
>> suggest significantly lowering the maximum amount of dirty pages - since
>> creating a snapshot requires an fs sync, it has to wait until all buffers
>> dirtied before the operation have reached the disk.
> A question (or minor nit): how could lvm possibly require an fs sync to create
> a snapshot?  I could see this for Xen, where the guest OS has to support a com
> channel to the host.  But for full virtualization, LVM doesn't know in general
> what OS is running, or how to suggest an FS sync.  Or is this something an
> admin does: run a script that tells the guest to sync before creating a
> snapshot through lvm (to maximize the amount of useful data in the snapshot)?

You may check the man page for the dmsetup suspend operation - the nolockfs
and noflush options.

For lvm snapshot creation I guess everyone wants to get the filesystem into a
'stable' condition - so all in-flight operations issued before the snapshot
was created should hit the disk - and if you run fsck you should get pretty
consistent results.

Or would it be preferable for the flush & lockfs part to be skipped at this
point, leaving users with a snapshot fs in a quite broken state?
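
As an illustration only (using one of the LVs from this thread as an example) -
the default suspend freezes the filesystem and flushes in-flight I/O, while the
second form skips both:

# default behaviour: lockfs + flush before the device is suspended
dmsetup suspend Raid-Videos
dmsetup resume Raid-Videos
# skip the filesystem freeze and the flush
dmsetup suspend --nolockfs --noflush Raid-Videos
dmsetup resume Raid-Videos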

zdenek

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-03-29  9:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-25  7:56 [linux-lvm] LVM commands extremely slow during raid check/resync Larkin Lowrey
2012-03-26 13:02 ` James Candelaria
2012-03-26 17:49   ` Larkin Lowrey
2012-03-26 20:55 ` Ray Morris
2012-03-26 23:51   ` Larkin Lowrey
2012-03-27 14:34     ` Zdenek Kabelac
2012-03-27 21:24       ` Larkin Lowrey
2012-03-28  7:53         ` Zdenek Kabelac
2012-03-28 18:26           ` Stuart D Gathman
2012-03-29  0:27             ` Ray Morris
2012-03-29  9:48             ` Zdenek Kabelac
2012-03-27 14:31   ` Zdenek Kabelac
2012-03-27 18:11     ` Ray Morris
2012-03-27 20:36       ` Zdenek Kabelac
