* Latency in logical volume layer?
From: Chris Adams @ 2017-04-18 21:27 UTC
  To: linux-kernel

I am trying to figure out a storage latency issue I am seeing with oVirt
and iSCSI storage, and I am looking for a little help (or to be told
"you're doing it wrong" as usual).

I have an oVirt virtualization cluster running with 7 CentOS 7 servers,
a dedicated storage LAN (separate switches), and iSCSI multipath running
to a SAN.  Occasionally, at times when there's no apparent load spike or
anything, oVirt will report 5+ second latency accessing a storage
domain.  I can't see any network issue or problem at the SAN, so I
started looking at Linux.

oVirt reports this when it tries to read the storage domain metadata.
With iSCSI storage, oVirt accesses it via multipath and treats the whole
device as a PV for Linux LVM (no partitioning).  The metadata is a small
LV, and each node reads the first 4K of it every few seconds (using
O_DIRECT to avoid caching).  I wrote a Perl script to replicate this
access pattern (open with O_DIRECT, read the first 4K, close) and report
times.  I do see higher than expected latency sometimes: 50-200ms
latency happens fairly regularly.
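
For reference, the access pattern described above (open with O_DIRECT,
read the first 4K, close, report the time) could be sketched roughly as
follows.  This is a Python illustration, not the Perl script itself; the
names timed_read and sample, the `direct` switch, and the one-second
default interval are assumptions made for the example.

```python
import mmap
import os
import time

BLOCK_SIZE = 4096  # "the first 4K"


def timed_read(path, direct=True):
    """Open path, read the first 4K, close.

    Returns (elapsed_seconds, bytes_read) for the whole
    open/read/close sequence.
    """
    flags = os.O_RDONLY
    if direct:
        # Linux-only flag; bypasses the page cache.
        flags |= os.O_DIRECT
    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    try:
        start = time.perf_counter()
        fd = os.open(path, flags)
        try:
            nread = os.readv(fd, [buf])
        finally:
            os.close(fd)
        elapsed = time.perf_counter() - start
    finally:
        buf.close()
    return elapsed, nread


def sample(path, count=10, interval=1.0, direct=True):
    """Repeat timed_read count times; return latencies in milliseconds."""
    latencies_ms = []
    for i in range(count):
        elapsed, _ = timed_read(path, direct)
        latencies_ms.append(elapsed * 1000.0)
        if i + 1 < count:
            time.sleep(interval)
    return latencies_ms
```

Calling sample() on the LV device node would then give one latency per
read; reading a block device this way generally requires root.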

I added the same open/read/close against the PV (the multipath device),
and I do not see the same latency there.  It is a very consistent
0.25-0.55ms.  I put a host in maintenance mode and disabled multipath,
and I saw similar behavior (comparing reads from the raw SCSI device and
the LV device).
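
A side-by-side comparison like this one could be structured along the
lines below.  It is only a sketch: it assumes each device is represented
by a zero-argument callable that performs one open/read/close and
returns the elapsed time in seconds (a hypothetical interface), and the
50 ms threshold for a "slow" read is simply the low end of the spikes
mentioned earlier.

```python
import statistics


def summarize(samples_ms, threshold_ms=50.0):
    """Summarize one device's latency samples (in milliseconds)."""
    return {
        "count": len(samples_ms),
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
        # Number of reads at or above the threshold.
        "slow": sum(1 for s in samples_ms if s >= threshold_ms),
    }


def compare(readers, rounds=100, threshold_ms=50.0):
    """Time every reader once per round and summarize per label.

    readers maps a label (e.g. "lv", "pv") to a zero-argument callable
    returning the latency of one read in seconds.
    """
    samples = {label: [] for label in readers}
    for _ in range(rounds):
        # Read the devices back to back so both see the same moment.
        for label, read_once in readers.items():
            samples[label].append(read_once() * 1000.0)
    return {
        label: summarize(values, threshold_ms)
        for label, values in samples.items()
    }
```

Reading the devices back to back in each round keeps the samples close
in time, so a spike seen on only one of them is harder to blame on the
shared path underneath.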

I am testing on a host with no VMs.  I sometimes (not always) see
similar latency on multiple hosts simultaneously (the others are
running VMs).

That's where I'm lost - how does going up the stack from the multipath
device to the LV add so much latency (but not all the time)?

I recognize that the CentOS 7 kernel is not mainline, but was hoping
that maybe somebody would say "that's a known thing", or "that's
expected", or "you're measuring wrong".

Any suggestions, places to look, etc.?  Thanks.
-- 
Chris Adams <linux@cmadams.net>
