* Substantial performance difference when reading/writing to device-mapper vs. the individual device
@ 2013-07-22 11:47 Kaul
2013-07-24 11:49 ` Kaul
0 siblings, 1 reply; 3+ messages in thread
From: Kaul @ 2013-07-22 11:47 UTC (permalink / raw)
To: dm-devel
We are seeing a substantial difference in performance when reading/writing
to /dev/mapper/... vs. the individual device (/dev/sdXX).
What can we do to further isolate the issue?
We are using CentOS 6.4, with all updates, 2 CPUs, 4 FC ports:
Here's a table comparing the results:
LUNs  Paths/device  Native multipath  IO pattern  IOPS       Latency (us)  BW (KB/s)
4     16            No                100% Read   605,661.4  3,381         2,420,736
4     16            No                100% Write  477,515.1  4,288         1,908,736
8     16            No                100% Read   663,339.4  6,174         2,650,112
8     16            No                100% Write  536,936.9  7,628         2,146,304
4     16            Yes               100% Read   456,108.9  1,122         1,824,256
4     16            Yes               100% Write  371,665.8  1,377         1,486,336
8     16            Yes               100% Read   519,450.2  1,971         2,077,696
8     16            Yes               100% Write  448,840.4  2,281         1,795,072
* Re: Substantial performance difference when reading/writing to device-mapper vs. the individual device
2013-07-22 11:47 Substantial performance difference when reading/writing to device-mapper vs. the individual device Kaul
@ 2013-07-24 11:49 ` Kaul
2013-07-25 9:38 ` Jun'ichi Nomura
0 siblings, 1 reply; 3+ messages in thread
From: Kaul @ 2013-07-24 11:49 UTC (permalink / raw)
To: dm-devel
Reply to self:
Could it be explained by the difference in max_segments between the
individual devices and the dm device?
Sounds like https://bugzilla.redhat.com/show_bug.cgi?id=755046 which is
supposed to be fixed in 6.4, I reckon:
3514f0c5615a00003 dm-3 XtremIO,XtremApp
size=1.0T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
|- 0:0:2:2 sdi 8:128 active ready running
|- 0:0:3:2 sdl 8:176 active ready running
|- 0:0:1:2 sdf 8:80 active ready running
|- 0:0:0:2 sdc 8:32 active ready running
|- 1:0:0:2 sds 65:32 active ready running
|- 1:0:3:2 sdab 65:176 active ready running
|- 1:0:2:2 sdy 65:128 active ready running
`- 1:0:1:2 sdv 65:80 active ready running
[root@lg545 ~]# cat /sys/class/block/dm-3/queue/max_segments
128
[root@lg545 ~]# cat /sys/class/block/sdi/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdl/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdf/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdc/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sds/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdab/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdy/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdv/queue/max_segments
1024
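For reference, the same values can be collected in one pass with a small loop (a sketch; the device names dm-3, sdi, etc. are taken from the multipath listing above and will differ on other systems):

```shell
# Print max_segments for the dm device and each of its path devices.
# Device names below come from the multipath listing; adjust as needed.
for dev in dm-3 sdi sdl sdf sdc sds sdab sdy sdv; do
    f="/sys/class/block/$dev/queue/max_segments"
    if [ -r "$f" ]; then
        printf '%-6s %s\n' "$dev" "$(cat "$f")"
    fi
done
```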
On Mon, Jul 22, 2013 at 2:47 PM, Kaul <mykaul@gmail.com> wrote:
> We are seeing a substantial difference in performance when reading/writing
> to /dev/mapper/... vs. the individual device (/dev/sdXX).
> What can we do to further isolate the issue?
>
> We are using CentOS 6.4, with all updates, 2 CPUs, 4 FC ports:
> Here's a table comparing the results:
>
> LUNs  Paths/device  Native multipath  IO pattern  IOPS       Latency (us)  BW (KB/s)
> 4     16            No                100% Read   605,661.4  3,381         2,420,736
> 4     16            No                100% Write  477,515.1  4,288         1,908,736
> 8     16            No                100% Read   663,339.4  6,174         2,650,112
> 8     16            No                100% Write  536,936.9  7,628         2,146,304
> 4     16            Yes               100% Read   456,108.9  1,122         1,824,256
> 4     16            Yes               100% Write  371,665.8  1,377         1,486,336
> 8     16            Yes               100% Read   519,450.2  1,971         2,077,696
> 8     16            Yes               100% Write  448,840.4  2,281         1,795,072
>
* Re: Substantial performance difference when reading/writing to device-mapper vs. the individual device
2013-07-24 11:49 ` Kaul
@ 2013-07-25 9:38 ` Jun'ichi Nomura
0 siblings, 0 replies; 3+ messages in thread
From: Jun'ichi Nomura @ 2013-07-25 9:38 UTC (permalink / raw)
To: dm-devel, mykaul
On 07/24/13 20:49, Kaul wrote:
> Could it be explained by the difference in max_segments between the individual devices and the dm device?
It depends on the workload.
Have you already checked IO pattern with "iostat -xN"?
For mostly sequential IO, where many segments are merged,
"max_segments" might affect performance.
For mostly random and small IO, where merges do not occur so often,
it is not likely to matter.
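As a rough illustration of that point (the numbers below are made up, not from this system): iostat -xN reports merged requests in the rrqm/s and wrqm/s columns, and the ratio of merged to issued requests tells you whether the workload is in the regime where max_segments can matter:

```shell
# A sample iostat-style line: device name, rrqm/s (read requests merged
# per second), r/s (read requests issued per second). Values are
# illustrative only, not taken from the thread.
sample='dm-3 120.0 4000.0'
# Fraction of read requests that arrived via merging; a value near 0
# means mostly random/small IO, where a lower max_segments limit is
# unlikely to hurt.
echo "$sample" | awk '{ printf "%s merge ratio: %.3f\n", $1, $2 / ($2 + $3) }'
```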
> Sounds like https://bugzilla.redhat.com/show_bug.cgi?id=755046 which is supposed to be fixed in 6.4, I reckon:
You could check the other request_queue parameters to see whether any
differences between the dm device and the sd devices exist
(/sys/class/block/*/queue/*). Many of them could affect performance.
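One way to do that sweep (a sketch; dm-3 and sdi are the device names from the listing earlier in the thread, so substitute your own):

```shell
# Print every queue parameter that differs between the dm device and one
# of its sd path devices. Unreadable entries are skipped.
compare_param() {  # usage: compare_param NAME DM_VALUE SD_VALUE
    if [ "$2" != "$3" ]; then
        printf '%-24s dm=%s sd=%s\n' "$1" "$2" "$3"
    fi
}

dm=/sys/class/block/dm-3/queue
sd=/sys/class/block/sdi/queue
for f in "$dm"/*; do
    p=${f##*/}
    if [ -r "$f" ] && [ -r "$sd/$p" ]; then
        compare_param "$p" "$(cat "$f")" "$(cat "$sd/$p")"
    fi
done
```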
Also, I think you should check whether the same phenomenon happens with
the latest upstream kernel, so you can get feedback from the upstream
mailing list.
The other thing I would check is CPU load, perhaps starting with commands
like top and mpstat: whether there are enough idle cycles left for the
application/kernel to submit/process IOs.
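For a dependency-free first look (a rough sketch; mpstat -P ALL gives proper per-CPU, per-interval numbers, while this only reads the aggregate counters once):

```shell
# First line of /proc/stat: "cpu user nice system idle iowait irq ..."
# (cumulative jiffies since boot). This approximates overall idle time;
# use mpstat -P ALL 5 for per-CPU, per-interval figures.
read -r _ user nice system idle _ < /proc/stat
total=$((user + nice + system + idle))
echo "approx idle since boot: $((100 * idle / total))%"
```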
> 3514f0c5615a00003 dm-3 XtremIO,XtremApp
> size=1.0T features='0' hwhandler='0' wp=rw
> `-+- policy='queue-length 0' prio=1 status=active
> |- 0:0:2:2 sdi 8:128 active ready running
> |- 0:0:3:2 sdl 8:176 active ready running
> |- 0:0:1:2 sdf 8:80 active ready running
> |- 0:0:0:2 sdc 8:32 active ready running
> |- 1:0:0:2 sds 65:32 active ready running
> |- 1:0:3:2 sdab 65:176 active ready running
> |- 1:0:2:2 sdy 65:128 active ready running
> `- 1:0:1:2 sdv 65:80 active ready running
>
>
> [root@lg545 ~]# cat /sys/class/block/dm-3/queue/max_segments
> 128
> [root@lg545 ~]# cat /sys/class/block/sdi/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdl/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdf/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdc/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sds/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdab/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdy/queue/max_segments
> 1024
> [root@lg545 ~]# cat /sys/class/block/sdv/queue/max_segments
> 1024
>
>
> On Mon, Jul 22, 2013 at 2:47 PM, Kaul <mykaul@gmail.com> wrote:
>
> We are seeing a substantial difference in performance when reading/writing to /dev/mapper/... vs. the individual device (/dev/sdXX).
> What can we do to further isolate the issue?
>
> We are using CentOS 6.4, with all updates, 2 CPUs, 4 FC ports:
> Here's a table comparing the results:
>
> LUNs  Paths/device  Native multipath  IO pattern  IOPS       Latency (us)  BW (KB/s)
> 4     16            No                100% Read   605,661.4  3,381         2,420,736
> 4     16            No                100% Write  477,515.1  4,288         1,908,736
> 8     16            No                100% Read   663,339.4  6,174         2,650,112
> 8     16            No                100% Write  536,936.9  7,628         2,146,304
> 4     16            Yes               100% Read   456,108.9  1,122         1,824,256
> 4     16            Yes               100% Write  371,665.8  1,377         1,486,336
> 8     16            Yes               100% Read   519,450.2  1,971         2,077,696
> 8     16            Yes               100% Write  448,840.4  2,281         1,795,072
--
Jun'ichi Nomura, NEC Corporation