From: Kaul
To: dm-devel@redhat.com
Subject: Substantial performance difference when reading/writing to device-mapper vs. the individual device
Date: Mon, 22 Jul 2013 14:47:16 +0300

We are seeing a substantial difference in performance when we perform a
read/write to /dev/mapper/... vs. the specific device (/dev/sdXX).
What can we do to further isolate the issue?

We are using CentOS 6.4 with all updates, 2 CPUs, and 4 FC ports.
Here is a table comparing the results:

# of LUNs | Paths per device | Native multipath device | IO pattern | IOPS      | Latency (us) | BW (KB/s)
----------+------------------+-------------------------+------------+-----------+--------------+----------
        4 |               16 | No                      | 100% Read  | 605,661.4 |        3,381 | 2,420,736
        4 |               16 | No                      | 100% Write | 477,515.1 |        4,288 | 1,908,736
        8 |               16 | No                      | 100% Read  | 663,339.4 |        6,174 | 2,650,112
        8 |               16 | No                      | 100% Write | 536,936.9 |        7,628 | 2,146,304
        4 |               16 | Yes                     | 100% Read  | 456,108.9 |        1,122 | 1,824,256
        4 |               16 | Yes                     | 100% Write | 371,665.8 |        1,377 | 1,486,336
        8 |               16 | Yes                     | 100% Read  | 519,450.2 |        1,971 | 2,077,696
        8 |               16 | Yes                     | 100% Write | 448,840.4 |        2,281 | 1,795,072
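The thread does not say which benchmark tool produced the numbers above. As a
minimal sketch only, a comparison of this kind could be run with fio, once
against a single path and once against the multipath device; the device names
(/dev/sdc, /dev/mapper/mpatha) and the job parameters here are placeholders,
not the original test setup:

  # Sketch: run the same random-read job against a raw path and the dm device,
  # then compare the IOPS and latency lines fio prints for each run.
  for dev in /dev/sdc /dev/mapper/mpatha; do
      fio --name=randread-$(basename $dev) --filename=$dev \
          --direct=1 --ioengine=libaio --rw=randread --bs=4k \
          --iodepth=32 --numjobs=4 --runtime=60 --time_based \
          --group_reporting
  done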
From: Kaul
To: dm-devel@redhat.com
Subject: Re: Substantial performance difference when reading/writing to device-mapper vs. the individual device
Date: Wed, 24 Jul 2013 14:49:05 +0300

Reply to self:

Could it be explained by the difference in max_segments between the individual
devices and the dm device? Sounds like
https://bugzilla.redhat.com/show_bug.cgi?id=755046, which is supposed to be
fixed in 6.4, I reckon:

3514f0c5615a00003 dm-3 XtremIO,XtremApp
size=1.0T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:2:2 sdi  8:128  active ready running
  |- 0:0:3:2 sdl  8:176  active ready running
  |- 0:0:1:2 sdf  8:80   active ready running
  |- 0:0:0:2 sdc  8:32   active ready running
  |- 1:0:0:2 sds  65:32  active ready running
  |- 1:0:3:2 sdab 65:176 active ready running
  |- 1:0:2:2 sdy  65:128 active ready running
  `- 1:0:1:2 sdv  65:80  active ready running

[root@lg545 ~]# cat /sys/class/block/dm-3/queue/max_segments
128
[root@lg545 ~]# cat /sys/class/block/sdi/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdl/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdf/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdc/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sds/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdab/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdy/queue/max_segments
1024
[root@lg545 ~]# cat /sys/class/block/sdv/queue/max_segments
1024

On Mon, Jul 22, 2013 at 2:47 PM, Kaul <mykaul@gmail.com> wrote:
> We are seeing a substantial difference in performance when we perform a
> read/write to /dev/mapper/... vs. the specific device (/dev/sdXX).
> What can we do to further isolate the issue?
> [...]
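The per-path cat commands above can be generalized to every request_queue
parameter at once. A small sketch, assuming the multipath device is dm-3 as in
the output above and that its path devices appear under
/sys/class/block/dm-3/slaves:

  # Print each queue parameter for dm-3 and for every path device beneath it,
  # so that mismatches such as max_segments (128 vs. 1024) stand out.
  for param in /sys/class/block/dm-3/queue/*; do
      p=$(basename $param)
      echo "== $p =="
      echo "dm-3: $(cat $param 2>/dev/null)"
      for slave in /sys/class/block/dm-3/slaves/*; do
          s=$(basename $slave)
          echo "$s: $(cat /sys/class/block/$s/queue/$p 2>/dev/null)"
      done
  done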
From: "Jun'ichi Nomura"
To: dm-devel@redhat.com, mykaul@gmail.com
Subject: Re: Substantial performance difference when reading/writing to device-mapper vs. the individual device
Date: Thu, 25 Jul 2013 18:38:36 +0900

On 07/24/13 20:49, Kaul wrote:
> Could it be explained by the difference in max_segments between the
> individual devices and the dm device?

It depends on the workload. Have you already checked the IO pattern with
"iostat -xN"? For mostly sequential IO, where a lot of segments are merged,
max_segments might affect performance. For mostly random and small IO, where
merging does not occur so often, it is unlikely to matter.

> Sounds like https://bugzilla.redhat.com/show_bug.cgi?id=755046, which is
> supposed to be fixed in 6.4, I reckon:

You could check the other request_queue parameters (/sys/class/block/*/queue/*)
to see whether there are any differences between the dm device and the sd
devices. Many of them can affect performance.

Also, I think you should check whether the same phenomenon happens with the
latest upstream kernel, so that you can get feedback from the upstream mailing
list.

The other thing I would check is CPU load, perhaps starting with commands like
top and mpstat, to see whether there are enough idle cycles left for the
application/kernel to submit and process IOs.
> [...]

-- 
Jun'ichi Nomura, NEC Corporation
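Nomura's last two suggestions (check merging with iostat and CPU headroom with
mpstat) can be captured while the benchmark runs. This is an illustrative
sketch only; the 5-second interval, sample count, and log file names are
arbitrary:

  # Sample per-CPU utilization and extended per-device IO statistics in the
  # background for about 60 seconds; the read/write request-merge rates and
  # average request size in the iostat output show how much merging occurs,
  # while mpstat shows whether any CPUs are left idle.
  mpstat -P ALL 5 12 > mpstat.log &
  iostat -xN 5 12 > iostat.log &
  wait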