* ISCSI-SCST performance (with also IET and STGT data)
@ 2009-03-30 17:33 Vladislav Bolkhovitin
[not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
2009-04-01 20:14 ` Bart Van Assche
0 siblings, 2 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-30 17:33 UTC (permalink / raw)
To: scst-devel; +Cc: linux-scsi, linux-kernel, iscsitarget-devel, stgt
Hi All,
As part of 1.0.1 release preparations I made some performance tests to
make sure there are no performance regressions in SCST overall and
iSCSI-SCST particularly. Results were quite interesting, so I decided to
publish them together with the corresponding numbers for IET and STGT
iSCSI targets. This isn't a full performance comparison, since it
includes only a few chosen tests; I don't have time for a complete one.
But I hope somebody will take up what I did and make it complete.
Setup:
Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB via the
kernel command line to reduce the test data footprint, 75GB 15K RPM SCSI
disk as backstorage, dual port 1Gbps Intel E1000 network card, 2.6.29
kernel.
Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB via the
kernel command line to reduce the test data footprint, dual port 1Gbps
Intel E1000 network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.
The target exported a 5GB file on XFS for FILEIO and a 5GB partition
for BLOCKIO.
All the tests were run 3 times and the averages are reported. All the
values are in MB/s. The tests were run with both the CFQ and deadline
I/O schedulers on the target. All other parameters on both target and
initiator were left at their defaults.
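The run-three-times-and-average procedure lends itself to a small
script. A sketch, assuming GNU dd's summary-line format and root
privileges for dropping caches; the device path is a placeholder:

```shell
#!/bin/sh
# Sketch of the measurement procedure: run the sequential-read dd test
# RUNS times against the iSCSI-attached disk and print the mean
# throughput. /dev/sdb is a placeholder; dropping caches needs root,
# and the MB/s parsing assumes GNU dd's summary line.
DEV=${1:-/dev/sdb}
RUNS=3

results=""
i=0
while [ "$i" -lt "$RUNS" ]; do
    # Start each run with a cold initiator-side page cache.
    sync
    echo 3 > /proc/sys/vm/drop_caches
    # GNU dd reports "... copied, 19.8 s, 106 MB/s"; keep the MB/s value.
    mbs=$(dd if="$DEV" of=/dev/null bs=512k count=2000 2>&1 \
          | awk '/copied/ { print $(NF-1) }')
    results="$results $mbs"
    i=$((i + 1))
done
echo "$results" | awk '{ for (n = 1; n <= NF; n++) s += $n;
                         printf "average: %.1f MB/s\n", s / NF }'
```

The write test averages the same way, with dd reading from /dev/zero
instead.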
==================================================================
I. SEQUENTIAL ACCESS OVER SINGLE LINE
1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
                  ISCSI-SCST    IET    STGT
NULLIO:              106        105    103
FILEIO/CFQ:           82         57     55
FILEIO/deadline:      69         69     67
BLOCKIO/CFQ:          81         28      -
BLOCKIO/deadline:     80         66      -
------------------------------------------------------------------
2. # dd if=/dev/zero of=/dev/sdX bs=512K count=2000
I didn't do other write tests, because I have data on those devices.
                  ISCSI-SCST    IET    STGT
NULLIO:              114        114    114
------------------------------------------------------------------
3. /dev/sdX formatted as ext3 and mounted on /mnt on the initiator. Then
# dd if=/mnt/q of=/dev/null bs=512K count=2000
was run (/mnt/q had been created beforehand by test 4 below).
                  ISCSI-SCST    IET    STGT
FILEIO/CFQ:           94         66     46
FILEIO/deadline:      74         74     72
BLOCKIO/CFQ:          95         35      -
BLOCKIO/deadline:     94         95      -
------------------------------------------------------------------
4. /dev/sdX formatted as ext3 and mounted on /mnt on the initiator. Then
# dd if=/dev/zero of=/mnt/q bs=512K count=2000
was run (this is the test that creates /mnt/q for test 3 above).
                  ISCSI-SCST    IET    STGT
FILEIO/CFQ:           97         91     88
FILEIO/deadline:      98         96     90
BLOCKIO/CFQ:         112        110      -
BLOCKIO/deadline:    112        110      -
------------------------------------------------------------------
Conclusions:
1. ISCSI-SCST FILEIO is 27% faster than IET on buffered READs (94 vs
74); with CFQ the difference is 42% (94 vs 66).
2. ISCSI-SCST FILEIO is 30% faster than STGT on buffered READs (94 vs
72); with CFQ the difference is 104% (94 vs 46).
3. ISCSI-SCST BLOCKIO on buffered READs has about the same performance
as IET, but with CFQ it is 170% faster (95 vs 35).
4. Buffered WRITEs are less interesting, because they are asynchronous
with many outstanding commands at a time, hence latency insensitive;
but even here ISCSI-SCST is always a bit faster than IET.
5. STGT is always the slowest, sometimes considerably so.
6. BLOCKIO on buffered WRITEs is consistently faster than FILEIO, so
there is definitely room for future improvement here.
7. For some reason access through a file system is considerably faster
than access to the same device directly.
==================================================================
II. MOSTLY RANDOM "REALISTIC" ACCESS
For this test I used the io_trash utility; for details see
http://lkml.org/lkml/2008/11/17/444. To show the value of target-side
caching, in this test the target was run with its full 2GB of memory. I
ran io_trash with the following parameters: "2 2 ./ 500000000 50000000
10 4096 4096 300000 10 90 0 10". Total execution time was measured.
                  ISCSI-SCST    IET       STGT
FILEIO/CFQ:         4m45s       5m        5m17s
FILEIO/deadline:    5m20s       5m22s     5m35s
BLOCKIO/CFQ:        23m3s       23m5s       -
BLOCKIO/deadline:   23m15s      23m25s      -
Conclusions:
1. FILEIO is about five times faster than BLOCKIO
2. STGT is, as usual, the slowest
3. Deadline is always a bit slower than CFQ
==================================================================
III. SEQUENTIAL ACCESS OVER MPIO
Unfortunately, my dual port network card isn't capable of simultaneous
data transfers, so I had to do some "modeling" and put my network
devices into 100Mbps mode. To make the model more realistic I also used
my old 5200RPM IDE hard drive, capable of 35MB/s local throughput. So I
modeled the case of dual 1Gbps links with 350MB/s backstorage, provided
all the following conditions are satisfied:
- Both links are capable of simultaneous data transfers
- There is sufficient CPU power on both initiator and target to drive
the data transfers.
All the tests were done with iSCSI-SCST only.
1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
NULLIO:            23
FILEIO/CFQ:        20
FILEIO/deadline:   20
BLOCKIO/CFQ:       20
BLOCKIO/deadline:  17
Single-line NULLIO is 12.
So there is a 67% improvement from using 2 lines. With 1Gbps links that
would be the equivalent of 200MB/s. Not too bad.
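For reference, a real dual-link MPIO setup with open-iscsi and
dm-multipath would be built roughly along these lines. This is a
sketch; the portal addresses are placeholders, not my actual setup:

```shell
# Discover the target through both portals (example addresses).
iscsiadm -m discovery -t sendtargets -p 192.168.1.1
iscsiadm -m discovery -t sendtargets -p 192.168.2.1

# Log in to every discovered node; each path yields its own SCSI device.
iscsiadm -m node --login

# dm-multipath then aggregates the paths into one device; a round-robin
# (multibus) policy lets both links carry I/O simultaneously.
multipath -ll
```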
==================================================================
Connections to the target were made with the following iSCSI parameters:
# iscsi-scst-adm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=0
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject
# ietadm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=262144
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=20
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject
# tgtadm --op show --mode session --tid 1 --sid 1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
HeaderDigest=None
DataDigest=None
InitialR2T=No
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=262144
MaxBurstLength=2097152
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=0
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=No
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
Vlad
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
[not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
@ 2009-03-30 18:33 ` Vladislav Bolkhovitin
2009-03-30 18:53 ` Bart Van Assche
0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-30 18:33 UTC (permalink / raw)
To: Bart Van Assche
Cc: scst-devel, iscsitarget-devel, linux-kernel, linux-scsi, stgt
Bart Van Assche, on 03/30/2009 10:06 PM wrote:
> On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> As part of 1.0.1 release preparations I made some performance tests to make
>> sure there are no performance regressions in SCST overall and iSCSI-SCST
>> particularly. Results were quite interesting, so I decided to publish them
>> together with the corresponding numbers for IET and STGT iSCSI targets. This
>> isn't a real performance comparison, it includes only few chosen tests,
>> because I don't have time for a complete comparison. But I hope somebody
>> will take up what I did and make it complete.
>>
>> Setup:
>>
>> Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by kernel
>> command line to have less test data footprint, 75GB 15K RPM SCSI disk as
>> backstorage, dual port 1Gbps E1000 Intel network card, 2.6.29 kernel.
>>
>> Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by kernel
>> command line to have less test data footprint, dual port 1Gbps E1000 Intel
>> network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.
>>
>> The target exported a 5GB file on XFS for FILEIO and 5GB partition for
>> BLOCKIO.
>>
>> All the tests were ran 3 times and average written. All the values are in
>> MB/s. The tests were ran with CFQ and deadline IO schedulers on the target.
>> All other parameters on both target and initiator were default.
>
> These are indeed interesting results. There are some aspects of the
> test setup I do not understand however:
> * All tests have been run with buffered I/O instead of direct I/O
> (iflag=direct / oflag=direct). My experience is that the results of
> tests with direct I/O are easier to reproduce (less variation between
> runs). So I have been wondering why the tests have been run with
> buffered I/O instead ?
Real applications use buffered I/O, hence it should be used in tests. It
exercises the whole storage stack on both initiator and target. The
results are quite reproducible; variation is about 10%.
> * It is well known that having more memory in the target system
> improves performance because of read and write caching. What did you
> want to demonstrate by limiting the memory of the target system ?
If I had the full 2GB on the target, the measurements would have taken
10 times longer, since the data footprint should be at least 4x the
cache size. For sequential reads/writes, 256MB and 2GB of cache behave
the same.
Where it did matter (io_trash) I increased the memory to the full 2GB.
> * Which SCST options were enabled on the target ? Was e.g. the
> NV_CACHE option enabled ?
Defaults, i.e. yes, enabled. But it didn't matter, since all the
filesystems were mounted on the initiator without data barriers enabled.
Thanks,
Vlad
P.S. Please don't drop CC.
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-03-30 18:33 ` [Scst-devel] " Vladislav Bolkhovitin
@ 2009-03-30 18:53 ` Bart Van Assche
0 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-03-30 18:53 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: scst-devel, iscsitarget-devel, linux-kernel, linux-scsi, stgt
On Mon, Mar 30, 2009 at 8:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 03/30/2009 10:06 PM wrote:
>> These are indeed interesting results. There are some aspects of the
>> test setup I do not understand however:
>> * All tests have been run with buffered I/O instead of direct I/O
>> (iflag=direct / oflag=direct). My experience is that the results of
>> tests with direct I/O are easier to reproduce (less variation between
>> runs). So I have been wondering why the tests have been run with
>> buffered I/O instead ?
>
> Real applications use buffered I/O, hence it should be used in tests. It
> evaluates all the storage stack on both initiator and target as a whole.
> The results are very reproducible, variation is about 10%.
Most applications do indeed use buffered I/O. Database software
however often uses direct I/O. It might be interesting to publish
performance results for both buffered I/O and direct I/O. A quote from
the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya et al.
(Linux Symposium, Ottawa, 2003):
Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
a device without copying the data through the kernel’s buffer cache.
This mechanism can boost performance if the data is unlikely to be
used again in the short term (during a disk backup, for example), or
for applications such as large database management systems that
perform their own caching.
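As an illustration, the direct-I/O variants of the dd tests differ from
the buffered ones only in the iflag/oflag options. A sketch; the device
path is a placeholder, and the last command is destructive:

```shell
# Buffered read: goes through the initiator's page cache and read-ahead.
dd if=/dev/sdb of=/dev/null bs=512k count=2000

# Direct read: bypasses the page cache entirely; bs must be a multiple
# of the device's logical block size for O_DIRECT to succeed.
dd if=/dev/sdb of=/dev/null bs=512k count=2000 iflag=direct

# Direct write of zeroes - only run against a scratch device.
dd if=/dev/zero of=/dev/sdb bs=512k count=2000 oflag=direct
```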
Bart.
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-03-30 18:53 ` Bart Van Assche
@ 2009-03-31 17:37 ` Vladislav Bolkhovitin
2009-03-31 18:43 ` Ross S. W. Walker
-1 siblings, 1 reply; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-31 17:37 UTC (permalink / raw)
To: Bart Van Assche
Cc: iscsitarget-devel, scst-devel, linux-kernel, linux-scsi, stgt
Bart Van Assche, on 03/30/2009 10:53 PM wrote:
> On Mon, Mar 30, 2009 at 8:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> Bart Van Assche, on 03/30/2009 10:06 PM wrote:
>>> These are indeed interesting results. There are some aspects of the
>>> test setup I do not understand however:
>>> * All tests have been run with buffered I/O instead of direct I/O
>>> (iflag=direct / oflag=direct). My experience is that the results of
>>> tests with direct I/O are easier to reproduce (less variation between
>>> runs). So I have been wondering why the tests have been run with
>>> buffered I/O instead ?
>> Real applications use buffered I/O, hence it should be used in tests. It
>> evaluates all the storage stack on both initiator and target as a whole.
>> The results are very reproducible, variation is about 10%.
>
> Most applications do indeed use buffered I/O. Database software
> however often uses direct I/O. It might be interesting to publish
> performance results for both buffered I/O and direct I/O.
Yes, sure
> A quote from
> the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya e.a.
> (Linux Symposium, Ottawa, 2003):
>
> Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
> a device without copying the data through the kernel’s buffer cache.
> This mechanism can boost performance if the data is unlikely to be
> used again in the short term (during a disk backup, for example), or
> for applications such as large database management systems that
> perform their own caching.
Please don't misread the phrase "unlikely to be used again in the short
term". With read-ahead, all your cached data is *likely* to be used
"again" in the near future after it is read from storage, even if only
once, in the first read by the application. The same is true for
write-back caching, where data is written to the cache once for each
command. Both read-ahead and write-back are very important for good
performance, and O_DIRECT throws them away. All modern HDDs have a
memory buffer (cache), at least 2MB even on the cheapest ones. This
cache is essential for performance, yet how could it make any
difference when the host computer has, say, 1000 times more memory?
Thus, to work effectively with O_DIRECT an application has to be very
smart to work around the lack of read-ahead and write-back.
I personally consider O_DIRECT (as well as BLOCKIO) nothing more than a
workaround for possible flaws in the storage subsystem. If O_DIRECT
works better, then in 99+% of cases there is something in the storage
subsystem that should be fixed to perform better.
To be complete, there is one case where O_DIRECT and BLOCKIO have an
advantage: both of them transfer data zero-copy. So they are good if
your memory is slow compared to your storage (the InfiniBand case, for
instance) and the additional data copy hurts performance noticeably.
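As an aside, the initiator-side read-ahead that O_DIRECT bypasses can
be inspected and tuned per block device. A sketch; the device name is a
placeholder:

```shell
# Current read-ahead window of the iSCSI disk, in 512-byte sectors.
blockdev --getra /dev/sdb

# Enlarge it to 4MB (8192 sectors); this often helps sequential reads
# over iSCSI, where per-request latency is relatively high.
blockdev --setra 8192 /dev/sdb

# The same setting is visible via sysfs, in kilobytes.
cat /sys/block/sdb/queue/read_ahead_kb
```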
> Bart.
* RE: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-03-31 17:37 ` Vladislav Bolkhovitin
@ 2009-03-31 18:43 ` Ross S. W. Walker
0 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-03-31 18:43 UTC (permalink / raw)
To: Vladislav Bolkhovitin, Bart Van Assche
Cc: iscsitarget-devel, scst-devel, linux-kernel, linux-scsi, stgt
Vladislav Bolkhovitin wrote:
> Bart Van Assche, on 03/30/2009 10:53 PM wrote:
> >
> > Most applications do indeed use buffered I/O. Database software
> > however often uses direct I/O. It might be interesting to publish
> > performance results for both buffered I/O and direct I/O.
>
> Yes, sure
>
> > A quote from
> > the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya e.a.
> > (Linux Symposium, Ottawa, 2003):
> >
> > Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
> > a device without copying the data through the kernel’s buffer cache.
> > This mechanism can boost performance if the data is unlikely to be
> > used again in the short term (during a disk backup, for example), or
> > for applications such as large database management systems that
> > perform their own caching.
>
> Please don't misread phrase "unlikely to be used again in the short
> term". If you have read-ahead, all your cached data is *likely* to be
> used "again" in the near future after they were read from storage,
> although only once in the first read by application. The same is true
> for write-back caching, where data written to the cache once for each
> command. Both read-ahead and write back are very important for good
> performance and O_DIRECT throws them away. All the modern HDDs have a
> memory buffer (cache) at least 2MB big on the cheapest ones.
> This cache is essential for performance, although how can it make any
> difference if the host computer has, say, 1000 times more memory?
>
> Thus, to work effectively with O_DIRECT an application has to be very
> smart to workaround the lack of read-ahead and write back.
True, the application has to perform its own read-ahead and write-back.
Kind of like how a database does it, or maybe the page cache on the
iSCSI initiator's system ;-)
> I personally consider O_DIRECT (as well as BLOCKIO) as nothing more than
> a workaround for possible flaws in the storage subsystem. If O_DIRECT
> works better, then in 99+% cases there is something in the storage
> subsystem, which should be fixed to perform better.
That's not true: page-cached I/O is broken into page-sized chunks, which
limits the I/O bandwidth to the storage hardware while imposing a higher
CPU overhead. Obviously page-cached I/O isn't ideal for all situations.
You could also have an amazing backend storage system with its own
NVRAM cache. Why put the performance overhead onto the target system
when you can off-load it to the controller?
> To be complete, there is one case where O_DIRECT and BLOCKIO have an
> advantage: both of them transfer data zero-copy. So they are good if
> your memory is too slow comparing to storage (InfiniBand case, for
> instance) and additional data copy hurts performance noticeably.
The bottom line which will always be true:
Know your workload, configure your storage to match.
The best storage solutions allow the implementor the most flexibility
in configuring the storage, which I think both IET and SCST do.
IET just needs to fix how it handles its workload with CFQ, which
somehow SCST has overcome. Of course, SCST tweaks the Linux kernel to
gain some extra speed.
Vlad, how about a comparison of SCST vs IET without those kernel hooks?
-Ross
______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.
* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-03-31 18:43 ` Ross S. W. Walker
@ 2009-04-01 6:29 ` Bart Van Assche
-1 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-01 6:29 UTC (permalink / raw)
To: Ross S. W. Walker
Cc: Vladislav Bolkhovitin, iscsitarget-devel, scst-devel,
linux-kernel, linux-scsi, stgt
On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
<RWalker@medallion.com> wrote:
> IET just needs to fix how it does it workload with CFQ which
> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
> gain some extra speed.
I'm not familiar with the implementation details of CFQ, but I know
that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
default number of kernel threads of the scst_vdisk kernel module has
been increased to 5. Could this explain the performance difference
between SCST and IET for FILEIO and BLOCKIO ?
Bart.
* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-01 6:29 ` Bart Van Assche
@ 2009-04-01 12:20 ` Ross Walker
-1 siblings, 0 replies; 34+ messages in thread
From: Ross Walker @ 2009-04-01 12:20 UTC (permalink / raw)
To: Bart Van Assche
Cc: Ross S. W. Walker, Vladislav Bolkhovitin, linux-scsi,
iSCSI Enterprise Target Developer List, linux-kernel, stgt,
scst-devel
On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>
wrote:
> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
> <RWalker@medallion.com> wrote:
>> IET just needs to fix how it does it workload with CFQ which
>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>> gain some extra speed.
>
> I'm not familiar with the implementation details of CFQ, but I know
> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
> default number of kernel threads of the scst_vdisk kernel module has
> been increased to 5. Could this explain the performance difference
> between SCST and IET for FILEIO and BLOCKIO ?
Thanks for the update. IET has used 8 threads per target for ages now,
so I don't think it is that.
It may be how the I/O threads are forked in SCST that causes them to
share the same I/O context.
I'm pretty sure implementing a version of the patch that was used for
the dump command (found on the LKML) will fix this.
But thanks go to Vlad for pointing this deficiency out so we can fix it
to help make IET even better.
-Ross
* Re: ISCSI-SCST performance (with also IET and STGT data)
2009-03-30 17:33 ISCSI-SCST performance (with also IET and STGT data) Vladislav Bolkhovitin
@ 2009-04-01 20:14 ` Bart Van Assche
2009-04-01 20:14 ` Bart Van Assche
1 sibling, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-01 20:14 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: scst-devel, linux-scsi, linux-kernel, iscsitarget-devel, stgt
On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
==================================================================
>
> I. SEQUENTIAL ACCESS OVER SINGLE LINE
>
> 1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
>
> ISCSI-SCST IET STGT
> NULLIO: 106 105 103
> FILEIO/CFQ: 82 57 55
> FILEIO/deadline 69 69 67
> BLOCKIO/CFQ 81 28 -
> BLOCKIO/deadline 80 66 -
I have repeated some of these performance tests for iSCSI over IPoIB
(two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
for the buffered I/O test with a block size of 512K (initiator)
against a file of 1GB residing on a tmpfs filesystem on the target are
as follows:
write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
And for a block size of 4 KB:
write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
Or: depending on the test scenario, SCST transfers data between 2% and
30% faster via the iSCSI protocol over this network.
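As a quick sanity check on that 2%-30% range, the speedups can be recomputed from the four throughput pairs reported above:

```python
# SCST vs. IET throughput pairs (MB/s) from the IPoIB tests above.
results = {
    "512K write": (243, 192),
    "512K read":  (291, 223),
    "4K write":   (43, 42),
    "4K read":    (288, 221),
}

# Percentage by which SCST outperforms IET in each scenario.
gains = {name: (scst / iet - 1) * 100 for name, (scst, iet) in results.items()}
for name, gain in gains.items():
    print(f"{name}: SCST is {gain:.1f}% faster")
# The smallest gain is ~2.4% (4K write) and the largest ~30.5% (512K read),
# matching the "between 2% and 30%" summary.
```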
Something that is not relevant for this comparison, but interesting to
know: with the SRP implementation in SCST the maximal read throughput
is 1290 MB/s on the same setup.
Bart.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-01 12:20 ` Ross Walker
@ 2009-04-01 20:23 ` James Bottomley
-1 siblings, 0 replies; 34+ messages in thread
From: James Bottomley @ 2009-04-01 20:23 UTC (permalink / raw)
To: Ross Walker
Cc: Bart Van Assche, Ross S. W. Walker, Vladislav Bolkhovitin,
linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
stgt, scst-devel
On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>
> wrote:
>
> > On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
> > <RWalker@medallion.com> wrote:
> >> IET just needs to fix how it does its workload with CFQ, which
> >> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
> >> gain some extra speed.
> >
> > I'm not familiar with the implementation details of CFQ, but I know
> > that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
> > default number of kernel threads of the scst_vdisk kernel module has
> > been increased to 5. Could this explain the performance difference
> > between SCST and IET for FILEIO and BLOCKIO ?
>
> Thanks for the update. IET has used 8 threads per target for ages now,
> so I don't think it is that.
>
> It may be how the I/O threads are forked in SCST that causes them to
> be in the same I/O context with each other.
>
> I'm pretty sure implementing a version of the patch that was used for
> the dump command (found on the LKML) will fix this.
>
> But thanks go to Vlad for pointing this deficiency out so we can fix
> it and help make IET even better.
SCST explicitly fiddles with the I/O context to get this to happen. It
has a hack to the block layer to export alloc_io_context:
http://marc.info/?t=122893564800003
James
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-01 20:23 ` James Bottomley
@ 2009-04-02 7:38 ` Vladislav Bolkhovitin
-1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 7:38 UTC (permalink / raw)
To: James Bottomley
Cc: Ross Walker, linux-scsi, iSCSI Enterprise Target Developer List,
linux-kernel, Ross S. W. Walker, scst-devel, stgt,
Bart Van Assche
James Bottomley, on 04/02/2009 12:23 AM wrote:
> On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
>> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>
>> wrote:
>>
>>> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
>>> <RWalker@medallion.com> wrote:
>>>> IET just needs to fix how it does its workload with CFQ, which
>>>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>>>> gain some extra speed.
>>> I'm not familiar with the implementation details of CFQ, but I know
>>> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
>>> default number of kernel threads of the scst_vdisk kernel module has
>>> been increased to 5. Could this explain the performance difference
>>> between SCST and IET for FILEIO and BLOCKIO ?
>> Thanks for the update. IET has used 8 threads per target for ages now,
>> so I don't think it is that.
>>
>> It may be how the I/O threads are forked in SCST that causes them to
>> be in the same I/O context with each other.
>>
>> I'm pretty sure implementing a version of the patch that was used for
>> the dump command (found on the LKML) will fix this.
>>
>> But thanks go to Vlad for pointing this deficiency out so we can fix
>> it and help make IET even better.
>
> SCST explicitly fiddles with the I/O context to get this to happen. It
> has a hack to the block layer to export alloc_io_context:
>
> http://marc.info/?t=122893564800003
Correct, although I wouldn't call it "fiddling", rather "grouping" ;)
But that's not the only reason for the good performance. In particular, it
can't explain Bart's tmpfs results from the previous message, where the
majority of the I/O is done to/from RAM without any I/O scheduler involved.
(Or is an I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I
remember correctly, i.e. the test data set was 25% of RAM.
Thanks,
Vlad
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 7:38 ` Vladislav Bolkhovitin
@ 2009-04-02 9:02 ` Vladislav Bolkhovitin
-1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 9:02 UTC (permalink / raw)
To: James Bottomley
Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
Ross Walker, Ross S. W. Walker, scst-devel, stgt
Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> James Bottomley, on 04/02/2009 12:23 AM wrote:
>> On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
>>> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>
>>> wrote:
>>>
>>>> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
>>>> <RWalker@medallion.com> wrote:
>>>>> IET just needs to fix how it does its workload with CFQ, which
>>>>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>>>>> gain some extra speed.
>>>> I'm not familiar with the implementation details of CFQ, but I know
>>>> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
>>>> default number of kernel threads of the scst_vdisk kernel module has
>>>> been increased to 5. Could this explain the performance difference
>>>> between SCST and IET for FILEIO and BLOCKIO ?
>>> Thanks for the update. IET has used 8 threads per target for ages now,
>>> so I don't think it is that.
>>>
>>> It may be how the I/O threads are forked in SCST that causes them to
>>> be in the same I/O context with each other.
>>>
>>> I'm pretty sure implementing a version of the patch that was used for
>>> the dump command (found on the LKML) will fix this.
>>>
>>> But thanks go to Vlad for pointing this deficiency out so we can fix
>>> it and help make IET even better.
>> SCST explicitly fiddles with the I/O context to get this to happen. It
>> has a hack to the block layer to export alloc_io_context:
>>
>> http://marc.info/?t=122893564800003
>
> Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
>
> But that's not the only reason for the good performance. In particular, it
> can't explain Bart's tmpfs results from the previous message, where the
> majority of the I/O is done to/from RAM without any I/O scheduler involved.
> (Or is an I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I
> remember correctly, i.e. the test data set was 25% of RAM.
To remove any suspicion that I'm playing dirty games here, I should note
that in many cases I can't say what exactly is responsible for SCST's good
performance. I can only say something like "good design and
implementation", but, I guess, that wouldn't count for much.
SCST/iSCSI-SCST were designed and built from the very beginning with the
best performance in mind, and that has brought this result. Sorry, but at
the moment I can't afford any "why is it so good?" kinds of
investigation, because I have a lot of more important things to do, like
the SCST procfs -> sysfs interface conversion.
Thanks,
Vlad
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 9:02 ` Vladislav Bolkhovitin
@ 2009-04-02 14:06 ` Ross S. W. Walker
-1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 14:06 UTC (permalink / raw)
To: Vladislav Bolkhovitin, James Bottomley
Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
Ross Walker, scst-devel, stgt
Vladislav Bolkhovitin wrote:
> Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> > James Bottomley, on 04/02/2009 12:23 AM wrote:
> >>
> >> SCST explicitly fiddles with the I/O context to get this to happen. It
> >> has a hack to the block layer to export alloc_io_context:
> >>
> >> http://marc.info/?t=122893564800003
> >
> > Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
Call it what you like,
Vladislav Bolkhovitin wrote:
> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
>
> I would be interested in knowing how your code defeats CFQ's extremely
> high latency? Does your code reach into the io scheduler too? If not,
> some code hints would be great.
Hmm, CFQ doesn't have any extra processing latency, let alone an
"extreme" one, hence there is nothing to defeat. If it had, how could it
have been chosen as the default?
----------
List: linux-scsi
Subject: [PATCH][RFC 13/23]: Export of alloc_io_context() function
From: Vladislav Bolkhovitin <vst () vlnb ! net>
Date: 2008-12-10 18:49:19
Message-ID: 49400F2F.4050603 () vlnb ! net
This patch exports alloc_io_context() function. For performance reasons
SCST queues commands using a pool of IO threads. It is considerably
better for performance (>30% increase on sequential reads) if threads in
a pool have the same IO context. Since SCST can be built as a module,
it needs alloc_io_context() function exported.
<snip>
----------
I call that lying.
> > But that's not the only reason for good performance. Particularly, it
> > can't explain Bart's tmpfs results from the previous message, where the
> > majority of I/O done to/from RAM without any I/O scheduler involved. (Or
> > does I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I
> > remember correctly, i.e. the test data set was 25% of RAM.
>
> To remove any suspicions that I'm playing dirty games here I should note
<snip>
I don't know what games you're playing at, but if you're too
stupid to realize when you're caught in a lie and to just shut up,
then please do me the favor of leaving me out of any further
correspondence from you.
Thank you,
-Ross
______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 14:06 ` Ross S. W. Walker
@ 2009-04-02 14:14 ` Ross S. W. Walker
-1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 14:14 UTC (permalink / raw)
To: Vladislav Bolkhovitin, James Bottomley
Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
Ross Walker, scst-devel, stgt
Ross S. W. Walker wrote:
> Vladislav Bolkhovitin wrote:
> > Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> > > James Bottomley, on 04/02/2009 12:23 AM wrote:
> > >>
> > >> SCST explicitly fiddles with the I/O context to get this to happen. It
> > >> has a hack to the block layer to export alloc_io_context:
> > >>
> > >> http://marc.info/?t=122893564800003
> > >
> > > Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
>
> Call it what you like,
>
> Vladislav Bolkhovitin wrote:
> > Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
> >
> > I would be interested in knowing how your code defeats CFQ's extremely
> > high latency? Does your code reach into the io scheduler too? If not,
> > some code hints would be great.
>
The above quoting was wrong; for accuracy, it should have read:
Vladislav Bolkhovitin wrote:
> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
> >
> > I would be interested in knowing how your code defeats CFQ's extremely
> > high latency? Does your code reach into the io scheduler too? If not,
> > some code hints would be great.
>
> Hmm, CFQ doesn't have any extra processing latency, let alone an
> "extreme" one, hence there is nothing to defeat. If it had, how could it
> have been chosen as the default?
Just so there is no misunderstanding who said what here.
-Ross
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 14:06 ` Ross S. W. Walker
@ 2009-04-02 15:36 ` Vladislav Bolkhovitin
-1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 15:36 UTC (permalink / raw)
To: Ross S. W. Walker
Cc: James Bottomley, linux-scsi,
iSCSI Enterprise Target Developer List, linux-kernel,
Ross Walker, stgt, scst-devel
Ross S. W. Walker, on 04/02/2009 06:06 PM wrote:
> Vladislav Bolkhovitin wrote:
>> Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
>>> James Bottomley, on 04/02/2009 12:23 AM wrote:
>>>> SCST explicitly fiddles with the I/O context to get this to happen. It
>>>> has a hack to the block layer to export alloc_io_context:
>>>>
>>>> http://marc.info/?t=122893564800003
>>> Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
>
> Call it what you like,
>
> Vladislav Bolkhovitin wrote:
>> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
>>
>> I would be interested in knowing how your code defeats CFQ's extremely
>> high latency? Does your code reach into the io scheduler too? If not,
>> some code hints would be great.
>
> Hmm, CFQ doesn't have any extra processing latency, let alone an
> "extreme" one, hence there is nothing to defeat. If it had, how could it
> have been chosen as the default?
>
> ----------
> List: linux-scsi
> Subject: [PATCH][RFC 13/23]: Export of alloc_io_context() function
> From: Vladislav Bolkhovitin <vst () vlnb ! net>
> Date: 2008-12-10 18:49:19
> Message-ID: 49400F2F.4050603 () vlnb ! net
>
> This patch exports alloc_io_context() function. For performance reasons
> SCST queues commands using a pool of IO threads. It is considerably
> better for performance (>30% increase on sequential reads) if threads in
> a pool have the same IO context. Since SCST can be built as a module,
> it needs alloc_io_context() function exported.
>
> <snip>
> ----------
>
> I call that lying.
>
>>> But that's not the only reason for the good performance. In particular, it
>>> can't explain Bart's tmpfs results from the previous message, where the
>>> majority of the I/O is done to/from RAM without any I/O scheduler involved.
>>> (Or is an I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I
>>> remember correctly, i.e. the test data set was 25% of RAM.
>> To remove any suspicions that I'm playing dirty games here I should note
> <snip>
>
> I don't know what games you're playing at, but if you're too
> stupid to realize when you're caught in a lie and to just shut up,
> then please do me the favor of leaving me out of any further
> correspondence from you.
Think what you want and do what you want. You can even filter out all
e-mails from me; that's your right. But:
1. As I wrote, grouping threads into a single I/O context doesn't explain
all of the performance difference, and finding the reasons for others'
performance problems isn't something I can afford at the moment.
2. CFQ doesn't have any processing latency and never has. Learn to
understand what you are writing about, and how to express yourself
correctly, first. You asked about that latency and I replied that there
is nothing to defeat.
3. SCST doesn't have any hooks into CFQ and is not going to have any in
the foreseeable future.
> Thank you,
>
> -Ross
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-01 20:14 ` Bart Van Assche
@ 2009-04-02 17:16 ` Vladislav Bolkhovitin
2009-04-03 17:08 ` Bart Van Assche
2009-04-04 8:04 ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
-1 siblings, 2 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 17:16 UTC (permalink / raw)
To: Bart Van Assche; +Cc: scst-devel, linux-kernel, linux-scsi
Bart Van Assche, on 04/02/2009 12:14 AM wrote:
> On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> ==================================================================
>> I. SEQUENTIAL ACCESS OVER SINGLE LINE
>>
>> 1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
>>
>> ISCSI-SCST IET STGT
>> NULLIO: 106 105 103
>> FILEIO/CFQ: 82 57 55
>> FILEIO/deadline 69 69 67
>> BLOCKIO/CFQ 81 28 -
>> BLOCKIO/deadline 80 66 -
>
> I have repeated some of these performance tests for iSCSI over IPoIB
> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
> for the buffered I/O test with a block size of 512K (initiator)
> against a file of 1GB residing on a tmpfs filesystem on the target are
> as follows:
>
> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>
> And for a block size of 4 KB:
>
> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
Do you have any thoughts on why writes are so bad? It shouldn't be so.
> Or: depending on the test scenario, SCST transfers data between 2% and
> 30% faster via the iSCSI protocol over this network.
>
> Something that is not relevant for this comparison, but interesting to
> know: with the SRP implementation in SCST the maximal read throughput
> is 1290 MB/s on the same setup.
This can be well explained. The limiting factor for iSCSI is that
iSCSI/TCP processing overloads a single CPU core. You can verify that
from vmstat output during the test: the sum of user and sys time should
be about 100/(number of CPUs) percent or higher. SRP is far more
CPU-efficient, hence the better throughput.
If you test with 2 or more parallel IO streams, you should see
correspondingly increased aggregate throughput, up to the point where
you hit your memory copy bandwidth.
Thanks,
Vlad
> Bart.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 15:36 ` Vladislav Bolkhovitin
@ 2009-04-02 17:19 ` Ross S. W. Walker
-1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 17:19 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: James Bottomley, linux-scsi,
iSCSI Enterprise Target Developer List, linux-kernel,
Ross Walker, stgt, scst-devel
Vladislav Bolkhovitin wrote:
>
> Think what you want and do what you want. You can even filter out all
> e-mails from me, that's your right. But:
>
> 1. As I wrote, grouping threads into a single IO context doesn't explain
> all of the performance difference, and finding out the reasons for others'
> performance problems isn't something I can afford at the moment.
No, not all of the performance, but a substantial part of it, enough
to say IET has a real performance issue when using the CFQ scheduler.
> 2. CFQ doesn't have any processing latency and never has. Learn to
> understand what you are writing about and how to express yourself
> correctly first. You asked about that latency and I replied that there
> is nothing to defeat.
CFQ pauses briefly before switching I/O contexts in order to make sure
it has given as much bandwidth as possible to a context before moving
on. This is documented. With a single I/O stream, or with random I/O,
it won't be noticeable, but for interleaved sequential I/O across
multiple threads with different I/O contexts it can be significant.
Not that Wikipedia is authoritative: http://en.wikipedia.org/wiki/CFQ
It's right in the first paragraph:
"... While CFQ does not do explicit anticipatory IO scheduling, it
achieves the same effect of having good aggregate throughput for the
system as a whole, by allowing a process queue to idle at the end of
synchronous IO thereby "anticipating" further close IO from that
process. ..."
You can also check the LXR. This one, from the 2.6.18 (RHEL) kernels,
shows a pause of HZ/10:
http://lxr.linux.no/linux+v2.6.18/block/cfq-iosched.c#L30
So given a 10ms time slice, that would equate to ~1ms; in later
kernels it's defined as HZ/5, which can equate to ~2ms. These
millisecond delays can be an eternity for sequential I/O patterns.
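The idle window described above is tunable at runtime through sysfs. A minimal sketch, assuming CFQ is the active scheduler on a device named sda (both are assumptions; the knob simply doesn't exist otherwise):

```shell
#!/bin/sh
# Inspect CFQ's per-queue idle wait on one device. "sda" is a placeholder
# device name; the slice_idle file only exists while CFQ is active there.
DEV=sda
KNOB=/sys/block/$DEV/queue/iosched/slice_idle
if [ -r "$KNOB" ]; then
    MSG="slice_idle on $DEV: $(cat "$KNOB")"
    # echo 0 > "$KNOB"   # setting it to 0 would disable the idle wait
else
    MSG="CFQ is not the active scheduler on $DEV (no slice_idle knob)"
fi
echo "$MSG"
```

Setting slice_idle to 0 is the usual workaround when many threads with separate I/O contexts issue interleaved sequential I/O, at the cost of losing CFQ's anticipation benefit.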
> 3. SCST doesn't have any hooks into CFQ and isn't going to have any in
> the foreseeable future.
True, SCST doesn't have any hooks into CFQ, but your code modifies
block/blk-ioc.c to export alloc_io_context(), which is by default a
private function, so that your kernel-based threads can share the same
I/O context, thereby avoiding the delay CFQ imposes when switching I/O
contexts between these threads.
-Ross
______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 17:16 ` [Scst-devel] " Vladislav Bolkhovitin
@ 2009-04-03 17:08 ` Bart Van Assche
2009-04-03 17:13 ` Sufficool, Stanley
2009-04-04 8:04 ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2009-04-03 17:08 UTC (permalink / raw)
To: Vladislav Bolkhovitin; +Cc: scst-devel, linux-kernel, linux-scsi
On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> I have repeated some of these performance tests for iSCSI over IPoIB
>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>> for the buffered I/O test with a block size of 512K (initiator)
>> against a file of 1GB residing on a tmpfs filesystem on the target are
>> as follows:
>>
>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>
>> And for a block size of 4 KB:
>>
>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>
> Do you have any thoughts why writes are so bad? It shouldn't be so..
It's not impossible that with the 4 KB write test I hit the limits of
the initiator system (Intel E6750 CPU, 2.66 GHz, two cores). Some
statistics I gathered during the 4 KB write test:
Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per second, same
number of interrupts processed by each core (8250/s).
Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per second, all
interrupts occurred on the same core.
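Per-second interrupt rates like the ones above can be sampled straight from /proc/interrupts: sum an IRQ line's per-CPU counters twice, one second apart, and take the difference. A sketch ("mlx4-comp-0" is the IRQ name from this setup and is an assumption on any other machine):

```shell
#!/bin/sh
# Measure one IRQ's interrupt rate by diffing two snapshots of its
# summed per-CPU counters in /proc/interrupts.
IRQ_NAME=${1:-mlx4-comp-0}
snapshot() {
    grep "$IRQ_NAME" /proc/interrupts 2>/dev/null | head -1 |
        awk '{s = 0; for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i; print s}'
}
A=$(snapshot); A=${A:-0}   # default to 0 if the IRQ name is not present
sleep 1
B=$(snapshot); B=${B:-0}
RATE=$((B - A))
echo "$IRQ_NAME: ~$RATE interrupts/s"
```

Watching which CPU column grows also shows whether the interrupts are spread across cores (as on the target here) or pinned to one (as on the initiator).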
Bart.
^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata)
2009-04-03 17:08 ` Bart Van Assche
@ 2009-04-03 17:13 ` Sufficool, Stanley
0 siblings, 0 replies; 34+ messages in thread
From: Sufficool, Stanley @ 2009-04-03 17:13 UTC (permalink / raw)
To: Bart Van Assche, Vladislav Bolkhovitin
Cc: scst-devel, linux-kernel, linux-scsi
> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanassche@gmail.com]
> Sent: Friday, April 03, 2009 10:09 AM
> To: Vladislav Bolkhovitin
> Cc: scst-devel; linux-kernel@vger.kernel.org;
> linux-scsi@vger.kernel.org
> Subject: Re: [Scst-devel] ISCSI-SCST performance (with also
> IET and STGTdata)
>
>
> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin
> <vst@vlnb.net> wrote:
> > Bart Van Assche, on 04/02/2009 12:14 AM wrote:
> >> I have repeated some of these performance tests for iSCSI
> over IPoIB
> >> (two DDR PCIe 1.0 ConnectX HCA's connected back to back).
> The results
> >> for the buffered I/O test with a block size of 512K (initiator)
> >> against a file of 1GB residing on a tmpfs filesystem on the target
> >> are as follows:
> >>
> >> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
> >> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
> >>
> >> And for a block size of 4 KB:
> >>
> >> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
> >> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
> >
> > Do you have any thoughts why writes are so bad? It shouldn't be so..
>
> It's not impossible that with the 4 KB write test I hit the
> limits of the initiator system (Intel E6750 CPU, 2.66 GHz,
> two cores). Some statistics I gathered during the 4 KB write test:
> Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per
> second, same number of interrupts processed by each core (8250/s).
> Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per
> second, all interrupts occurred on the same core.
Are you using connected mode IPoIB and setting the MTU to 4KB? Would
fragmentation of IPoIB drive up the interrupt rates?
>
> Bart.
>
> --------------------------------------------------------------
> ----------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata)
2009-04-03 17:13 ` Sufficool, Stanley
@ 2009-04-03 17:52 ` Bart Van Assche
-1 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-03 17:52 UTC (permalink / raw)
To: Sufficool, Stanley
Cc: Vladislav Bolkhovitin, scst-devel, linux-kernel, linux-scsi
On Fri, Apr 3, 2009 at 7:13 PM, Sufficool, Stanley
<ssufficool@rov.sbcounty.gov> wrote:
>> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin
>> <vst@vlnb.net> wrote:
>> > Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> >> I have repeated some of these performance tests for iSCSI
>> >> over IPoIB
>> >> (two DDR PCIe 1.0 ConnectX HCA's connected back to back).
>> >> The results
>> >> for the buffered I/O test with a block size of 512K (initiator)
>> >> against a file of 1GB residing on a tmpfs filesystem on the target
>> >> are as follows:
>> >>
>> >> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> >> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>> >>
>> >> And for a block size of 4 KB:
>> >>
>> >> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> >> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>> >
>> > Do you have any thoughts why writes are so bad? It shouldn't be so..
>>
>> It's not impossible that with the 4 KB write test I hit the
>> limits of the initiator system (Intel E6750 CPU, 2.66 GHz,
>> two cores). Some statistics I gathered during the 4 KB write test:
>> Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per
>> second, same number of interrupts processed by each core (8250/s).
>> Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per
>> second, all interrupts occurred on the same core.
>
> Are you using connected mode IPoIB and setting the MTU to 4KB? Would
> fragmentation of IPoIB drive up the interrupt rates?
All tests have been run with default IPoIB settings: an MTU of 2044
bytes and datagram mode. The following data has been obtained from the
target system after several 4 KB write tests:
$ cat /sys/class/net/ib0/mode
datagram
$ /sbin/ifconfig ib0
ib0 Link encap:UNSPEC HWaddr
80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::202:c903:2:d217/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:88482013 errors:0 dropped:0 overruns:0 frame:0
TX packets:38444824 errors:0 dropped:11 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:135770573672 (129480.9 Mb) TX bytes:5647702210 (5386.0 Mb)
Bart.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-02 17:16 ` [Scst-devel] " Vladislav Bolkhovitin
2009-04-03 17:08 ` Bart Van Assche
@ 2009-04-04 8:04 ` Bart Van Assche
2009-04-17 18:11 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2009-04-04 8:04 UTC (permalink / raw)
To: Vladislav Bolkhovitin; +Cc: scst-devel, linux-kernel, linux-scsi
On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> I have repeated some of these performance tests for iSCSI over IPoIB
>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>> for the buffered I/O test with a block size of 512K (initiator)
>> against a file of 1GB residing on a tmpfs filesystem on the target are
>> as follows:
>>
>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>
>> And for a block size of 4 KB:
>>
>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>
> Do you have any thoughts why writes are so bad? It shouldn't be so..
By this time I have run the following variation of the 4 KB write test:
* Target: iSCSI-SCST was exporting a 1 GB file residing on a tmpfs filesystem.
* Initiator: two processes were writing 4 KB blocks as follows:
dd if=/dev/zero of=/dev/sdb bs=4K seek=0 count=131072 oflag=sync &
dd if=/dev/zero of=/dev/sdb bs=4K seek=131072 count=131072 oflag=sync &
Results:
* Each dd process on the initiator was writing at a speed of 37.8
MB/s, or a combined writing speed of 75.6 MB/s.
* CPU load on the initiator system during the test: 2.0.
* According to /proc/interrupts, about 38000 mlx4-comp-0 interrupts
were triggered per second.
These results confirm that the initiator system was the bottleneck
during the 4 KB write test, not the target system.
Bart.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
2009-04-04 8:04 ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
@ 2009-04-17 18:11 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-17 18:11 UTC (permalink / raw)
To: Bart Van Assche; +Cc: scst-devel, linux-kernel, linux-scsi
Bart Van Assche, on 04/04/2009 12:04 PM wrote:
> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>>> I have repeated some of these performance tests for iSCSI over IPoIB
>>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>>> for the buffered I/O test with a block size of 512K (initiator)
>>> against a file of 1GB residing on a tmpfs filesystem on the target are
>>> as follows:
>>>
>>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>>
>>> And for a block size of 4 KB:
>>>
>>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>> Do you have any thoughts why writes are so bad? It shouldn't be so..
>
> By this time I have run the following variation of the 4 KB write test:
> * Target: iSCSI-SCST was exporting a 1 GB file residing on a tmpfs filesystem.
> * Initiator: two processes were writing 4 KB blocks as follows:
> dd if=/dev/zero of=/dev/sdb bs=4K seek=0 count=131072 oflag=sync &
> dd if=/dev/zero of=/dev/sdb bs=4K seek=131072 count=131072 oflag=sync &
>
> Results:
> * Each dd process on the initiator was writing at a speed of 37.8
> MB/s, or a combined writing speed of 75.6 MB/s.
> * CPU load on the initiator system during the test: 2.0.
> * According to /proc/interrupts, about 38000 mlx4-comp-0 interrupts
> were triggered per second.
>
> These results confirm that the initiator system was the bottleneck
> during the 4 KB write test, not the target system.
If so, you should see a performance gain with oflag=direct, because it
eliminates a data copy.
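On the initiator that suggestion amounts to rerunning the two-writer test with direct I/O. A sketch, scaled down and pointed at a scratch file so it is safe to run anywhere; the real /dev/sdb commands are shown only in the comment:

```shell
#!/bin/sh
# Direct-I/O variant of the two-writer test. On the real initiator:
#   dd if=/dev/zero of=/dev/sdb bs=4K seek=0      count=131072 oflag=direct &
#   dd if=/dev/zero of=/dev/sdb bs=4K seek=131072 count=131072 oflag=direct &
# oflag=direct bypasses the initiator's page cache, removing one data copy.
# Below is a scaled-down, cache-backed stand-in (O_DIRECT is unsupported
# on some filesystems, e.g. tmpfs, so the stand-in omits the flag).
TARGET=$(mktemp)
dd if=/dev/zero of="$TARGET" bs=4k seek=0  count=64 conv=notrunc 2>/dev/null &
dd if=/dev/zero of="$TARGET" bs=4k seek=64 count=64 conv=notrunc 2>/dev/null &
wait
SIZE=$(wc -c < "$TARGET")
SIZE=$((SIZE))   # normalize any whitespace padding from wc
rm -f "$TARGET"
echo "wrote $SIZE bytes in two concurrent non-overlapping streams"
```

conv=notrunc keeps the two writers from truncating each other's region, mirroring how the seek= offsets split the device in the original test.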
> Bart.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
>
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2009-04-17 18:11 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-30 17:33 ISCSI-SCST performance (with also IET and STGT data) Vladislav Bolkhovitin
[not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
2009-03-30 18:33 ` [Scst-devel] " Vladislav Bolkhovitin
2009-03-30 18:53 ` Bart Van Assche
2009-03-31 17:37 ` Vladislav Bolkhovitin
2009-03-31 18:43 ` [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso " Ross S. W. Walker
2009-04-01 6:29 ` [Iscsitarget-devel] " Bart Van Assche
2009-04-01 12:20 ` Ross Walker
2009-04-01 20:23 ` James Bottomley
2009-04-02 7:38 ` [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also " Vladislav Bolkhovitin
2009-04-02 9:02 ` Vladislav Bolkhovitin
2009-04-02 14:06 ` Ross S. W. Walker
2009-04-02 14:14 ` Ross S. W. Walker
2009-04-02 15:36 ` Vladislav Bolkhovitin
2009-04-02 17:19 ` Ross S. W. Walker
2009-04-01 20:14 ` Bart Van Assche
2009-04-02 17:16 ` [Scst-devel] " Vladislav Bolkhovitin
2009-04-03 17:08 ` Bart Van Assche
2009-04-03 17:13 ` [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata) Sufficool, Stanley
2009-04-03 17:52 ` Bart Van Assche
2009-04-04 8:04 ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
2009-04-17 18:11 ` Vladislav Bolkhovitin