* ISCSI-SCST performance (with also IET and STGT data)
@ 2009-03-30 17:33 Vladislav Bolkhovitin
       [not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
  2009-04-01 20:14   ` Bart Van Assche
  0 siblings, 2 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-30 17:33 UTC (permalink / raw)
  To: scst-devel; +Cc: linux-scsi, linux-kernel, iscsitarget-devel, stgt

Hi All,

As part of the 1.0.1 release preparations I ran some performance tests to
make sure there are no performance regressions in SCST overall and in
iSCSI-SCST particularly. The results were quite interesting, so I decided
to publish them together with the corresponding numbers for the IET and
STGT iSCSI targets. This isn't a real performance comparison: it includes
only a few chosen tests, because I don't have time for a complete
comparison. But I hope somebody will take up what I did and make it
complete.

Setup:

Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by the
kernel command line to reduce the test data footprint, 75GB 15K RPM SCSI
disk as backstorage, dual port 1Gbps Intel E1000 network card, 2.6.29
kernel.

Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by the
kernel command line to reduce the test data footprint, dual port 1Gbps
Intel E1000 network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.

The target exported a 5GB file on XFS for FILEIO and a 5GB partition for
BLOCKIO.

All the tests were run 3 times and the average is reported. All values
are in MB/s. The tests were run with the CFQ and deadline IO schedulers
on the target. All other parameters on both the target and the initiator
were left at their defaults.
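
For those who want to repeat this, one read run can be scripted roughly as
follows (an illustrative sketch only, not the exact script I used; /dev/sdX
stands for the imported iSCSI device):

#!/bin/sh
# run the read test 3 times; the last line of dd's output is the throughput
for i in 1 2 3; do
	echo 3 > /proc/sys/vm/drop_caches	# start each run with a cold page cache
	dd if=/dev/sdX of=/dev/null bs=512K count=2000 2>&1 | tail -n 1
done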

==================================================================

I. SEQUENTIAL ACCESS OVER SINGLE LINE

1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000

			ISCSI-SCST	IET		STGT
NULLIO:			106		105		103
FILEIO/CFQ:		82		57		55
FILEIO/deadline		69		69		67
BLOCKIO/CFQ		81		28		-
BLOCKIO/deadline	80		66		-

------------------------------------------------------------------

2. # dd if=/dev/zero of=/dev/sdX bs=512K count=2000

I didn't do other write tests, because I have data on those devices.

			ISCSI-SCST	IET		STGT
NULLIO:			114		114		114

------------------------------------------------------------------

3. /dev/sdX was formatted as ext3 and mounted on /mnt on the initiator.
Then

# dd if=/mnt/q of=/dev/null bs=512K count=2000

was run (/mnt/q had been created beforehand by the write test described
next).
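
The file system was prepared in the usual way, i.e. something like:

# mkfs.ext3 /dev/sdX
# mount /dev/sdX /mnt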

			ISCSI-SCST	IET		STGT
FILEIO/CFQ:		94		66		46
FILEIO/deadline		74		74		72
BLOCKIO/CFQ		95		35		-
BLOCKIO/deadline	94		95		-

------------------------------------------------------------------

4. /dev/sdX was formatted as ext3 and mounted on /mnt on the initiator.
Then

# dd if=/dev/zero of=/mnt/q bs=512K count=2000

was run (this is the write that created the /mnt/q file read in test 3
above).

			ISCSI-SCST	IET		STGT
FILEIO/CFQ:		97		91		88
FILEIO/deadline		98		96		90
BLOCKIO/CFQ		112		110		-
BLOCKIO/deadline	112		110		-

------------------------------------------------------------------

Conclusions:

1. ISCSI-SCST FILEIO on buffered READs is 27% faster than IET (94 vs
74). With CFQ the difference is 42% (94 vs 66).

2. ISCSI-SCST FILEIO on buffered READs is 30% faster than STGT (94 vs
72). With CFQ the difference is 104% (94 vs 46).

3. ISCSI-SCST BLOCKIO on buffered READs has about the same performance
as IET, but with CFQ it's 170% faster (95 vs 35).

4. Buffered WRITEs are not so interesting, because they are asynchronous
with many outstanding commands at a time, hence latency-insensitive, but
even here ISCSI-SCST is always a bit faster than IET.

5. STGT is always the worst, sometimes considerably.

6. BLOCKIO on buffered WRITEs is consistently faster than FILEIO, so
there is definitely room for future improvement here.

7. For some reason access through a file system is considerably better
than access to the same device directly.

==================================================================

II. MOSTLY RANDOM "REALISTIC" ACCESS

For this test I used the io_trash utility; for more details see
http://lkml.org/lkml/2008/11/17/444. To show the value of target-side
caching, in this test the target was run with its full 2GB of memory. I
ran io_trash with the following parameters: "2 2 ./ 500000000 50000000 10
4096 4096 300000 10 90 0 10". Total execution time was measured.
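
"Total execution time" here means the wall-clock time of the run, captured
e.g. with the shell's time builtin:

# time ./io_trash 2 2 ./ 500000000 50000000 10 4096 4096 300000 10 90 0 10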

			ISCSI-SCST	IET		STGT
FILEIO/CFQ:		4m45s		5m		5m17s
FILEIO/deadline		5m20s		5m22s		5m35s
BLOCKIO/CFQ		23m3s		23m5s		-
BLOCKIO/deadline	23m15s		23m25s		-

Conclusions:

1. FILEIO is about 500% (five times!) faster than BLOCKIO.

2. STGT is, as usual, always the worst.

3. Deadline is always a bit slower than CFQ.

==================================================================

III. SEQUENTIAL ACCESS OVER MPIO

Unfortunately, my dual port network card isn't capable of simultaneous
data transfers, so I had to do some "modeling" and put my network devices
into 100Mbps mode. To make this model more realistic I also used my old
5200RPM IDE hard drive, capable of producing 35MB/s throughput locally.
So I modeled the case of dual 1Gbps links with 350MB/s backstorage,
provided all of the following conditions are satisfied:

  - Both links are capable of simultaneous data transfers.

  - There is a sufficient amount of CPU power on both the initiator and
the target to cover the requirements of the data transfers.

All the tests were done with iSCSI-SCST only.

1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000

NULLIO:			23
FILEIO/CFQ:		20
FILEIO/deadline		20
BLOCKIO/CFQ		20
BLOCKIO/deadline	17

Single line NULLIO is 12.

So, there is a 67% improvement from using 2 lines. With 1Gbps links that
should be equivalent to 200MB/s. Not too bad.

==================================================================

Connections to the target were made with the following iSCSI parameters:

# iscsi-scst-adm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=0
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject

# ietadm --op show --tid=1 --sid=0x10000013d0200
InitialR2T=No
ImmediateData=Yes
MaxConnections=1
MaxRecvDataSegmentLength=262144
MaxXmitDataSegmentLength=131072
MaxBurstLength=2097152
FirstBurstLength=262144
DefaultTime2Wait=2
DefaultTime2Retain=20
MaxOutstandingR2T=1
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
HeaderDigest=None
DataDigest=None
OFMarker=No
IFMarker=No
OFMarkInt=Reject
IFMarkInt=Reject

# tgtadm --op show --mode session --tid 1 --sid 1
MaxRecvDataSegmentLength=2097152
MaxXmitDataSegmentLength=131072
HeaderDigest=None
DataDigest=None
InitialR2T=No
MaxOutstandingR2T=1
ImmediateData=Yes
FirstBurstLength=262144
MaxBurstLength=2097152
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
IFMarker=No
OFMarker=No
DefaultTime2Wait=2
DefaultTime2Retain=0
OFMarkInt=Reject
IFMarkInt=Reject
MaxConnections=1
RDMAExtensions=No
TargetRecvDataSegmentLength=262144
InitiatorRecvDataSegmentLength=262144
MaxOutstandingUnexpectedPDUs=0
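
For completeness: on the initiator side the negotiated values can be
cross-checked with open-iscsi, e.g. with

# iscsiadm -m session -P 3

which, in recent open-iscsi versions, prints the negotiated iSCSI
parameters for each session.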

Vlad

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
       [not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
@ 2009-03-30 18:33   ` Vladislav Bolkhovitin
  2009-03-30 18:53       ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-30 18:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: scst-devel, iscsitarget-devel, linux-kernel, linux-scsi, stgt

Bart Van Assche, on 03/30/2009 10:06 PM wrote:
> On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> As part of 1.0.1 release preparations I made some performance tests to make
>> sure there are no performance regressions in SCST overall and iSCSI-SCST
>> particularly. Results were quite interesting, so I decided to publish them
>> together with the corresponding numbers for IET and STGT iSCSI targets. This
>> isn't a real performance comparison, it includes only few chosen tests,
>> because I don't have time for a complete comparison. But I hope somebody
>> will take up what I did and make it complete.
>>
>> Setup:
>>
>> Target: HT 2.4GHz Xeon, x86_32, 2GB of memory limited to 256MB by kernel
>> command line to have less test data footprint, 75GB 15K RPM SCSI disk as
>> backstorage, dual port 1Gbps E1000 Intel network card, 2.6.29 kernel.
>>
>> Initiator: 1.7GHz Xeon, x86_32, 1GB of memory limited to 256MB by kernel
>> command line to have less test data footprint, dual port 1Gbps E1000 Intel
>> network card, 2.6.27 kernel, open-iscsi 2.0-870-rc3.
>>
>> The target exported a 5GB file on XFS for FILEIO and 5GB partition for
>> BLOCKIO.
>>
>> All the tests were ran 3 times and average written. All the values are in
>> MB/s. The tests were ran with CFQ and deadline IO schedulers on the target.
>> All other parameters on both target and initiator were default.
> 
> These are indeed interesting results. There are some aspects of the
> test setup I do not understand however:
> * All tests have been run with buffered I/O instead of direct I/O
> (iflag=direct / oflag=direct). My experience is that the results of
> tests with direct I/O are easier to reproduce (less variation between
> runs). So I have been wondering why the tests have been run with
> buffered I/O instead ?

Real applications use buffered I/O, hence it should be used in tests. It
evaluates the whole storage stack on both the initiator and the target.
The results are very reproducible; the variation is about 10%.

> * It is well known that having more memory in the target system
> improves performance because of read and write caching. What did you
> want to demonstrate by limiting the memory of the target system ?

If I had the full 2GB on the target, the measurements would have taken
about 10 times longer, since the data footprint should be at least 4x the
cache size. For sequential reads/writes, a 256MB and a 2GB cache behave
the same.

Where it did matter (io_trash), I increased the memory to the full 2GB.

> * Which SCST options were enabled on the target ? Was e.g. the
> NV_CACHE option enabled ?

Defaults, i.e. yes, enabled. But it didn't matter, since all the
filesystems were mounted on the initiator without data barriers enabled.

Thanks,
Vlad

P.S. Please don't drop CC.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-03-30 18:33   ` [Scst-devel] " Vladislav Bolkhovitin
@ 2009-03-30 18:53       ` Bart Van Assche
  0 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-03-30 18:53 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: scst-devel, iscsitarget-devel, linux-kernel, linux-scsi, stgt

On Mon, Mar 30, 2009 at 8:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 03/30/2009 10:06 PM wrote:
>> These are indeed interesting results. There are some aspects of the
>> test setup I do not understand however:
>> * All tests have been run with buffered I/O instead of direct I/O
>> (iflag=direct / oflag=direct). My experience is that the results of
>> tests with direct I/O are easier to reproduce (less variation between
>> runs). So I have been wondering why the tests have been run with
>> buffered I/O instead ?
>
> Real applications use buffered I/O, hence it should be used in tests. It
>  evaluates all the storage stack on both initiator and target as a whole.
> The results are very reproducible, variation is about 10%.

Most applications do indeed use buffered I/O. Database software,
however, often uses direct I/O. It might be interesting to publish
performance results for both buffered I/O and direct I/O. A quote from
the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya et al.
(Linux Symposium, Ottawa, 2003):

Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
a device without copying the data through the kernel’s buffer cache.
This mechanism can boost performance if the data is unlikely to be
used again in the short term (during a disk backup, for example), or
for applications such as large database management systems that
perform their own caching.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-03-30 18:53       ` Bart Van Assche
  (?)
@ 2009-03-31 17:37       ` Vladislav Bolkhovitin
  2009-03-31 18:43           ` Ross S. W. Walker
  -1 siblings, 1 reply; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-03-31 17:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: iscsitarget-devel, scst-devel, linux-kernel, linux-scsi, stgt

Bart Van Assche, on 03/30/2009 10:53 PM wrote:
> On Mon, Mar 30, 2009 at 8:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> Bart Van Assche, on 03/30/2009 10:06 PM wrote:
>>> These are indeed interesting results. There are some aspects of the
>>> test setup I do not understand however:
>>> * All tests have been run with buffered I/O instead of direct I/O
>>> (iflag=direct / oflag=direct). My experience is that the results of
>>> tests with direct I/O are easier to reproduce (less variation between
>>> runs). So I have been wondering why the tests have been run with
>>> buffered I/O instead ?
>> Real applications use buffered I/O, hence it should be used in tests. It
>>  evaluates all the storage stack on both initiator and target as a whole.
>> The results are very reproducible, variation is about 10%.
> 
> Most applications do indeed use buffered I/O. Database software
> however often uses direct I/O. It might be interesting to publish
> performance results for both buffered I/O and direct I/O.

Yes, sure
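
(I.e. simply rerunning the same dd read test with direct I/O, for example:

# dd if=/dev/sdX of=/dev/null bs=512K count=2000 iflag=direct

That is just the form such a run would take; I haven't done those runs
yet.)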

> A quote from
> the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya e.a.
> (Linux Symposium, Ottawa, 2003):
> 
> Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
> a device without copying the data through the kernel’s buffer cache.
> This mechanism can boost performance if the data is unlikely to be
> used again in the short term (during a disk backup, for example), or
> for applications such as large database management systems that
> perform their own caching.

Please don't misread the phrase "unlikely to be used again in the short
term". If you have read-ahead, all your cached data is *likely* to be
used "again" in the near future after it is read from storage, although
only once, in the first read by the application. The same is true for
write-back caching, where data is written to the cache once for each
command. Both read-ahead and write-back are very important for good
performance, and O_DIRECT throws them away. All modern HDDs have a memory
buffer (cache) of at least 2MB, even the cheapest ones. That cache is
essential for performance, although how could it make any difference when
the host computer has, say, 1000 times more memory?

Thus, to work effectively with O_DIRECT an application has to be very
smart to work around the lack of read-ahead and write-back.

I personally consider O_DIRECT (as well as BLOCKIO) nothing more than a
workaround for possible flaws in the storage subsystem. If O_DIRECT works
better, then in 99+% of cases there is something in the storage subsystem
that should be fixed to perform better.

To be complete, there is one case where O_DIRECT and BLOCKIO have an
advantage: both of them transfer data zero-copy. So they are good if your
memory is too slow compared to the storage (the InfiniBand case, for
instance) and the additional data copy hurts performance noticeably.

> Bart.
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso IET and STGT data)
  2009-03-31 17:37       ` Vladislav Bolkhovitin
@ 2009-03-31 18:43           ` Ross S. W. Walker
  0 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-03-31 18:43 UTC (permalink / raw)
  To: Vladislav Bolkhovitin, Bart Van Assche
  Cc: iscsitarget-devel, scst-devel, linux-kernel, linux-scsi, stgt

Vladislav Bolkhovitin wrote:
> Bart Van Assche, on 03/30/2009 10:53 PM wrote:
> > 
> > Most applications do indeed use buffered I/O. Database software
> > however often uses direct I/O. It might be interesting to publish
> > performance results for both buffered I/O and direct I/O.
> 
> Yes, sure
> 
> > A quote from
> > the paper "Asynchronous I/O Support in Linux 2.5" by Bhattacharya e.a.
> > (Linux Symposium, Ottawa, 2003):
> > 
> > Direct I/O (raw and O_DIRECT) transfers data between a user buffer and
> > a device without copying the data through the kernel’s buffer cache.
> > This mechanism can boost performance if the data is unlikely to be
> > used again in the short term (during a disk backup, for example), or
> > for applications such as large database management systems that
> > perform their own caching.
> 
> Please don't misread phrase "unlikely to be used again in the short 
> term". If you have read-ahead, all your cached data is *likely* to be 
> used "again" in the near future after they were read from storage, 
> although only once in the first read by application. The same is true 
> for write-back caching, where data written to the cache once for each 
> command. Both read-ahead and write back are very important for good 
> performance and O_DIRECT throws them away. All the modern HDDs have a 
> memory buffer (cache) at least 2MB big on the cheapest ones. 
> This cache is essential for performance, although how can it make any 
> difference if the host computer has, say, 1000 times more memory?
> 
> Thus, to work effectively with O_DIRECT an application has to be very 
> smart to workaround the lack of read-ahead and write back.

True, the application has to perform its own read-ahead and write-back.

Kind of like how a database does it, or maybe the page cache on the
iSCSI initiator's system ;-)

> I personally consider O_DIRECT (as well as BLOCKIO) as nothing more than 
> a workaround for possible flaws in the storage subsystem. If O_DIRECT 
> works better, then in 99+% cases there is something in the storage 
> subsystem, which should be fixed to perform better.

That's not true; page-cached I/O is broken into page-sized chunks, which
limits the I/O bandwidth of the storage hardware while imposing higher
CPU overhead. Obviously page-cached I/O isn't ideal for all situations.

You could also have an amazing backend storage system with its own
NVRAM cache. Why put the performance overhead onto the target system
when you can off-load it to the controller?

> To be complete, there is one case where O_DIRECT and BLOCKIO have an 
> advantage: both of them transfer data zero-copy. So they are good if 
> your memory is too slow comparing to storage (InfiniBand case, for 
> instance) and additional data copy hurts performance noticeably.

The bottom line, which will always be true:

Know your workload, configure your storage to match.

The best storage solutions allow the implementor the most flexibility
in configuring the storage, which I think both IET and SCST do.

IET just needs to fix how it handles its workload with CFQ, which
somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
gain some extra speed.

Vlad, how about a comparison of SCST vs IET without those kernel hooks?

-Ross

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso  IET and STGT data)
  2009-03-31 18:43           ` Ross S. W. Walker
@ 2009-04-01  6:29             ` Bart Van Assche
  -1 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-01  6:29 UTC (permalink / raw)
  To: Ross S. W. Walker
  Cc: Vladislav Bolkhovitin, iscsitarget-devel, scst-devel,
	linux-kernel, linux-scsi, stgt

On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
<RWalker@medallion.com> wrote:
> IET just needs to fix how it does it workload with CFQ which
> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
> gain some extra speed.

I'm not familiar with the implementation details of CFQ, but I know
that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
default number of kernel threads of the scst_vdisk kernel module has
been increased to 5. Could this explain the performance difference
between SCST and IET for FILEIO and BLOCKIO ?

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso IET and STGT data)
  2009-04-01  6:29             ` Bart Van Assche
@ 2009-04-01 12:20               ` Ross Walker
  -1 siblings, 0 replies; 34+ messages in thread
From: Ross Walker @ 2009-04-01 12:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ross S. W. Walker, Vladislav Bolkhovitin, linux-scsi,
	iSCSI Enterprise Target Developer List, linux-kernel, stgt,
	scst-devel

On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>  
wrote:

> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
> <RWalker@medallion.com> wrote:
>> IET just needs to fix how it does it workload with CFQ which
>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>> gain some extra speed.
>
> I'm not familiar with the implementation details of CFQ, but I know
> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
> default number of kernel threads of the scst_vdisk kernel module has
> been increased to 5. Could this explain the performance difference
> between SCST and IET for FILEIO and BLOCKIO ?

Thanks for the update. IET has used 8 threads per target for ages now,
so I don't think it is that.

It may be how the I/O threads are forked in SCST that causes them to  
be in the same I/O context with each other.

I'm pretty sure implementing a version of the patch that was used for  
the dump command (found on the LKML) will fix this.

But thanks go to Vlad for pointing this deficiency out so we can fix
it to help make IET even better.

-Ross


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: ISCSI-SCST performance (with also IET and STGT data)
  2009-03-30 17:33 ISCSI-SCST performance (with also IET and STGT data) Vladislav Bolkhovitin
@ 2009-04-01 20:14   ` Bart Van Assche
  2009-04-01 20:14   ` Bart Van Assche
  1 sibling, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-01 20:14 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: scst-devel, linux-scsi, linux-kernel, iscsitarget-devel, stgt

On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> ==================================================================
>
> I. SEQUENTIAL ACCESS OVER SINGLE LINE
>
> 1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
>
>                        ISCSI-SCST      IET             STGT
> NULLIO:                 106             105             103
> FILEIO/CFQ:             82              57              55
> FILEIO/deadline         69              69              67
> BLOCKIO/CFQ             81              28              -
> BLOCKIO/deadline        80              66              -

I have repeated some of these performance tests for iSCSI over IPoIB
(two DDR PCIe 1.0 ConnectX HCAs connected back to back). The results
for the buffered I/O test with a block size of 512K (initiator)
against a file of 1GB residing on a tmpfs filesystem on the target are
as follows:

write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.

And for a block size of 4 KB:

write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
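
(Each of these numbers comes from a dd invocation of the same form as in
the original posting, illustratively something like

# dd if=/dev/sdX of=/dev/null bs=512K

for the 512K read test, with /dev/sdX being the iSCSI-imported device
backed by the 1GB file on tmpfs.)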

Or: depending on the test scenario, SCST transfers data between 2% and
30% faster via the iSCSI protocol over this network.

Something that is not relevant for this comparison, but interesting to
know: with the SRP implementation in SCST the maximal read throughput
is 1290 MB/s on the same setup.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso IET and STGT data)
  2009-04-01 12:20               ` Ross Walker
@ 2009-04-01 20:23                 ` James Bottomley
  -1 siblings, 0 replies; 34+ messages in thread
From: James Bottomley @ 2009-04-01 20:23 UTC (permalink / raw)
  To: Ross Walker
  Cc: Bart Van Assche, Ross S. W. Walker, Vladislav Bolkhovitin,
	linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
	stgt, scst-devel

On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>  
> wrote:
> 
> > On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
> > <RWalker@medallion.com> wrote:
> >> IET just needs to fix how it does it workload with CFQ which
> >> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
> >> gain some extra speed.
> >
> > I'm not familiar with the implementation details of CFQ, but I know
> > that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
> > default number of kernel threads of the scst_vdisk kernel module has
> > been increased to 5. Could this explain the performance difference
> > between SCST and IET for FILEIO and BLOCKIO ?
> 
> Thank for the update. IET has used 8 threads per target for ages now,  
> I don't think it is that.
> 
> It may be how the I/O threads are forked in SCST that causes them to  
> be in the same I/O context with each other.
> 
> I'm pretty sure implementing a version of the patch that was used for  
> the dump command (found on the LKML) will fix this.
> 
> But thanks goes to Vlad for pointing this dificiency out so we can fix  
> it to help make IET even better.

SCST explicitly fiddles with the io context to get this to happen.  It
has a hack to the block layer to export alloc_io_context:

http://marc.info/?t=122893564800003

James



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-01 20:23                 ` James Bottomley
@ 2009-04-02  7:38                   ` Vladislav Bolkhovitin
  -1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02  7:38 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ross Walker, linux-scsi, iSCSI Enterprise Target Developer List,
	linux-kernel, Ross S. W. Walker, scst-devel, stgt,
	Bart Van Assche

James Bottomley, on 04/02/2009 12:23 AM wrote:
> On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
>> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>  
>> wrote:
>>
>>> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
>>> <RWalker@medallion.com> wrote:
>>>> IET just needs to fix how it does it workload with CFQ which
>>>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>>>> gain some extra speed.
>>> I'm not familiar with the implementation details of CFQ, but I know
>>> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
>>> default number of kernel threads of the scst_vdisk kernel module has
>>> been increased to 5. Could this explain the performance difference
>>> between SCST and IET for FILEIO and BLOCKIO ?
>> Thank for the update. IET has used 8 threads per target for ages now,  
>> I don't think it is that.
>>
>> It may be how the I/O threads are forked in SCST that causes them to  
>> be in the same I/O context with each other.
>>
>> I'm pretty sure implementing a version of the patch that was used for  
>> the dump command (found on the LKML) will fix this.
>>
>> But thanks goes to Vlad for pointing this dificiency out so we can fix  
>> it to help make IET even better.
> 
> SCST explicitly fiddles with the io context to get this to happen.  It
> has a hack to block to export alloc_io_context:
> 
> http://marc.info/?t=122893564800003

Correct, although I wouldn't call it "fiddle", rather "grouping" ;)

But that's not the only reason for the good performance. Particularly,
it can't explain Bart's tmpfs results from the previous message, where
the majority of the I/O is done to/from RAM without any I/O scheduler
involved. (Or is the I/O scheduler also involved with tmpfs?) Bart has
4GB RAM, if I remember correctly, i.e. the test data set was 25% of RAM.

Thanks,
Vlad


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02  7:38                   ` Vladislav Bolkhovitin
@ 2009-04-02  9:02                     ` Vladislav Bolkhovitin
  -1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02  9:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
	Ross Walker, Ross S. W. Walker, scst-devel, stgt

Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> James Bottomley, on 04/02/2009 12:23 AM wrote:
>> On Wed, 2009-04-01 at 08:20 -0400, Ross Walker wrote:
>>> On Apr 1, 2009, at 2:29 AM, Bart Van Assche <bart.vanassche@gmail.com>  
>>> wrote:
>>>
>>>> On Tue, Mar 31, 2009 at 8:43 PM, Ross S. W. Walker
>>>> <RWalker@medallion.com> wrote:
>>>>> IET just needs to fix how it does it workload with CFQ which
>>>>> somehow SCST has overcome. Of course SCST tweaks the Linux kernel to
>>>>> gain some extra speed.
>>>> I'm not familiar with the implementation details of CFQ, but I know
>>>> that one of the changes between SCST 1.0.0 and SCST 1.0.1 is that the
>>>> default number of kernel threads of the scst_vdisk kernel module has
>>>> been increased to 5. Could this explain the performance difference
>>>> between SCST and IET for FILEIO and BLOCKIO ?
>>> Thank for the update. IET has used 8 threads per target for ages now,  
>>> I don't think it is that.
>>>
>>> It may be how the I/O threads are forked in SCST that causes them to  
>>> be in the same I/O context with each other.
>>>
>>> I'm pretty sure implementing a version of the patch that was used for  
>>> the dump command (found on the LKML) will fix this.
>>>
>>> But thanks goes to Vlad for pointing this dificiency out so we can fix  
>>> it to help make IET even better.
>> SCST explicitly fiddles with the io context to get this to happen.  It
>> has a hack to block to export alloc_io_context:
>>
>> http://marc.info/?t=122893564800003
> 
> Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
> 
> But that's not the only reason for good performance. Particularly, it 
> can't explain Bart's tmpfs results from the previous message, where the 
> majority of I/O done to/from RAM without any I/O scheduler involved. (Or 
> does I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I 
> remember correctly, i.e. the test data set was 25% of RAM.

To remove any suspicion that I'm playing dirty games here, I should note
that in many cases I can't say what exactly is responsible for the good
SCST performance. I can only say something like "good design and
implementation", but I guess that wouldn't count for much.
SCST/iSCSI-SCST were designed and implemented from the very beginning
with the best performance in mind, and that has brought the result.
Sorry, but at the moment I can't afford to do any "why is it so good?"
kinds of investigations, because I have a lot more important things to
do, like the SCST procfs -> sysfs interface conversion.

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02  9:02                     ` Vladislav Bolkhovitin
@ 2009-04-02 14:06                       ` Ross S. W. Walker
  -1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 14:06 UTC (permalink / raw)
  To: Vladislav Bolkhovitin, James Bottomley
  Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
	Ross Walker, scst-devel, stgt

Vladislav Bolkhovitin wrote:
> Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> > James Bottomley, on 04/02/2009 12:23 AM wrote:
> >> 
> >> SCST explicitly fiddles with the io context to get this to happen.  It
> >> has a hack to block to export alloc_io_context:
> >>
> >> http://marc.info/?t=122893564800003
> > 
> > Correct, although I wouldn't call it "fiddle", rather "grouping" ;)

Call it what you like,

Vladislav Bolkhovitin wrote:
> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
> 
> I would be interested in knowing how your code defeats CFQ's extremely
> high latency? Does your code reach into the io scheduler too? If not,
> some code hints would be great.

Hmm, CFQ doesn't have any extra processing latency, especially 
"extremely", hence there is nothing to defeat. If it had, how could it 
been chosen as the default?

----------
List:       linux-scsi
Subject:    [PATCH][RFC 13/23]: Export of alloc_io_context() function
From:       Vladislav Bolkhovitin <vst () vlnb ! net>
Date:       2008-12-10 18:49:19
Message-ID: 49400F2F.4050603 () vlnb ! net

This patch exports alloc_io_context() function. For performance reasons 
SCST queues commands using a pool of IO threads. It is considerably 
better for performance (>30% increase on sequential reads) if threads in 
  a pool have the same IO context. Since SCST can be built as a module, 
it needs alloc_io_context() function exported.

<snip>
----------

I call that lying.

> > But that's not the only reason for good performance. Particularly, it 
> > can't explain Bart's tmpfs results from the previous message, where the 
> > majority of I/O done to/from RAM without any I/O scheduler involved. (Or 
> > does I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I 
> > remember correctly, i.e. the test data set was 25% of RAM.
> 
> To remove any suspicions that I'm playing dirty games here I should note 
<snip>

I don't know what games you're playing at, but if you're too stupid to
realize when you're caught in a lie and to just shut up, then please do
me the favor and leave me out of any further correspondence from you.

Thank you,

-Ross

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02 14:06                       ` Ross S. W. Walker
@ 2009-04-02 14:14                         ` Ross S. W. Walker
  -1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 14:14 UTC (permalink / raw)
  To: Vladislav Bolkhovitin, James Bottomley
  Cc: linux-scsi, iSCSI Enterprise Target Developer List, linux-kernel,
	Ross Walker, scst-devel, stgt

Ross S. W. Walker wrote:
> Vladislav Bolkhovitin wrote:
> > Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
> > > James Bottomley, on 04/02/2009 12:23 AM wrote:
> > >> 
> > >> SCST explicitly fiddles with the io context to get this to happen.  It
> > >> has a hack to block to export alloc_io_context:
> > >>
> > >> http://marc.info/?t=122893564800003
> > > 
> > > Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
> 
> Call it what you like,
> 
> Vladislav Bolkhovitin wrote:
> > Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
> > 
> > I would be interested in knowing how your code defeats CFQ's extremely
> > high latency? Does your code reach into the io scheduler too? If not,
> > some code hints would be great.
> 

The above quoting was wrong; for accuracy, it should have read:

Vladislav Bolkhovitin wrote:
> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
> > 
> > I would be interested in knowing how your code defeats CFQ's extremely
> > high latency? Does your code reach into the io scheduler too? If not,
> > some code hints would be great.
> 
> Hmm, CFQ doesn't have any extra processing latency, especially
> "extremely", hence there is nothing to defeat. If it had, how could it
> have been chosen as the default?

Just so there is no misunderstanding who said what here.

-Ross


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02 14:06                       ` Ross S. W. Walker
@ 2009-04-02 15:36                         ` Vladislav Bolkhovitin
  -1 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 15:36 UTC (permalink / raw)
  To: Ross S. W. Walker
  Cc: James Bottomley, linux-scsi,
	iSCSI Enterprise Target Developer List, linux-kernel,
	Ross Walker, stgt, scst-devel

Ross S. W. Walker, on 04/02/2009 06:06 PM wrote:
> Vladislav Bolkhovitin wrote:
>> Vladislav Bolkhovitin, on 04/02/2009 11:38 AM wrote:
>>> James Bottomley, on 04/02/2009 12:23 AM wrote:
>>>> SCST explicitly fiddles with the io context to get this to happen.  It
>>>> has a hack to block to export alloc_io_context:
>>>>
>>>> http://marc.info/?t=122893564800003
>>> Correct, although I wouldn't call it "fiddle", rather "grouping" ;)
> 
> Call it what you like,
> 
> Vladislav Bolkhovitin wrote:
>> Ross S. W. Walker, on 03/30/2009 10:33 PM wrote:
>>
>> I would be interested in knowing how your code defeats CFQ's extremely
>> high latency? Does your code reach into the io scheduler too? If not,
>> some code hints would be great.
> 
> Hmm, CFQ doesn't have any extra processing latency, especially
> "extremely", hence there is nothing to defeat. If it had, how could it
> have been chosen as the default?
> 
> ----------
> List:       linux-scsi
> Subject:    [PATCH][RFC 13/23]: Export of alloc_io_context() function
> From:       Vladislav Bolkhovitin <vst () vlnb ! net>
> Date:       2008-12-10 18:49:19
> Message-ID: 49400F2F.4050603 () vlnb ! net
> 
> This patch exports alloc_io_context() function. For performance reasons 
> SCST queues commands using a pool of IO threads. It is considerably 
> better for performance (>30% increase on sequential reads) if threads in 
>   a pool have the same IO context. Since SCST can be built as a module, 
> it needs alloc_io_context() function exported.
> 
> <snip>
> ----------
> 
> I call that lying.
> 
>>> But that's not the only reason for good performance. Particularly, it 
>>> can't explain Bart's tmpfs results from the previous message, where the 
>>> majority of I/O done to/from RAM without any I/O scheduler involved. (Or 
>>> does I/O scheduler also involved with tmpfs?) Bart has 4GB RAM, if I 
>>> remember correctly, i.e. the test data set was 25% of RAM.
>> To remove any suspicions that I'm playing dirty games here I should note 
> <snip>
> 
> I don't know what games you're playing at, but do me a favor: if you're
> too stupid to realize when you're caught in a lie and to just shut up,
> then please leave me out of any further correspondence from you.

Think what you want and do what you want. You can even filter out all 
e-mails from me, that's your right. But:

1. As I wrote, grouping threads into a single IO context doesn't explain
all of the performance difference, and finding out the reasons for
others' performance problems isn't something I can afford at the moment.

2. CFQ doesn't have any processing latency and never has had. Learn to
understand what you are writing about and how to express yourself
correctly first. You asked about that latency and I replied that there
is nothing to defeat.

3. SCST doesn't have any hooks into CFQ and is not going to have any in
the foreseeable future.

> Thank you,
> 
> -Ross


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-01 20:14   ` Bart Van Assche
  (?)
@ 2009-04-02 17:16   ` Vladislav Bolkhovitin
  2009-04-03 17:08     ` Bart Van Assche
  2009-04-04  8:04     ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
  -1 siblings, 2 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-02 17:16 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: scst-devel, linux-kernel, linux-scsi

Bart Van Assche, on 04/02/2009 12:14 AM wrote:
> On Mon, Mar 30, 2009 at 7:33 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> ==================================================================
>> I. SEQUENTIAL ACCESS OVER SINGLE LINE
>>
>> 1. # dd if=/dev/sdX of=/dev/null bs=512K count=2000
>>
>>                        ISCSI-SCST      IET             STGT
>> NULLIO:                 106             105             103
>> FILEIO/CFQ:             82              57              55
>> FILEIO/deadline         69              69              67
>> BLOCKIO/CFQ             81              28              -
>> BLOCKIO/deadline        80              66              -
> 
> I have repeated some of these performance tests for iSCSI over IPoIB
> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
> for the buffered I/O test with a block size of 512K (initiator)
> against a file of 1GB residing on a tmpfs filesystem on the target are
> as follows:
> 
> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
> 
> And for a block size of 4 KB:
> 
> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.

Do you have any thoughts why writes are so bad? It shouldn't be so..

> Or: depending on the test scenario, SCST transfers data between 2% and
> 30% faster via the iSCSI protocol over this network.
> 
> Something that is not relevant for this comparison, but interesting to
> know: with the SRP implementation in SCST the maximal read throughput
> is 1290 MB/s on the same setup.

This can be well explained. The limiting factor for iSCSI is that
iSCSI/TCP processing overloads a single CPU core. You can verify that
from vmstat output during the test: the sum of user and sys time should
be about 100/(number of CPUs) or higher. SRP is a lot more
CPU-efficient, hence it has better throughput.
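
A minimal sketch of that check, assuming the standard Linux /proc/stat
layout: sample the aggregate "cpu" line twice and print user+sys as a
percentage of total time. On an N-CPU box a single saturated core shows
up here as roughly 100/N percent. The 5-second sleep is just an
arbitrary sampling window to run while the dd test is in flight.

/* Sketch: user+sys CPU percentage from /proc/stat over a short window.
 * Field layout is the standard "cpu user nice system idle iowait irq
 * softirq" aggregate line. */
#include <stdio.h>
#include <unistd.h>

static int sample(unsigned long long *busy, unsigned long long *total)
{
	unsigned long long user, nice, sys, idle, iowait, irq, softirq;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
		   &user, &nice, &sys, &idle, &iowait, &irq, &softirq) != 7) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*busy = user + nice + sys;
	*total = user + nice + sys + idle + iowait + irq + softirq;
	return 0;
}

int main(void)
{
	unsigned long long b0, t0, b1, t1;

	if (sample(&b0, &t0))
		return 1;
	sleep(5);		/* sample window while the test is running */
	if (sample(&b1, &t1))
		return 1;
	printf("user+sys: %.1f%%\n", 100.0 * (b1 - b0) / (t1 - t0));
	return 0;
}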

If you test with 2 or more parallel IO streams, you should see
correspondingly increased aggregate throughput, up to the moment you hit
your memory copy bandwidth.

Thanks,
Vlad


> Bart.
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02 15:36                         ` Vladislav Bolkhovitin
@ 2009-04-02 17:19                           ` Ross S. W. Walker
  -1 siblings, 0 replies; 34+ messages in thread
From: Ross S. W. Walker @ 2009-04-02 17:19 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: James Bottomley, linux-scsi,
	iSCSI Enterprise Target Developer List, linux-kernel,
	Ross Walker, stgt, scst-devel

Vladislav Bolkhovitin wrote:
> 
> Think what you want and do what you want. You can even filter out all 
> e-mails from me, that's your right. But:
> 
> 1. As I wrote, grouping threads into a single IO context doesn't explain
> all of the performance difference, and finding out the reasons for
> others' performance problems isn't something I can afford at the moment.

No, not all of the performance difference, but a substantial part of it,
enough to say IET has a real performance issue when using the CFQ
scheduler.

> 2. CFQ doesn't have any processing latency and never has had. Learn to
> understand what you are writing about and how to express yourself
> correctly first. You asked about that latency and I replied that there
> is nothing to defeat.

CFQ pauses briefly before switching I/O contexts in order to make sure
it has given as much bandwidth as possible to a context before moving
on. This is documented. With a single I/O stream, or random I/O, it
won't be noticeable, but for interleaved sequential I/O across multiple
threads with different I/O contexts it can be significant (see the
sketch below).

Not that Wikipedia is authoritative: http://en.wikipedia.org/wiki/CFQ

It's right in the first paragraph:

"... While CFQ does not do explicit anticipatory IO scheduling, it
achieves the same effect of having good aggregate throughput for the
system as a whole, by allowing a process queue to idle at the end of
synchronous IO thereby "anticipating" further close IO from that
process. ..."

You can also check out the LXR:

This one in 2.6.18 kernels (RHEL) shows a pause of HZ/10:

http://lxr.linux.no/linux+v2.6.18/block/cfq-iosched.c#L30

So given a 10ms time slice, that would equate to ~1ms; in later kernels
it's defined as HZ/5, which can equate to ~2ms. These millisecond delays
can be an eternity for sequential I/O patterns.
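
A minimal user-space sketch of that pattern (illustrative only, not IET
or SCST code): one logically sequential stream whose consecutive 512K
chunks are read by different threads, i.e. by different I/O contexts as
far as CFQ is concerned. The device name is a placeholder; build with
-pthread.

/* Illustrative only: NTHREADS workers share one sequential stream, so
 * consecutive chunks are issued from different threads/IO contexts --
 * the pattern that triggers CFQ's per-context idling described above. */
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 4
#define CHUNK    (512 * 1024)	/* 512K, as in the dd tests */
#define NCHUNKS  2000		/* ~1GB total */

static const char *dev = "/dev/sdX";	/* placeholder device */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_chunk;			/* cursor over the single stream */

static void *worker(void *arg)
{
	char *buf = malloc(CHUNK);
	int fd = open(dev, O_RDONLY);
	int n;

	(void)arg;
	if (!buf || fd < 0)
		goto out;
	for (;;) {
		pthread_mutex_lock(&lock);
		n = next_chunk++;	/* grab the next chunk of the stream */
		pthread_mutex_unlock(&lock);
		if (n >= NCHUNKS)
			break;
		if (pread(fd, buf, CHUNK, (off_t)n * CHUNK) <= 0)
			break;
	}
out:
	if (fd >= 0)
		close(fd);
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Comparing its throughput under CFQ vs. deadline against a single
dd over the same range should make the effect visible.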

> 3. SCST doesn't have any hooks into CFQ and is not going to have any in
> the foreseeable future.

True, SCST doesn't have any hooks into CFQ, but your code modifies
block/blk-ioc.c to export alloc_io_context(), which by default is a
private function, to allow your kernel-based threads to set their I/O
contexts to the same group, thereby avoiding the delay CFQ imposes when
switching I/O contexts between these threads.
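
To make that concrete, a rough sketch (not the actual SCST code) of how
a pool of kernel threads can be attached to one shared I/O context once
alloc_io_context() is exported; helper names and header locations varied
across 2.6.x kernels, so treat this as indicative only.

/* Rough sketch only -- not the actual SCST implementation.  Assumes a
 * 2.6.2x kernel with alloc_io_context() exported (what the quoted patch
 * does) and ioc_task_link() available; pool creation is assumed to be
 * serialized, so locking is omitted. */
#include <linux/blkdev.h>
#include <linux/iocontext.h>
#include <linux/kthread.h>
#include <linux/sched.h>

static struct io_context *pool_ioc;	/* one context shared by the pool */

static int pool_thread(void *arg)
{
	/* The first thread of the pool allocates the shared context. */
	if (!pool_ioc)
		pool_ioc = alloc_io_context(GFP_KERNEL, -1);

	/* Every thread attaches itself to it, so CFQ accounts all I/O
	 * submitted by the pool to a single submitter and never idles
	 * between the pool's threads. */
	if (pool_ioc) {
		ioc_task_link(pool_ioc);
		current->io_context = pool_ioc;
	}

	while (!kthread_should_stop()) {
		/* ... dequeue and execute SCSI commands here ... */
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}
	return 0;
}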

-Ross


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02 17:16   ` [Scst-devel] " Vladislav Bolkhovitin
@ 2009-04-03 17:08     ` Bart Van Assche
  2009-04-03 17:13         ` Sufficool, Stanley
  2009-04-04  8:04     ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
  1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2009-04-03 17:08 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: scst-devel, linux-kernel, linux-scsi

On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> I have repeated some of these performance tests for iSCSI over IPoIB
>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>> for the buffered I/O test with a block size of 512K (initiator)
>> against a file of 1GB residing on a tmpfs filesystem on the target are
>> as follows:
>>
>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>
>> And for a block size of 4 KB:
>>
>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>
> Do you have any thoughts why writes are so bad? It shouldn't be so..

It's not impossible that with the 4 KB write test I hit the limits of
the initiator system (Intel E6750 CPU, 2.66 GHz, two cores). Some
statistics I gathered during the 4 KB write test:
Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per second, same
number of interrupts processed by each core (8250/s).
Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per second, all
interrupts occurred on the same core.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata)
  2009-04-03 17:08     ` Bart Van Assche
@ 2009-04-03 17:13         ` Sufficool, Stanley
  0 siblings, 0 replies; 34+ messages in thread
From: Sufficool, Stanley @ 2009-04-03 17:13 UTC (permalink / raw)
  To: Bart Van Assche, Vladislav Bolkhovitin
  Cc: scst-devel, linux-kernel, linux-scsi



> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanassche@gmail.com] 
> Sent: Friday, April 03, 2009 10:09 AM
> To: Vladislav Bolkhovitin
> Cc: scst-devel; linux-kernel@vger.kernel.org; 
> linux-scsi@vger.kernel.org
> Subject: Re: [Scst-devel] ISCSI-SCST performance (with also 
> IET and STGTdata)
> 
> 
> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin 
> <vst@vlnb.net> wrote:
> > Bart Van Assche, on 04/02/2009 12:14 AM wrote:
> >> I have repeated some of these performance tests for iSCSI 
> over IPoIB 
> >> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). 
> The results 
> >> for the buffered I/O test with a block size of 512K (initiator) 
> >> against a file of 1GB residing on a tmpfs filesystem on the target 
> >> are as follows:
> >>
> >> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
> >> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
> >>
> >> And for a block size of 4 KB:
> >>
> >> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
> >> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
> >
> > Do you have any thoughts why writes are so bad? It shouldn't be so..
> 
> It's not impossible that with the 4 KB write test I hit the 
> limits of the initiator system (Intel E6750 CPU, 2.66 GHz, 
> two cores). Some statistics I gathered during the 4 KB write test:
> Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per 
> second, same number of interrupts processed by each core (8250/s).
> Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per 
> second, all interrupts occurred on the same core.

Are you using connected mode IPoIB and setting the MTU to 4KB? Would
fragmentation of IPoIB drive up the interrupt rates?

> 
> Bart.
> 
> --------------------------------------------------------------
> ----------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net 
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata)
  2009-04-03 17:13         ` Sufficool, Stanley
  (?)
@ 2009-04-03 17:52         ` Bart Van Assche
  -1 siblings, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2009-04-03 17:52 UTC (permalink / raw)
  To: Sufficool, Stanley
  Cc: Vladislav Bolkhovitin, scst-devel, linux-kernel, linux-scsi

On Fri, Apr 3, 2009 at 7:13 PM, Sufficool, Stanley
<ssufficool@rov.sbcounty.gov> wrote:
>> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin
>> <vst@vlnb.net> wrote:
>> > Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> >> I have repeated some of these performance tests for iSCSI
>> >> over IPoIB
>> >> (two DDR PCIe 1.0 ConnectX HCA's connected back to back).
>> >> The results
>> >> for the buffered I/O test with a block size of 512K (initiator)
>> >> against a file of 1GB residing on a tmpfs filesystem on the target
>> >> are as follows:
>> >>
>> >> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> >> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>> >>
>> >> And for a block size of 4 KB:
>> >>
>> >> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> >> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>> >
>> > Do you have any thoughts why writes are so bad? It shouldn't be so..
>>
>> It's not impossible that with the 4 KB write test I hit the
>> limits of the initiator system (Intel E6750 CPU, 2.66 GHz,
>> two cores). Some statistics I gathered during the 4 KB write test:
>> Target: CPU load 0.5, 16500 mlx4-comp-0 interrupts per
>> second, same number of interrupts processed by each core (8250/s).
>> Initiator: CPU load 1.0, 32850 mlx4-comp-0 interrupts per
>> second, all interrupts occurred on the same core.
>
> Are you using connected mode IPoIB and setting the MTU to 4KB? Would
> fragmentation of IPoIB drive up the interrupt rates?

All tests have been run with default IPoIB settings: an MTU of 2044
bytes and datagram mode. The following data has been obtained from the
target system after several 4 KB write tests:

$ cat /sys/class/net/ib0/mode
datagram
$ /sbin/ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr
80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:2:d217/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:88482013 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38444824 errors:0 dropped:11 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:135770573672 (129480.9 Mb)  TX bytes:5647702210 (5386.0 Mb)

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-02 17:16   ` [Scst-devel] " Vladislav Bolkhovitin
  2009-04-03 17:08     ` Bart Van Assche
@ 2009-04-04  8:04     ` Bart Van Assche
  2009-04-17 18:11       ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2009-04-04  8:04 UTC (permalink / raw)
  To: Vladislav Bolkhovitin; +Cc: scst-devel, linux-kernel, linux-scsi

On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>> I have repeated some of these performance tests for iSCSI over IPoIB
>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>> for the buffered I/O test with a block size of 512K (initiator)
>> against a file of 1GB residing on a tmpfs filesystem on the target are
>> as follows:
>>
>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>
>> And for a block size of 4 KB:
>>
>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>
> Do you have any thoughts why writes are so bad? It shouldn't be so..

By this time I have run the following variation of the 4 KB write test:
* Target: iSCSI-SCST was exporting a 1 GB file residing on a tmpfs filesystem.
* Initiator: two processes were writing 4 KB blocks as follows:
dd if=/dev/zero of=/dev/sdb bs=4K seek=0 count=131072 oflag=sync &
dd if=/dev/zero of=/dev/sdb bs=4K seek=131072 count=131072 oflag=sync &

Results:
* Each dd process on the initiator was writing at a speed of 37.8
MB/s, or a combined writing speed of 75.6 MB/s.
* CPU load on the initiator system during the test: 2.0.
* According to /proc/interrupts, about 38000 mlx4-comp-0 interrupts
were triggered per second.

These results confirm that the initiator system was the bottleneck
during the 4 KB write test, not the target system.

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Scst-devel] ISCSI-SCST performance (with also IET and STGT data)
  2009-04-04  8:04     ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
@ 2009-04-17 18:11       ` Vladislav Bolkhovitin
  0 siblings, 0 replies; 34+ messages in thread
From: Vladislav Bolkhovitin @ 2009-04-17 18:11 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: scst-devel, linux-kernel, linux-scsi

Bart Van Assche, on 04/04/2009 12:04 PM wrote:
> On Thu, Apr 2, 2009 at 7:16 PM, Vladislav Bolkhovitin <vst@vlnb.net> wrote:
>> Bart Van Assche, on 04/02/2009 12:14 AM wrote:
>>> I have repeated some of these performance tests for iSCSI over IPoIB
>>> (two DDR PCIe 1.0 ConnectX HCA's connected back to back). The results
>>> for the buffered I/O test with a block size of 512K (initiator)
>>> against a file of 1GB residing on a tmpfs filesystem on the target are
>>> as follows:
>>>
>>> write-test: iSCSI-SCST 243 MB/s; IET 192 MB/s.
>>> read-test: iSCSI-SCST 291 MB/s; IET 223 MB/s.
>>>
>>> And for a block size of 4 KB:
>>>
>>> write-test: iSCSI-SCST 43 MB/s; IET 42 MB/s.
>>> read-test: iSCSI-SCST 288 MB/s; IET 221 MB/s.
>> Do you have any thoughts why writes are so bad? It shouldn't be so..
> 
> By this time I have run the following variation of the 4 KB write test:
> * Target: iSCSI-SCST was exporting a 1 GB file residing on a tmpfs filesystem.
> * Initiator: two processes were writing 4 KB blocks as follows:
> dd if=/dev/zero of=/dev/sdb bs=4K seek=0 count=131072 oflag=sync &
> dd if=/dev/zero of=/dev/sdb bs=4K seek=131072 count=131072 oflag=sync &
> 
> Results:
> * Each dd process on the initiator was writing at a speed of 37.8
> MB/s, or a combined writing speed of 75.6 MB/s.
> * CPU load on the initiator system during the test: 2.0.
> * According to /proc/interrupts, about 38000 mlx4-comp-0 interrupts
> were triggered per second.
> 
> These results confirm that the initiator system was the bottleneck
> during the 4 KB write test, not the target system.

If so, with oflag=direct you should see a performance gain, because you
will eliminate a data copy.
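
A minimal sketch of what dd's oflag=direct amounts to, assuming the same
/dev/sdb and 4 KB blocks as in the test above: open the device with
O_DIRECT and write page-aligned buffers, so the data bypasses the page
cache and the extra copy that buffered writes pay for.

/* Sketch of a direct-I/O writer; device name, block size and count are
 * placeholders matching the 4 KB dd runs discussed above. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const size_t bs = 4096;
	const long count = 131072;	/* 131072 x 4 KB = 512 MB */
	void *buf;
	long i;
	int fd = open("/dev/sdb", O_WRONLY | O_DIRECT);

	if (fd < 0)
		return 1;
	/* O_DIRECT requires aligned buffers; allocate page-aligned memory. */
	if (posix_memalign(&buf, 4096, bs))
		return 1;
	memset(buf, 0, bs);

	for (i = 0; i < count; i++)
		if (write(fd, buf, bs) != (ssize_t)bs)
			break;

	close(fd);
	free(buf);
	return 0;
}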

> Bart.
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Scst-devel mailing list
> Scst-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scst-devel
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2009-04-17 18:11 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-30 17:33 ISCSI-SCST performance (with also IET and STGT data) Vladislav Bolkhovitin
     [not found] ` <e2e108260903301106y2b750c23kfab978567f3de3a0@mail.gmail.com>
2009-03-30 18:33   ` [Scst-devel] " Vladislav Bolkhovitin
2009-03-30 18:53     ` Bart Van Assche
2009-03-30 18:53       ` Bart Van Assche
2009-03-31 17:37       ` Vladislav Bolkhovitin
2009-03-31 18:43         ` [Iscsitarget-devel] [Scst-devel] ISCSI-SCST performance (withalso " Ross S. W. Walker
2009-03-31 18:43           ` Ross S. W. Walker
2009-04-01  6:29           ` [Iscsitarget-devel] " Bart Van Assche
2009-04-01  6:29             ` Bart Van Assche
2009-04-01 12:20             ` Ross Walker
2009-04-01 12:20               ` Ross Walker
2009-04-01 20:23               ` James Bottomley
2009-04-01 20:23                 ` James Bottomley
2009-04-02  7:38                 ` [Scst-devel] [Iscsitarget-devel] ISCSI-SCST performance (with also " Vladislav Bolkhovitin
2009-04-02  7:38                   ` Vladislav Bolkhovitin
2009-04-02  9:02                   ` Vladislav Bolkhovitin
2009-04-02  9:02                     ` Vladislav Bolkhovitin
2009-04-02 14:06                     ` Ross S. W. Walker
2009-04-02 14:06                       ` Ross S. W. Walker
2009-04-02 14:14                       ` Ross S. W. Walker
2009-04-02 14:14                         ` Ross S. W. Walker
2009-04-02 15:36                       ` Vladislav Bolkhovitin
2009-04-02 15:36                         ` Vladislav Bolkhovitin
2009-04-02 17:19                         ` Ross S. W. Walker
2009-04-02 17:19                           ` Ross S. W. Walker
2009-04-01 20:14 ` Bart Van Assche
2009-04-01 20:14   ` Bart Van Assche
2009-04-02 17:16   ` [Scst-devel] " Vladislav Bolkhovitin
2009-04-03 17:08     ` Bart Van Assche
2009-04-03 17:13       ` [Scst-devel] ISCSI-SCST performance (with also IET and STGTdata) Sufficool, Stanley
2009-04-03 17:13         ` Sufficool, Stanley
2009-04-03 17:52         ` Bart Van Assche
2009-04-04  8:04     ` [Scst-devel] ISCSI-SCST performance (with also IET and STGT data) Bart Van Assche
2009-04-17 18:11       ` Vladislav Bolkhovitin
