linux-kernel.vger.kernel.org archive mirror
* Is user-space AIO dead?
@ 2006-01-11 18:12 Kenny Simpson
  2006-01-11 18:20 ` Marcin Dalecki
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Kenny Simpson @ 2006-01-11 18:12 UTC (permalink / raw)
  To: linux kernel

Hi,
  Having read the excellent paper by IBM presented at the 2003 OLS about Asynchronous I/O Support
in Linux 2.5, I found the conclusion rather disappointing:
"In conclusion, there appears to be no conditions for raw or O_DIRECT access under which AIO can
show a noticable benefit." - p385.
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf

Is this still the case?

If I want a transactional engine (like a database) that needs to persist to stable storage, is it
still best to use a helper thread to do write/fsync or O_SYNC|O_DIRECT?
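
For concreteness, a minimal sketch of the two options in question - assuming Linux/glibc; the
file names, record size, and 512-byte alignment are placeholders, not recommendations:

#define _GNU_SOURCE              /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    /* Option 1: buffered write, then fsync() to force it to stable
     * storage; durability is known only once fsync() returns. */
    int fd1 = open("log.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd1 < 0) return 1;
    if (write(fd1, buf, sizeof(buf)) != sizeof(buf)) return 1;
    if (fsync(fd1) != 0) return 1;
    close(fd1);

    /* Option 2: O_SYNC | O_DIRECT - each write() returns only after the
     * data has gone out to the device, but O_DIRECT requires the buffer,
     * length, and file offset to be suitably aligned (512 here is an
     * assumption; the real requirement depends on the device). */
    void *abuf;
    if (posix_memalign(&abuf, 512, 4096) != 0) return 1;
    memset(abuf, 'x', 4096);
    int fd2 = open("log2.dat", O_WRONLY | O_CREAT | O_SYNC | O_DIRECT, 0644);
    if (fd2 < 0) return 1;
    if (write(fd2, abuf, 4096) != 4096) return 1;
    close(fd2);
    free(abuf);
    return 0;
}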

-Kenny



* Re: Is user-space AIO dead?
  2006-01-11 18:12 Is user-space AIO dead? Kenny Simpson
@ 2006-01-11 18:20 ` Marcin Dalecki
  2006-01-11 18:23 ` David Lloyd
  2006-01-11 18:41 ` Benjamin LaHaise
  2 siblings, 0 replies; 12+ messages in thread
From: Marcin Dalecki @ 2006-01-11 18:20 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: linux kernel


On 2006-01-11, at 19:12, Kenny Simpson wrote:
> If I want a transactional engine (like a database) that needs to
> persist to stable storage, is it still best to use a helper thread
> to do write/fsync or O_SYNC|O_DIRECT?

Yes.


* Re: Is user-space AIO dead?
  2006-01-11 18:12 Is user-space AIO dead? Kenny Simpson
  2006-01-11 18:20 ` Marcin Dalecki
@ 2006-01-11 18:23 ` David Lloyd
  2006-01-11 18:45   ` Kenny Simpson
  2006-01-11 18:41 ` Benjamin LaHaise
  2 siblings, 1 reply; 12+ messages in thread
From: David Lloyd @ 2006-01-11 18:23 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: linux kernel

On Wed, 11 Jan 2006, Kenny Simpson wrote:

> Hi,
>  Having read the excellent paper by IBM presented at the 2003 OLS about Asynchronous I/O Support
> in Linux 2.5, I found the conclusion rather disappointing:
> "In conclusion, there appears to be no conditions for raw or O_DIRECT access under which AIO can
> show a noticable benefit." - p385.
> http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Pulavarty-OLS2003.pdf
>
> Is this still the case?
>
> If I want a transactional engine (like a database) that needs to persist to stable storage, is it
> still best to use a helper thread to do write/fsync or O_SYNC|O_DIRECT?

Wouldn't nonblocking I/O on regular files be nice?

- D


* Re: Is user-space AIO dead?
  2006-01-11 18:12 Is user-space AIO dead? Kenny Simpson
  2006-01-11 18:20 ` Marcin Dalecki
  2006-01-11 18:23 ` David Lloyd
@ 2006-01-11 18:41 ` Benjamin LaHaise
  2006-01-11 18:54   ` Kenny Simpson
  2 siblings, 1 reply; 12+ messages in thread
From: Benjamin LaHaise @ 2006-01-11 18:41 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: linux kernel

On Wed, Jan 11, 2006 at 10:12:52AM -0800, Kenny Simpson wrote:
> If I want a transactional engine (like a database) that needs to persist to stable storage, is it
> still best to use a helper thread to do write/fsync or O_SYNC|O_DIRECT?

It all depends on which database engine you're using.  Getting Oracle 
tuned to the Linux AIO implementation took a few revisions, but what's 
out in the field these days makes good use of aio to gain 10-15% on 
the usual large industry-standard database benchmark.

		-ben
-- 
"You know, I've seen some crystals do some pretty trippy shit, man."
Don't Email: <dont@kvack.org>.


* Re: Is user-space AIO dead?
  2006-01-11 18:23 ` David Lloyd
@ 2006-01-11 18:45   ` Kenny Simpson
  2006-01-11 19:10     ` David Lloyd
  0 siblings, 1 reply; 12+ messages in thread
From: Kenny Simpson @ 2006-01-11 18:45 UTC (permalink / raw)
  To: David Lloyd; +Cc: linux kernel

--- David Lloyd <dmlloyd@tds.net> wrote:
> Wouldn't nonblocking I/O on regular files be nice?

Yes, it could be.  As I understand it, regular file writes (not O_DIRECT) go only to the page
cache and block only when there is memory pressure (so it is more of a throttle).

Reads, on the other hand, could be quite handy.  What would be very cool is a way to mmap a
file, start faulting in its pages in the background, and be notified as they complete - or
when all the faulting is done.
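
Something close to this can be approximated - a sketch, assuming Linux, where
madvise(MADV_WILLNEED) starts the read-in in the background and mincore() polls for residency
in place of the notification that doesn't exist:

#define _DEFAULT_SOURCE          /* madvise, mincore, usleep */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0) return 1;

    size_t len = st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    madvise(p, len, MADV_WILLNEED);          /* kick off background readahead */

    long pagesz = sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(npages);
    if (vec == NULL) return 1;

    /* Poll until every page is resident - the completion notification
     * wished for above does not exist, so polling stands in for it. */
    for (;;) {
        size_t resident = 0;
        if (mincore(p, len, vec) != 0) return 1;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
        if (resident == npages) break;
        usleep(1000);
    }
    printf("all %zu pages resident\n", npages);
    return 0;
}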

-Kenny



* Re: Is user-space AIO dead?
  2006-01-11 18:41 ` Benjamin LaHaise
@ 2006-01-11 18:54   ` Kenny Simpson
  0 siblings, 0 replies; 12+ messages in thread
From: Kenny Simpson @ 2006-01-11 18:54 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux kernel

--- Benjamin LaHaise <bcrl@kvack.org> wrote:
> It all depends on which database engine you're using.
Not interested in using one; more interested in building one.

> Getting Oracle 
> tuned to the Linux AIO implementation took a few revisions, but what's 
> out in the fields these days makes good use of aio to gain 10-15% on 
> the usual large industry standard database benchmark.

I was about to start testing libaio for a simple transaction engine when I read this paper, so I
thought it prudent to ask around before investing too much effort.

Are there any more up-to-date references?

-Kenny



* Re: Is user-space AIO dead?
  2006-01-11 18:45   ` Kenny Simpson
@ 2006-01-11 19:10     ` David Lloyd
  2006-01-11 19:20       ` Kenny Simpson
  0 siblings, 1 reply; 12+ messages in thread
From: David Lloyd @ 2006-01-11 19:10 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: linux kernel

On Wed, 11 Jan 2006, Kenny Simpson wrote:

> --- David Lloyd <dmlloyd@tds.net> wrote:
>> Wouldn't nonblocking I/O on regular files be nice?
>
> Yes, it could be.  As I understand it, regular file writes (not O_DIRECT) 
> go only to the page cache and block only when there is memory pressure 
> (so it is more of a throttle).

If, however, you were using O_DIRECT or O_SYNC, you would have a 
mechanism for knowing when your writes had made it to disk, which might be 
useful for transactional systems.

- D


* Re: Is user-space AIO dead?
  2006-01-11 19:10     ` David Lloyd
@ 2006-01-11 19:20       ` Kenny Simpson
  2006-01-11 20:31         ` Phillip Susi
  0 siblings, 1 reply; 12+ messages in thread
From: Kenny Simpson @ 2006-01-11 19:20 UTC (permalink / raw)
  To: David Lloyd; +Cc: linux kernel

--- David Lloyd <dmlloyd@tds.net> wrote:

> On Wed, 11 Jan 2006, Kenny Simpson wrote:
> 
> > --- David Lloyd <dmlloyd@tds.net> wrote:
> >> Wouldn't nonblocking I/O on regular files be nice?
> >
> > Yes, it could be.  As I understand it, regular file writes (not O_DIRECT) 
> > go only to the page cache and block only when there is memory pressure 
> > (so it is more of a throttle).
> 
> If, however, you were using O_DIRECT or O_SYNC, you would have a 
> mechanism for knowing when your writes had made it to disk, which might be 
> useful for transactional systems.

Right, but I'm not sure O_DIRECT implies stable storage - only that the data is sent out to the
device rather than held up in the page cache (I could be wrong).

According to the paper, AIO is implemented for O_DIRECT, but they observed no benefit from it.

AIO being implemented for O_SYNC would be nice for my use, as it would also eliminate the extra
alignment restrictions imposed by O_DIRECT.

-Kenny



* Re: Is user-space AIO dead?
  2006-01-11 19:20       ` Kenny Simpson
@ 2006-01-11 20:31         ` Phillip Susi
  2006-01-11 22:02           ` Kenny Simpson
  0 siblings, 1 reply; 12+ messages in thread
From: Phillip Susi @ 2006-01-11 20:31 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: David Lloyd, linux kernel

It is not possible to use non-blocking IO with O_DIRECT, because the 
kernel does not buffer the data, and once the write() call returns, the 
kernel cannot touch the caller's buffer any more.  The idea of O_DIRECT 
is that the hardware can DMA directly from the caller's buffer, so if 
you want to keep the hardware busy, you need to use async IO so that the 
hardware always has some work to do.

I actually hacked up dd to use async IO ( via io_submit ) in conjunction 
with O_DIRECT, and it noticeably improved ( ~10% ) both throughput 
and cpu utilization.  I have an OO.o spreadsheet showing the results of 
some simple benchmarking with various parameters that I did at home, which 
I will post later this evening.

Of course, dd is a simplistic case of sequential IO.  If you have 
something like a big database that needs to concurrently handle dozens 
or hundreds of random IO requests at once, O_DIRECT async IO is 
definitely going to be a clear winner. 
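
In outline, such a pipeline looks something like this - a sketch, assuming libaio (compile
with -laio); the queue depth, block size, 512-byte alignment, and the fixed number of blocks
read are arbitrary choices for illustration, not what my dd hack actually does:

#define _GNU_SOURCE              /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEPTH 4                  /* concurrent in-flight reads */
#define BLK   (128 * 1024)       /* bytes per read */

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;

    io_context_t ctx;
    memset(&ctx, 0, sizeof(ctx));
    if (io_setup(DEPTH, &ctx) != 0) return 1;

    struct iocb cbs[DEPTH], *cbp[DEPTH];
    long long off = 0;

    /* Prime the queue: keep DEPTH reads in flight so the hardware
     * always has work to do. */
    for (int i = 0; i < DEPTH; i++) {
        void *buf;
        if (posix_memalign(&buf, 512, BLK) != 0) return 1;
        io_prep_pread(&cbs[i], fd, buf, BLK, off);
        off += BLK;
        cbp[i] = &cbs[i];
    }
    if (io_submit(ctx, DEPTH, cbp) != DEPTH) return 1;

    /* As each read completes, immediately resubmit its iocb at the
     * next offset; stop after an arbitrary 64 completions. */
    for (int done = 0; done < 64; done++) {
        struct io_event ev;
        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1) break;
        struct iocb *cb = ev.obj;
        if ((long)ev.res < (long)BLK) break;     /* short read, EOF, or error */
        io_prep_pread(cb, fd, cb->u.c.buf, BLK, off);
        off += BLK;
        if (io_submit(ctx, 1, &cb) != 1) break;
    }

    io_destroy(ctx);
    close(fd);
    return 0;
}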

Kenny Simpson wrote:
> Right, but I'm not sure O_DIRECT implies stable storage - only that the data is sent out to the
> device rather than held up in the page cache (I could be wrong).
>
> According to the paper, AIO is implemented for O_DIRECT, but they observed no benefit from it.
>
> AIO being implemented for O_SYNC would be nice for my use, as it would also eliminate the extra
> alignment restrictions imposed by O_DIRECT.
>
> -Kenny
>
>   



* Re: Is user-space AIO dead?
  2006-01-11 20:31         ` Phillip Susi
@ 2006-01-11 22:02           ` Kenny Simpson
  2006-01-12  3:50             ` Phillip Susi
  0 siblings, 1 reply; 12+ messages in thread
From: Kenny Simpson @ 2006-01-11 22:02 UTC (permalink / raw)
  To: Phillip Susi; +Cc: David Lloyd, linux kernel

--- Phillip Susi <psusi@cfl.rr.com> wrote:
> I actually hacked up dd to use async IO ( via io_submit ) in conjunction 
> with O_DIRECT, and it noticeably improved ( ~10% ) both throughput 
> and cpu utilization.  I have an OO.o spreadsheet showing the results of 
> some simple benchmarking with various parameters that I did at home, which 
> I will post later this evening.
> 
> Of course, dd is a simplistic case of sequential IO.  If you have 
> something like a big database that needs to concurrently handle dozens 
> or hundreds of random IO requests at once, O_DIRECT async IO is 
> definitely going to be a clear winner. 

The part I am writing looks like a transaction log writer:
  Lots of sequential small-ish writes (call each quantum a transaction)
  Must be written to stable storage
  Must know when the writes are completed
  The data is only read back for recovery processing

  In the past, the approach I found worked best was to have a dedicated thread pulling
transactions off a queue and doing blocking synchronous writes, either by write(v)/fsync or by
write(v) on a file opened with O_SYNC | O_DIRECT.  Once the fsync returned, the thread would
signal completion and grab the next batch to start writing.
  This works very well and can easily max out any real device's bandwidth, but it incurs more
latency than strictly necessary due to the extra context switching from the completion
signalling.
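
In code, the pattern looks roughly like this - a sketch, assuming pthreads; the queue, the
producer side, and error handling are simplified placeholders:

/* Sketch of the dedicated-writer pattern described above, assuming
 * pthreads.  The queue is a bare singly-linked list, and the producer
 * side (enqueueing transactions and signalling qcv) is elided. */
#include <pthread.h>
#include <unistd.h>

struct txn {
    const void     *buf;
    size_t          len;
    int             done;        /* completion flag, guarded by lock */
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    struct txn     *next;
};

static struct txn      *queue_head;
static pthread_mutex_t  qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   qcv   = PTHREAD_COND_INITIALIZER;

static void *log_writer(void *arg)
{
    int fd = *(int *)arg;        /* opened O_WRONLY, or O_SYNC|O_DIRECT */

    for (;;) {
        pthread_mutex_lock(&qlock);
        while (queue_head == NULL)
            pthread_cond_wait(&qcv, &qlock);
        struct txn *t = queue_head;              /* grab the next batch */
        queue_head = t->next;
        pthread_mutex_unlock(&qlock);

        if (write(fd, t->buf, t->len) == (ssize_t)t->len)
            fsync(fd);                           /* force to stable storage */

        pthread_mutex_lock(&t->lock);
        t->done = 1;                             /* signal completion; this */
        pthread_cond_signal(&t->cv);             /* wakeup is the extra     */
        pthread_mutex_unlock(&t->lock);          /* context switch above    */
    }
    return NULL;
}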

  I am hoping AIO can be used to reduce the latency, but was a bit discouraged after reading the
IBM paper.

  I am looking forward to your post regarding dd.

thanks,
-Kenny



* Re: Is user-space AIO dead?
  2006-01-11 22:02           ` Kenny Simpson
@ 2006-01-12  3:50             ` Phillip Susi
  2006-01-12  4:14               ` Phillip Susi
  0 siblings, 1 reply; 12+ messages in thread
From: Phillip Susi @ 2006-01-12  3:50 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: David Lloyd, linux kernel

Attached are the results, in ODS format, of some simple testing I did. 
These tests involved having dd read the first GB of data from my 
two-drive SATA (fake)RAID0 array with varying numbers of concurrent aio 
operations ( except for the original, non-aio dd, of course ).

I performed these tests with cpufreq disabled and filesystems mounted 
with noatime to ensure no disturbances.  I also set the IO scheduler to 
noop; otherwise the default scheduler reordered the IO requests, which 
was bad for sequential throughput.  I used commands like these:

sync
dd bs=512 count=1 iflag=direct if=/dev/sda of=/dev/null
dd bs=512 count=1 iflag=direct if=/dev/sdb of=/dev/null
time dd bs=128KiB count=32768 iflag=direct \
    if=/dev/mapper/via_hfciifae of=/dev/null

The first two commands were to make sure each drive's head was on track 
zero; otherwise the TCQ on the drives kicked in and reordered some of 
the earlier reads as the heads seeked back to track zero.

The results show a rather large increase in throughput for block sizes 
under 128 KB, with a smaller improvement at larger block sizes. 
Likewise, the cpu time used was significantly lower, especially with 
block sizes under 128 KB.  In most cases, the original dd uses 2-3 
times more cpu time than the aio dd.

The original dd reached near-peak throughput ( 93.4 MB/s ) at a block 
size of 128 KB.  I believe this is due in part to that being the stripe 
width of the array, so smaller block sizes did not keep both drives 
busy full time.  In contrast, all of the aio trials reached peak 
throughput of 97.x MB/s with a block size of only 32 KB, and at the 
smallest block size of 16 KB, the aio(16) trial managed more than 20% 
higher throughput than the non-aio dd ( 72.1 vs 59.7 MB/s ), and did so 
using 1/7th the cpu time.

To show the difference O_DIRECT makes: at a 128 KB block size, the original 
dd with O_DIRECT managed 93.4 MB/s using 0.906 seconds of cpu time. 
Without O_DIRECT, the original dd only sustains 82.6 MB/s and uses a 
whopping 2.912 seconds of cpu time - more than triple the time with 
O_DIRECT, and 13x more cpu time than the aio(4) test at that block size!

Kenny Simpson wrote:
> --- Phillip Susi <psusi@cfl.rr.com> wrote:
> 
>>I actually hacked up dd to use async IO ( via io_submit ) in conjunction 
>>with O_DIRECT, and it noticeably improved ( ~10% ) both throughput 
>>and cpu utilization.  I have an OO.o spreadsheet showing the results of 
>>some simple benchmarking with various parameters that I did at home, which 
>>I will post later this evening.
>>
>>Of course, dd is a simplistic case of sequential IO.  If you have 
>>something like a big database that needs to concurrently handle dozens 
>>or hundreds of random IO requests at once, O_DIRECT async IO is 
>>definitely going to be a clear winner. 
> 
> 
> The part I am writing looks like a transaction log writer:
>   Lots of sequential small-ish writes (call each quantum a transaction)
>   Must be written to stable storage
>   Must know when the writes are completed
>   The data is only read back for recovery processing
> 
>   In the past, the approach I found worked best was to have a dedicated thread pulling
> transactions off a queue and doing blocking synchronous writes, either by write(v)/fsync or by
> write(v) on a file opened with O_SYNC | O_DIRECT.  Once the fsync returned, the thread would
> signal completion and grab the next batch to start writing.
>   This works very well and can easily max out any real device's bandwidth, but it incurs more
> latency than strictly necessary due to the extra context switching from the completion
> signalling.
> 
>   I am hoping AIO can be used to reduce the latency, but was a bit discouraged after reading the
> IBM paper.
> 
>   I am looking forward to your post regarding dd.
> 
> thanks,
> -Kenny



* Re: Is user-space AIO dead?
  2006-01-12  3:50             ` Phillip Susi
@ 2006-01-12  4:14               ` Phillip Susi
  0 siblings, 0 replies; 12+ messages in thread
From: Phillip Susi @ 2006-01-12  4:14 UTC (permalink / raw)
  To: linux kernel; +Cc: Kenny Simpson, David Lloyd

[-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

Heh, would help if I actually attached the file ;)


Phillip Susi wrote:
> Attached are the results, in ODS format, of some simple testing I did. 
> These tests involved having dd read the first GB of data from my 
> two-drive SATA (fake)RAID0 array with varying numbers of concurrent aio 
> operations ( except for the original, non-aio dd, of course ).
> 
> I performed these tests with cpufreq disabled and filesystems mounted 
> with noatime to ensure no disturbances.  I also set the IO scheduler to 
> noop; otherwise the default scheduler reordered the IO requests, which 
> was bad for sequential throughput.  I used commands like these:
> 
> sync
> dd bs=512 count=1 iflag=direct if=/dev/sda of=/dev/null
> dd bs=512 count=1 iflag=direct if=/dev/sdb of=/dev/null
> time dd bs=128KiB count=32768 iflag=direct \
>     if=/dev/mapper/via_hfciifae of=/dev/null
> 
> The first two commands were to make sure each drive's head was on track 
> zero; otherwise the TCQ on the drives kicked in and reordered some of 
> the earlier reads as the heads seeked back to track zero.
> 
> The results show a rather large increase in throughput for block sizes 
> under 128 KB, with a smaller improvement at larger block sizes. 
> Likewise, the cpu time used was significantly lower, especially with 
> block sizes under 128 KB.  In most cases, the original dd uses 2-3 
> times more cpu time than the aio dd.
> 
> The original dd reached near-peak throughput ( 93.4 MB/s ) at a block 
> size of 128 KB.  I believe this is due in part to that being the stripe 
> width of the array, so smaller block sizes did not keep both drives 
> busy full time.  In contrast, all of the aio trials reached peak 
> throughput of 97.x MB/s with a block size of only 32 KB, and at the 
> smallest block size of 16 KB, the aio(16) trial managed more than 20% 
> higher throughput than the non-aio dd ( 72.1 vs 59.7 MB/s ), and did so 
> using 1/7th the cpu time.
> 
> To show the difference O_DIRECT makes: at a 128 KB block size, the original 
> dd with O_DIRECT managed 93.4 MB/s using 0.906 seconds of cpu time. 
> Without O_DIRECT, the original dd only sustains 82.6 MB/s and uses a 
> whopping 2.912 seconds of cpu time - more than triple the time with 
> O_DIRECT, and 13x more cpu time than the aio(4) test at that block size!
> 


[-- Attachment #2: dd aio results.ods --]
[-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 21302 bytes --]


Thread overview: 12+ messages
2006-01-11 18:12 Is user-space AIO dead? Kenny Simpson
2006-01-11 18:20 ` Marcin Dalecki
2006-01-11 18:23 ` David Lloyd
2006-01-11 18:45   ` Kenny Simpson
2006-01-11 19:10     ` David Lloyd
2006-01-11 19:20       ` Kenny Simpson
2006-01-11 20:31         ` Phillip Susi
2006-01-11 22:02           ` Kenny Simpson
2006-01-12  3:50             ` Phillip Susi
2006-01-12  4:14               ` Phillip Susi
2006-01-11 18:41 ` Benjamin LaHaise
2006-01-11 18:54   ` Kenny Simpson
