* On I/O engines
@ 2011-08-03 20:13 Martin Steigerwald
  2011-08-04  6:45 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Steigerwald @ 2011-08-03 20:13 UTC (permalink / raw)
  To: fio

Hi!

In order to understand I/O engines better, I'd like to summarize what I 
think I know at the moment. Maybe this can be a starting point for some 
additional documentation:

=== sync, psync, vsync ===

- all of these use synchronous Linux (POSIX) system calls
- they are used by regular applications
- synchronous just refers to the system call interface, i.e. when the 
system call returns to the application
- as far as I understand, the call returns when the I/O request is 
reported as completed
- it does not imply synchronous I/O aka O_SYNC, which is way slower and 
enabled by sync=1
- thus it does not guarantee that the I/O has been physically written to 
the underlying device (see open(2))
- thus it only guarantees that the I/O request has been dealt with? what 
does this mean exactly?
- does it mean that this is I/O in the context of the process?
- it can be used with direct=1 to circumvent the pagecache


The difference is the kind of system call used:
- sync uses read/write, which read/write count bytes from/to a buffer at 
the current file offset; the offset is changed via lseek (fseek is the 
stdio counterpart)
- psync uses pread/pwrite, which read/write count bytes at a given offset
- vsync uses readv/writev, which read/write into/from multiple buffers of 
given lengths (an array of struct iovec) in one call

I am not sure what performance difference to expect. I would bet that 
sync and psync perform roughly the same.
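
For illustration, a rough, untested C sketch of the three call styles 
("testfile", the offsets and the buffer sizes are just placeholders):

#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	char a[4096], b[4096];
	int fd = open("testfile", O_RDONLY);

	if (fd < 0)
		return 1;

	/* sync engine: plain read() at the current file offset,
	 * repositioned with lseek() for random I/O */
	lseek(fd, 8192, SEEK_SET);
	read(fd, a, sizeof(a));

	/* psync engine: pread() takes the offset directly, no lseek() */
	pread(fd, a, sizeof(a), 8192);

	/* vsync engine: readv() fills several buffers in one call */
	struct iovec iov[2] = {
		{ .iov_base = a, .iov_len = sizeof(a) },
		{ .iov_base = b, .iov_len = sizeof(b) },
	};
	readv(fd, iov, 2);

	close(fd);
	return 0;
}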


=== libaio ===

- this uses Linux asynchronous I/O calls[1]
- it uses libaio for that
- who else uses libaio? Mostly applications that sit close to the 
system:

martin@merkaba:~> apt-cache rdepends libaio1
libaio1
Reverse Depends:
  fio
  qemu-kvm
  libdbd-oracle-perl
  zfs-fuse
  stressapptest
  qemu-kvm
  qemu-utils
  qemu-system
  multipath-tools
  ltp-kernel-test
  libaio1-dbg
  libaio-dev
  fio
  drizzle
  blktrace

- these calls allow applications to offload I/O calls to the background
- according to [1] this is only supported for direct I/O
- using anything else makes it fall back to synchronous behavior
- thus one sees it in combination with direct=1 in fio jobs (see the 
sketch after this list)
- does this mean that this is I/O outside the context of the process?
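
A minimal, untested sketch of that pattern ("testfile" and the sizes are 
just placeholders, error handling is mostly omitted, link with -laio):

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;

	/* O_DIRECT wants suitably aligned buffers (and I/O sizes) */
	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	int fd = open("testfile", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	io_setup(1, &ctx);                  /* create the AIO context */

	io_prep_pread(&cb, fd, buf, 4096, 0);
	io_submit(ctx, 1, cbs);             /* submit, don't wait */

	/* fio could submit more I/O here before reaping completions */
	io_getevents(ctx, 1, 1, &ev, NULL);

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}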


Question:
- what is the difference between the following two, other than that the 
second one seems to be more popular in example job files?
1) ioengine=sync + direct=1
2) ioengine=libaio + direct=1

Current answer: it is that fio can issue further I/Os while the Linux 
kernel handles the ones already submitted.



=== other I/O engines relevant to Linux ===
There seem to be some other I/O engines relevant to Linux and mass storage 
I/O:

== mmap ==
- maps the file into memory and uses memcpy to do the I/O (see the 
sketch below)
- used by quite a few applications
- what else to note?
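
A minimal, untested sketch of the pattern (assuming "testfile" is at 
least 64 KB; the name and sizes are placeholders):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("testfile", O_RDONLY);

	if (fd < 0)
		return 1;

	/* map the first 64 KB of the file; "reads" then become memcpy()s
	 * from the mapping, page faults pull the data in as needed */
	void *p = mmap(NULL, 65536, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	memcpy(buf, (char *)p + 8192, sizeof(buf));

	munmap(p, 65536);
	close(fd);
	return 0;
}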

== syslet-rw ==
- makes regular read/write asynchronous
- where is this used?
- what else to note?

Any others?


Is what I wrote correct so far?


I think I'd like to write something up about the different I/O concepts in 
Linux, if such a document doesn't exist yet.


[1] http://lse.sourceforge.net/io/aio.html

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: On I/O engines
  2011-08-03 20:13 On I/O engines Martin Steigerwald
@ 2011-08-04  6:45 ` Jens Axboe
  2011-08-04  6:58   ` DongJin Lee
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2011-08-04  6:45 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: fio

On 2011-08-03 22:13, Martin Steigerwald wrote:
> Hi!
> 
> In order to understand I/O engines better, I'd like to summarize what I 
> think I know at the moment. Maybe this can be a starting point for some 
> additional documentation:
> 
> === sync, psync, vsync ===
> 
> - all of these use synchronous Linux (POSIX) system calls
> - they are used by regular applications
> - synchronous just refers to the system call interface, i.e. when the 
> system call returns to the application
> - as far as I understand, the call returns when the I/O request is 
> reported as completed
> - it does not imply synchronous I/O aka O_SYNC, which is way slower and 
> enabled by sync=1
> - thus it does not guarantee that the I/O has been physically written to 
> the underlying device (see open(2))

All of above are correct.

> - thus it only guarantees that the I/O request has been dealt with? what 
> does this mean exactly?

For reads, the IO has been done by the device. For writes, it could just
be sitting in the page cache for later writeback.

> - does it mean that this is I/O in the context of the process?

Not sure what you mean here. For reads, the IO always happens in the
context of the process. For buffered writes, it usually does not. The
process merely dirties the page, kernel threads will most often do the
actual writeback of the data.

> - it can be used with direct=1 to circumvent the pagecache

Right, and additionally direct=1 will make the writes sync as well. So
instead of just returning when it's in page cache, when a sync write
with direct=1 returns, the data has been received and acknowledged by
the backing device. That does not mean it's stable, it could just be
sitting in the drive write back cache.

> The difference is the kind of system call used:
> - sync uses read/write, which read/write count bytes from/to a buffer at 
> the current file offset; the offset is changed via lseek (fseek is the 
> stdio counterpart)

Fio uses file descriptors, not handles. So lseek() will be used to
position the file before each IO, unless the offset of the new IO is
identical to the current offset.

> - psync uses pread/pwrite, which read/write count bytes at a given offset
> - vsync uses readv/writev, which read/write into/from multiple buffers of 
> given lengths (an array of struct iovec) in one call
> 
> I am not sure what performance difference to expect. I would bet that 
> sync and psync perform roughly the same.

For random IO, you save a lseek() syscall for each IO. Depending on your
IO rates, this may or may not be significant. It usually isn't. But if
you are doing hundreds of thousands of IOPS, then it could make a
difference.

> === libaio ===
> 
> - this uses Linux asynchronous I/O calls[1]
> - it uses libaio for that
> - who else uses libaio? Mostly applications that sit close to the 
> system:
> 
> martin@merkaba:~> apt-cache rdepends libaio1
> libaio1
> Reverse Depends:
>   fio
>   qemu-kvm
>   libdbd-oracle-perl
>   zfs-fuse
>   stressapptest
>   qemu-kvm
>   qemu-utils
>   qemu-system
>   multipath-tools
>   ltp-kernel-test
>   libaio1-dbg
>   libaio-dev
>   fio
>   drizzle
>   blktrace
> 
> - these calls allow applications to offload I/O calls to the background
> - according to [1] this is only supported for direct I/O
> - using anything else makes it fall back to synchronous behavior
> - thus one sees it in combination with direct=1 in fio jobs
> - does this mean that this is I/O outside the context of the process?

aio assumes the identity of the process. aio is mostly used by
databases.

> Question:
> - what is the difference between the following two, other than that the 
> second one seems to be more popular in example job files?
> 1) ioengine=sync + direct=1
> 2) ioengine=libaio + direct=1
> 
> Current answer: it is that fio can issue further I/Os while the Linux 
> kernel handles the ones already submitted.

Yes

> === other I/O engines relevant to Linux ===
> There seem to be some other I/O engines relevant to Linux and mass storage 
> I/O:
> 
> == mmap ==
> - maps the file into memory and uses memcpy to do the I/O
> - used by quite a few applications
> - what else to note?

mmap'ed IO is quite widely used.

> == syslet-rw ==
> - makes regular read/write asynchronous
> - where is this used?
> - what else to note?

syslet-rw is an engine that was written to benchmark/test the syslet
async system call interface. It was never merged, so it has mostly
historic relevance now.

> Any others?

You should mention posixaio and net as well, might be interesting. And
splice is unique to Linux, would be good to cover.
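
Since splice() needs a pipe as one end of each call, a rough, untested 
sketch of the idea could look like this (file names are placeholders):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int pfd[2];
	int in = open("testfile", O_RDONLY);
	int out = open("/dev/null", O_WRONLY);

	if (in < 0 || out < 0 || pipe(pfd) < 0)
		return 1;

	/* move data file -> pipe -> destination without copying it
	 * through a user space buffer */
	splice(in, NULL, pfd[1], NULL, 65536, SPLICE_F_MOVE);
	splice(pfd[0], NULL, out, NULL, 65536, SPLICE_F_MOVE);

	close(in);
	close(out);
	return 0;
}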

> Is what I wrote correct so far?

Yep, good so far!

> I think I'd like to write something up about the different I/O concepts in 
> Linux, if such a document doesn't exist yet.

Might not be a bad idea :-)

-- 
Jens Axboe



* Re: On I/O engines
  2011-08-04  6:45 ` Jens Axboe
@ 2011-08-04  6:58   ` DongJin Lee
  2011-08-04  7:03     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: DongJin Lee @ 2011-08-04  6:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin Steigerwald, fio

just adding 2c question:

> On Thu, Aug 4, 2011 at 18:45, Jens Axboe <jaxboe@fusionio.com> wrote:
> That does not mean it's stable, it could just be
> sitting in the drive write back cache.

right, take for example a simple 2 TB HDD with some 64 MB of cache: so 
indeed, there is no real way to confirm that the data has been physically 
written to the mechanical platter;
but as I understand it, everything is physically written to the platter 
when shutting down the system; so I wonder what command the OS issues to 
the disk then? Maybe just unmounting does so?

Regards




* Re: On I/O engines
  2011-08-04  6:58   ` DongJin Lee
@ 2011-08-04  7:03     ` Jens Axboe
  2011-08-04  7:12       ` DongJin Lee
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2011-08-04  7:03 UTC (permalink / raw)
  To: DongJin Lee; +Cc: Martin Steigerwald, fio

On 2011-08-04 08:58, DongJin Lee wrote:
> just adding 2c question:
> 
>> On Thu, Aug 4, 2011 at 18:45, Jens Axboe <jaxboe@fusionio.com> wrote:
>> That does not mean it's stable, it could just be
>> sitting in the drive write back cache.
> 
> right, take for example a simple 2 TB HDD with some 64 MB of cache: so 
> indeed, there is no real way to confirm that the data has been physically 
> written to the mechanical platter;

Usually any type of device that has a write back caching scheme also
supports a command to sync/flush that cache. This is what the kernel
does when you use one of the fsync() variants.
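
A minimal, untested sketch of that from user space ("testfile" is a 
placeholder):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[4096] = { 0 };
	int fd = open("testfile", O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return 1;

	write(fd, buf, sizeof(buf));   /* may only reach the page cache */

	/* fsync() forces the dirty data out to the device and, on a drive
	 * with a volatile write back cache, typically also triggers a
	 * cache flush; fdatasync() is the variant that skips unneeded
	 * metadata updates */
	fsync(fd);

	close(fd);
	return 0;
}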

> but as I understand it, everything is physically written to the platter 
> when shutting down the system; so I wonder what command the OS issues to 
> the disk then? Maybe just unmounting does so?

The devices are stopped at some point, that should ensure that the
caches are stable.


-- 
Jens Axboe



* Re: On I/O engines
  2011-08-04  7:03     ` Jens Axboe
@ 2011-08-04  7:12       ` DongJin Lee
  0 siblings, 0 replies; 5+ messages in thread
From: DongJin Lee @ 2011-08-04  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin Steigerwald, fio

> The devices are stopped at some point, that should ensure that the
> caches are stable.

so assuming that I have no data waiting in caches (still to be written to
the disk) anywhere in the system, I should be able to just unplug the
SATA/power cable? :-} (i.e., equivalent to a nice shutdown?)
I always thought that there was some special OS command that shuts it down
nicely, i.e., the disk confirms internally that its 32/64 MB cache is
empty before it goes off...
I know that for RAID write-back with a BBU there are no problems whatsoever.

Regards



