* Linux I/O stack design question
@ 2011-09-01 12:33 Werner Fischer
  2011-09-01 14:19 ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Werner Fischer @ 2011-09-01 12:33 UTC (permalink / raw)
  To: fio

Dear fio users and developers,

I have a question regarding the Linux I/O stack (not directly regarding
fio, but I think fio users and developers have a lot of knowledge
here):

I'm trying to better understand the different layers of the Linux I/O
stack and how they fit together. So I have started a diagram showing
the different layers of the I/O stack:
http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png

Is this diagram correct or are there any errors in there?

Any feedback is welcome.

Thanks in advance,
Werner

-- 
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Linux I/O stack design question
  2011-09-01 12:33 Linux I/O stack design question Werner Fischer
@ 2011-09-01 14:19 ` Jeff Moyer
  2011-09-01 20:14   ` Werner Fischer
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2011-09-01 14:19 UTC (permalink / raw)
  To: Werner Fischer; +Cc: fio

Werner Fischer <devlists@wefi.net> writes:

> Dear fio users and developers,
>
> I have a question regarding the Linux I/O stack (not directly regarding
> fio, but I think fio users and developers have a lot of knowledge
> here):
>
> I'm trying to better understand the different layers of the Linux I/O
> stack and how they play together. So I have started a diagram showing
> the different layers of the I/O stack:
> http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
>
> Is this diagram correct or are there any errors in there?

That's a nice diagram.  :)  A few things of note:
1) O_DIRECT I/O can bypass the page cache
2) request-based dm targets sit below the I/O scheduler (currently, that
   just means dm-multipath)
3) the fusion IO device driver can hook itself in where you put it, or
   also up above the I/O scheduler (based on a module load option).
4) not sure if you want to cover I/O directly to the device (no file
   system involved)

Cheers,
Jeff


* Re: Linux I/O stack design question
  2011-09-01 14:19 ` Jeff Moyer
@ 2011-09-01 20:14   ` Werner Fischer
  2011-09-01 20:17     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Werner Fischer @ 2011-09-01 20:14 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: fio

Hi Jeff,

On Thu, 2011-09-01 at 10:19 -0400, Jeff Moyer wrote:
> Werner Fischer <devlists@wefi.net> writes:
> > http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> > Is this diagram correct or are there any errors in there?
> 
> That's a nice diagram.  :)
Thanks! ;-)

> A few things of note:
> 1) O_DIRECT I/O can bypass the page cache
you are right, I'll add it.

> 2) request-based dm targets sit below the I/O scheduler (currently, that
>    just means dm-multipath)
Thanks, I'll correct this.

> 3) the fusion IO device driver can hook itself in where you put it, or
>    also up above the I/O scheduler (based on a module load option).
Oh wow, that sounds interesting. Does this mean that in this case (IO
device driver above the I/O scheduler) simply no I/O scheduler is used?

> 4) not sure if you want to cover I/O directly to the device (no file
>    system involved)
Yes, that sounds reasonable. I'll add it.

So thanks a lot for your valuable feedback.
I will update the diagram according to your hints and let you know then.

Once the diagram is in a state that seems reasonable, I'll add it to
Wikimedia Commons under an open license and include it in articles
where it makes sense (e.g.
http://en.wikipedia.org/wiki/Deadline_scheduler and so on)

Regards,
Werner




* Re: Linux I/O stack design question
  2011-09-01 20:14   ` Werner Fischer
@ 2011-09-01 20:17     ` Jens Axboe
  2011-09-01 20:27       ` Werner Fischer
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2011-09-01 20:17 UTC (permalink / raw)
  To: Werner Fischer; +Cc: Jeff Moyer, fio

On 2011-09-01 14:14, Werner Fischer wrote:
>> 3) the fusion IO device driver can hook itself in where you put it, or
>>    also up above the I/O scheduler (based on a module load option).
> Oh wow, that sounds interesting. Does this mean that in this case (IO
> device driver above the I/O scheduler) simply no I/O scheduler is used?

It'll hook in similarly to where stacked devices like md/dm do. So yes,
it's bypassing the IO scheduler. One note on that - this mode is going
away in the future. You end up losing out on request merging, so write
performance is hampered, for one.

The Micron pci-e mtip32xx driver does similarly, as does the
nvmhci-express driver from Intel. IMHO it's largely due to
inefficiencies in the IO stack; once we get those fixed, we should be
getting back to the one true single IO mode for a driver. I consider the
bypass setup a bit of a hack and work-around.

-- 
Jens Axboe



* Re: Linux I/O stack design question
  2011-09-01 20:17     ` Jens Axboe
@ 2011-09-01 20:27       ` Werner Fischer
  2011-09-01 21:13         ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Werner Fischer @ 2011-09-01 20:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jeff Moyer, fio

On Thu, 2011-09-01 at 14:17 -0600, Jens Axboe wrote:
> On 2011-09-01 14:14, Werner Fischer wrote:
> >> 3) the fusion IO device driver can hook itself in where you put it, or
> >>    also up above the I/O scheduler (based on a module load option).
> > Oh wow, that sounds interesting. Does this mean that in this case (IO
> > device driver above the I/O scheduler) simply no I/O scheduler is used?
> 
> It'll hook in similarly to where stacked devices like md/dm do. So yes,
> it's bypassing the IO scheduler.
OK, I see.

> One note on that - this mode is going
> away in the future. 
Do you mean that this mode is going away in the future for the Fusion-io
driver or that generally no driver will be able to hook in there (also
the mtip32xx and nvmhci-express drivers you mention below)?

> You end up losing out on request merging, so write
> performance is hampered, for one.
> 
> The Micron pci-e mtip32xx driver does similarly, as does the
> nvmhci-express driver from Intel. IMHO it's largely due to
> inefficiencies in the IO stack, once we get those fixed, we should be
> getting back to the one true single IO mode for a driver.
With "one true single IO mode for a driver", do you mean that every device
driver should sit and stay below the I/O scheduler?
Or do you mean that there should only be one single I/O scheduler? (in
the patch removing the anticipatory scheduler you suggested something
like this:
http://git.kernel.org/linus/492af6350a5ccf087e4964104a276ed358811458 )

> I consider the
> bypass setup a bit of a hack and work-around.

Regards,
Werner




* Re: Linux I/O stack design question
  2011-09-01 20:27       ` Werner Fischer
@ 2011-09-01 21:13         ` Jens Axboe
  2011-09-05 14:01           ` Werner Fischer
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2011-09-01 21:13 UTC (permalink / raw)
  To: Werner Fischer; +Cc: Jeff Moyer, fio

On 2011-09-01 14:27, Werner Fischer wrote:
>> One note on that - this mode is going
>> away in the future. 
> Do you mean that this mode is going away in the future for the Fusion-io
> driver or that generally no driver will be able to hook in there (also
> the mtip32xx and nvmhci-express drivers you mention below)?

For this particular case, I meant the fusion-io driver. Even for the
current 2.x series of the driver, bypassing the IO scheduler is not the
default. You have to manually specify that with a module option.

>> You end up losing out on request merging, so write
>> performance is hampered, for one.
>>
>> The Micron pci-e mtip32xx driver does similarly, as does the
>> nvmhci-express driver from Intel. IMHO it's largely due to
>> inefficiencies in the IO stack, once we get those fixed, we should be
>> getting back to the one true single IO mode for a driver.
> With "one true single IO mode for a driver" you mean that every device
> driver should sit and stay below the I/O scheduler?
> Or do you mean that there should only be one single I/O scheduler? (in
> the patch removing the anticipatory scheduler you suggested something
> like this:
> http://git.kernel.org/linus/492af6350a5ccf087e4964104a276ed358811458 )

I mean that every device should plug in at the same place. There are
definite up and down sides to plugging in at the stacking level and
bypassing the IO scheduler. So you have to weigh the pros and cons
before doing that. We need to fix this. Drivers doing that lose out on
other features in the name of a bit more performance, that's just not
acceptable.

-- 
Jens Axboe



* Re: Linux I/O stack design question
  2011-09-01 21:13         ` Jens Axboe
@ 2011-09-05 14:01           ` Werner Fischer
  2011-09-08 12:39             ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Werner Fischer @ 2011-09-05 14:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jeff Moyer, fio

On Thu, 2011-09-01 at 15:13 -0600, Jens Axboe wrote:
> [...]
> I mean that every device should plug in at the same place. There are
> definite up and down sides to plugging in at the stacking level and
> bypassing the IO scheduler. So you have to weigh the pros and cons
> before doing that. We need to fix this. Drivers doing that lose out on
> other features in the name of a bit more performance, that's just not
> acceptable.

I have updated the diagram according to all of your hints:
http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png

I also had some off-list discussion with Florian Haas, who convinced me
that the file systems sit below the page cache. I hope this is now
correct.
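Since this is the fio list, the page cache's place in the stack is also easy to probe with a job file: the same random-read workload run once buffered and once with direct=1 (O_DIRECT). The file name and sizes below are placeholders.

```ini
; hypothetical fio job: buffered vs. O_DIRECT reads of the same file
[global]
filename=/tmp/fio-pagecache-test
size=256m
rw=randread
bs=4k
runtime=30
time_based

[buffered]
direct=0

[odirect]
; run only after the buffered job finishes
stonewall
direct=1
```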

I took some information regarding the VFS layer from
www.mimuw.edu.pl/~vincent/lecture12/12-fs.pdf 

I'm looking forward to your feedback and corrections.

Best regards,
Werner

-- 
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki




* Re: Linux I/O stack design question
  2011-09-05 14:01           ` Werner Fischer
@ 2011-09-08 12:39             ` Jens Axboe
  2011-09-12  6:36               ` Werner Fischer
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2011-09-08 12:39 UTC (permalink / raw)
  To: Werner Fischer; +Cc: Jeff Moyer, fio

On 2011-09-05 16:01, Werner Fischer wrote:
> On Thu, 2011-09-01 at 15:13 -0600, Jens Axboe wrote:
>> [...]
>> I mean that every device should plug in at the same place. There are
>> definite up and down sides to plugging in at the stacking level and
>> bypassing the IO scheduler. So you have to weigh the pros and cons
>> before doing that. We need to fix this. Drivers doing that lose out on
>> other features in the name of a bit more performance, that's just not
>> acceptable.
> 
> I have updated the diagram according to all of your hints:
> http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> 
> I had also some off-list discussion with Florian Haas, who convinced me
> that the file systems are below of the page cache. I hope this is now
> correct.

Not sure I'd agree with that; I'd place the page cache between the fs
and the storage layer.

-- 
Jens Axboe



* Re: Linux I/O stack design question
  2011-09-08 12:39             ` Jens Axboe
@ 2011-09-12  6:36               ` Werner Fischer
  2011-09-27 14:06                 ` Martin Steigerwald
  0 siblings, 1 reply; 11+ messages in thread
From: Werner Fischer @ 2011-09-12  6:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jeff Moyer, fio, Christoph Hellwig, Florian Haas

On Thu, 2011-09-08 at 14:39 +0200, Jens Axboe wrote:
> On 2011-09-05 16:01, Werner Fischer wrote:
> > On Thu, 2011-09-01 at 15:13 -0600, Jens Axboe wrote:
> >> [...]
> >> I mean that every device should plug in at the same place. There are
> >> definite up and down sides to plugging in at the stacking level and
> >> bypassing the IO scheduler. So you have to weigh the pros and cons
> >> before doing that. We need to fix this. Drivers doing that lose out on
> >> other features in the name of a bit more performance, that's just not
> >> acceptable.
> > 
> > I have updated the diagram according to all of your hints:
> > http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> > 
> > I had also some off-list discussion with Florian Haas, who convinced me
> > that the file systems are below of the page cache. I hope this is now
> > correct.
> 
> Not sure I'd agree with that, I'd place the page cache between the fs
> and the storage layer.
After some further off-list feedback from Christoph Hellwig I made some
updates to the block diagram, including moving the page cache from above
the fs layer to next to the fs layer (Christoph told me that the page
cache is a helper function for the file systems).
I also added SCSI mid layer, SCSI low layer, libata and so on:
http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png

I'm looking forward to further feedback.

Thanks,
Werner





* Re: Linux I/O stack design question
  2011-09-12  6:36               ` Werner Fischer
@ 2011-09-27 14:06                 ` Martin Steigerwald
  2012-03-06 10:46                   ` Announce: Linux I/O stack diagram (was: Re: Linux I/O stack design question) Werner Fischer
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2011-09-27 14:06 UTC (permalink / raw)
  To: Werner Fischer
  Cc: Jens Axboe, Jeff Moyer, fio, Christoph Hellwig, Florian Haas

Hi Werner!

On Monday, 12 September 2011, Werner Fischer wrote:
> On Thu, 2011-09-08 at 14:39 +0200, Jens Axboe wrote:
> > On 2011-09-05 16:01, Werner Fischer wrote:
> > > On Thu, 2011-09-01 at 15:13 -0600, Jens Axboe wrote:
> > >> [...]
> > >> I mean that every device should plug in at the same place. There are
> > >> definite up and down sides to plugging in at the stacking level and
> > >> bypassing the IO scheduler. So you have to weigh the pros and cons
> > >> before doing that. We need to fix this. Drivers doing that lose out on
> > >> other features in the name of a bit more performance, that's just not
> > >> acceptable.
> > > 
> > > I have updated the diagram according to all of your hints:
> > > http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> > > 
> > > I had also some off-list discussion with Florian Haas, who convinced me
> > > that the file systems are below of the page cache. I hope this is now
> > > correct.
> > 
> > Not sure I'd agree with that, I'd place the page cache between the fs
> > and the storage layer.
> 
> After some further off-list feedback from Christoph Hellwig I did some
> updates on the block diagram, including moving the page cache from above
> the fs layer to next to the fs layer (Christoph told me that the page
> cache is a helper function for the file systems).
> I also added SCSI mid layer, SCSI low layer, libata and so on:
> http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> 
> I'm looking forward to further feedback.

Not that I would like to use it, but where would dmraid sit? I think it would 
be a bit nearer to the SCSI low layer...

I am not quite used to the placement of the page cache, but if it's merely a
helper function for filesystems... For performance measurements it makes
a lot of difference. I think the current diagram hides the impact of using
the page cache or not using it a bit. So maybe at least coloring it differently
would give it a bit more visual weight.

Aside from that I find this very detailed and I learned quite a bit from just 
looking at it.

Can you make that image available as SVG too?

Thanks,
-- 
Martin Steigerwald - teamix GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90



* Announce: Linux I/O stack diagram (was: Re: Linux I/O stack design question)
  2011-09-27 14:06                 ` Martin Steigerwald
@ 2012-03-06 10:46                   ` Werner Fischer
  0 siblings, 0 replies; 11+ messages in thread
From: Werner Fischer @ 2012-03-06 10:46 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Jens Axboe, Jeff Moyer, fio, Christoph Hellwig, Florian Haas

Hi all,

some months ago I asked several questions regarding the design of the
Linux I/O stack. Thank you all for the feedback you gave.
* We have now drawn the Linux I/O Stack diagram with Inkscape, to have
  it in SVG format and nicer PDFs/PNGs.
* We have licensed it under CC BY-SA 3.0, so everybody can use and
  alter it when necessary.
* I have also included some feedback I got from James Bottomley and
  Hannes Reinecke at LinuxCon Europe in Prague.
* Here is the link to the first published version (version 0.1):
  http://www.thomas-krenn.com/en/oss/linux-io-stack-diagram.html

Can you take a look at it and send us feedback in case there are some
errors left in the diagram?

Best regards,
Werner

On Tue, 2011-09-27 at 16:06 +0200, Martin Steigerwald wrote:
> Hi Werner!
> 
> On Monday, 12 September 2011, Werner Fischer wrote:
> > On Thu, 2011-09-08 at 14:39 +0200, Jens Axboe wrote:
> > > On 2011-09-05 16:01, Werner Fischer wrote:
> > > > On Thu, 2011-09-01 at 15:13 -0600, Jens Axboe wrote:
> > > >> [...]
> > > >> I mean that every device should plug in at the same place. There are
> > > >> definite up and down sides to plugging in at the stacking level and
> > > >> bypassing the IO scheduler. So you have to weigh the pros and cons
> > > >> before doing that. We need to fix this. Drivers doing that lose out on
> > > >> other features in the name of a bit more performance, that's just not
> > > >> acceptable.
> > > > 
> > > > I have updated the diagram according to all of your hints:
> > > > http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> > > > 
> > > > I had also some off-list discussion with Florian Haas, who convinced me
> > > > that the file systems are below of the page cache. I hope this is now
> > > > correct.
> > > 
> > > Not sure I'd agree with that, I'd place the page cache between the fs
> > > and the storage layer.
> > 
> > After some further off-list feedback from Christoph Hellwig I did some
> > updates on the block diagram, including moving the page cache from above
> > the fs layer to next to the fs layer (Christoph told me that the page
> > cache is a helper function for the file systems).
> > I also added SCSI mid layer, SCSI low layer, libata and so on:
> > http://www.thomas-krenn.com/de/wikiDE/images/0/07/Linux-IO-Stack.png
> > 
> > I'm looking forward to further feedback.
> 
> Not that I would like to use it, but where would dmraid sit? I think it would 
> be a bit nearer to the SCSI low layer...
> 
> I am not quite used to the placement of the page cache, but if its merely a 
> helper function for filesystems... Regarding performance measurements it makes 
> a lot of difference. I think the current diagram hides the impact of using 
> pagecache or not using it a bit. So maybe at least coloring it differently 
> would give it a bit more visual weight.
> 
> Aside from that I find this very detailed and I learned quite a bit from just 
> looking at it.
> 
> Can you make that image available as SVG too?
> 
> Thanks,

-- 
: Werner Fischer
: Technology Specialist
: Thomas-Krenn.AG | The server-experts
: http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki


