* buffer page concepts in the page cache
@ 2011-01-25 12:56 Miguel Telleria de Esteban
  2011-01-25 18:19 ` Mulyadi Santosa
  0 siblings, 1 reply; 4+ messages in thread
From: Miguel Telleria de Esteban @ 2011-01-25 12:56 UTC (permalink / raw)
  To: kernelnewbies


Dear all,

CONTEXT

I have spent the last few weeks learning how the kernel carries out disk
reads and writes, from the userland read() call down to the hard disk
drive.  I want to see (and understand) the WHOLE PICTURE regarding the
VFS, the block I/O layer and the page cache.

To do this, I am using as a base guide Bovet and Cesati's UTLK, 3rd
edition [1] (chapters 12-16 so far), and the new edition of Robert
Love's Linux Kernel Development [2] (chapters 13-16).  That is a lot of
reading, which I still need to digest slowly.

For the moment, I have not yet dived into the details of the "page frame
reclaiming", "swap memory" and "filesystem implementations" areas.  My
knowledge of memory allocation (slab allocator) is also limited.

[1]  Understanding the Linux Kernel, 3rd Edition, O'Reilly
[2]  Linux Kernel Development, 3rd Edition, Addison-Wesley


MY QUESTIONS

1.  What do we understand by "buffer pages"?

2.  Is the whole page cache content (i.e. the radix tree in the
    address_space of the different inodes) organized as buffer pages?

3.  What is the functional difference between "block device buffer
    pages" (stored in the address_space of the master bdev inode) and
    the "file buffer pages" stored in the address_space of a file
    inode? [ UTLK, page 614 ]


Maybe I am missing an important point of course...


MY INTERPRETATION (please correct me if I am wrong)

Q1  What is a "buffer page"?

A "buffer page" is a page, described by a "struct page", that has been
allocated to hold one or more i/o blocks from disk.

As such, its "private" field points to a singly linked circular list of
"buffer_heads" (chained through b_this_page), each describing the
mapping between an i/o block in memory (b_data field) and the same
block on disk (b_size, b_blocknr...).

The buffer_head structures themselves are stored outside of the page
as shown in UTLK Fig 15.2.
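
The layout described above can be sketched as a toy user-space model.
This is not kernel code: the Python classes only borrow the kernel field
names, and the 4096-byte page, the 1024-byte block size and the
consecutive block numbers are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class BufferHead:
    b_blocknr: int   # block number on disk
    b_size: int      # block size in bytes
    b_data: int      # offset of the block inside the page (a pointer in the kernel)
    b_this_page: Optional["BufferHead"] = None  # next buffer head of the same page


@dataclass
class Page:
    private: Optional[BufferHead] = None  # first element of the circular list


def attach_buffers(page: Page, first_block: int, block_size: int) -> None:
    """Create one buffer head per block of the page and link them in a ring."""
    count = PAGE_SIZE // block_size
    heads = [BufferHead(first_block + i, block_size, i * block_size)
             for i in range(count)]
    for i, bh in enumerate(heads):
        bh.b_this_page = heads[(i + 1) % count]  # the last one points back to the first
    page.private = heads[0]


def walk(page: Page) -> List[BufferHead]:
    """Follow b_this_page until the first buffer head is reached again."""
    result = []
    bh = page.private
    while bh is not None:
        result.append(bh)
        bh = bh.b_this_page
        if bh is page.private:
            break
    return result


page = Page()
attach_buffers(page, first_block=100, block_size=1024)
for bh in walk(page):
    print(bh.b_blocknr, bh.b_data, bh.b_size)
```

With these assumed sizes the page carries four buffer heads, at offsets
0, 1024, 2048 and 3072, and the fourth one links back to the first.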

---

Q2  Is the whole page cache content organized as buffer pages?

YES, there is no other way to link memory-mapped disk i/o data to the
struct page pointed by address_space radix-tree entries.

---

Q3  block device buffer_pages vs file buffer_pages

This I really don't understand.  From what UTLK page 614 says:

*  File buffer_pages ONLY refer to non-contiguous (on disk layout) file
   contents.

*  blockdev buffer_pages refer to single-block or contiguous (on disk
   layout) portions of block.

My question is:  what happens with non-fragmented medium-size files
that do not contain "disk holes" or non-adjacent block submissions?



Thanks in advance for your attention,

     Miguel


* buffer page concepts in the page cache
  2011-01-25 12:56 buffer page concepts in the page cache Miguel Telleria de Esteban
@ 2011-01-25 18:19 ` Mulyadi Santosa
  2011-01-25 19:10   ` Miguel Telleria de Esteban
  0 siblings, 1 reply; 4+ messages in thread
From: Mulyadi Santosa @ 2011-01-25 18:19 UTC (permalink / raw)
  To: kernelnewbies

Hi Miguel...

Tough questions, let's see if I can make it :D

On Tue, Jan 25, 2011 at 19:56, Miguel Telleria de Esteban
<miguel@mtelleria.com> wrote:
> MY INTERPRETATION (please correct me if I am wrong)
>
> Q1  What is a "buffer page"?
>
> A "buffer page" is a "struct page" data describing a page allocated to
> hold one or more i/o blocks from disk.

I agree...in other words, they are pages that hold data while the I/O
is still in flight. But since they are part of the page cache, they
aren't thrown away after the I/O is done...for a while they are kept in
RAM, in case they're read again...thus, the frequency of I/O to the
physical disks is reduced.

I think we now call it the page cache....

> Q2  Is the whole page cache content organized as buffer pages?
>
> YES, there is no other way to link memory-mapped disk i/o data to the
> struct page pointed by address_space radix-tree entries.

Not so sure, but it's something like that IMHO.

> ---
>
> Q3  block device buffer_pages vs file buffer_pages
>
> This I really don't understand.  From what UTLK page 614 says:
>
> *  File buffer_pages ONLY refer to non-contiguous (on disk layout) file
>    contents.
>
> *  blockdev buffer_pages refer to single-block or contiguous (on disk
>    layout) portions of block.
>
> My question is:  what happens with non-fragmented medium-size files
> that do not contain "disk holes" or non-adjacent block submissions?

Here's my understanding:
1. when you access a block device raw, e.g. using "dd" on /dev/sda1,
you use the block buffer cache ("dd" with direct I/O, i.e. O_DIRECT,
largely bypasses the cache instead)
2. when you deal with files using the read()/write() facilities of the
filesystem (thus via VFS), you use the file page cache...

To experiment with it, simply start "top" and watch which memory field
("buffers" or "cached") increases when you run "dd", "cat", etc....
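
The same experiment can be sketched without "top", by reading the
counters from /proc/meminfo.  This is only a minimal sketch: it assumes
a Linux system, and it assumes that the "Buffers" field tracks the block
device cache while "Cached" tracks the file page cache.  The helper
names are made up for the example.

```python
import os


def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_counters(path="/proc/meminfo"):
    """Return the (Buffers, Cached) pair read from the given meminfo file."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Buffers", 0), info.get("Cached", 0)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        buffers, cached = cache_counters()
        print("Buffers: %d kB  Cached: %d kB" % (buffers, cached))
    else:
        print("/proc/meminfo not found; this sketch needs Linux")
```

Run it once before and once after a "dd" from the block device or a
"cat" of a large file, and compare which of the two counters grew.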

I hope I helped you instead of confusing you :D

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com


* buffer page concepts in the page cache
  2011-01-25 18:19 ` Mulyadi Santosa
@ 2011-01-25 19:10   ` Miguel Telleria de Esteban
  2011-01-26  4:19     ` Mulyadi Santosa
  0 siblings, 1 reply; 4+ messages in thread
From: Miguel Telleria de Esteban @ 2011-01-25 19:10 UTC (permalink / raw)
  To: kernelnewbies

Thanks Mulyadi,

On Wed, 26 Jan 2011 01:19:52 +0700 Mulyadi Santosa wrote:

> Hi Miguel...
> 
> Tough questions, let's see if I can make it :D
> 
> On Tue, Jan 25, 2011 at 19:56, Miguel Telleria de Esteban
> <miguel@mtelleria.com> wrote:
> > MY INTERPRETATION (please correct me if I am wrong)
> >
> > Q1  What is a "buffer page"?
> >
> > A "buffer page" is a "struct page" data describing a page allocated
> > to hold one or more i/o blocks from disk.
> 
> I agree...in other words, they are pages that hold data while the I/O
> is still in flight. But since they are part of the page cache, they
> aren't thrown away after the I/O is done...for a while they are kept in
> RAM, in case they're read again...thus, the frequency of I/O to the
> physical disks is reduced.
> 
> I think we now call it the page cache....
> 
> > Q2  Is the whole page cache content organized as buffer pages?
> >
> > YES, there is no other way to link memory-mapped disk i/o data to
> > the struct page pointed by address_space radix-tree entries.
> 
> Not so sure, but it's something like that IMHO.
> 
> > ---
> >
> > Q3  block device buffer_pages vs file buffer_pages
> >
> > This I really don't understand.  From what UTLK page 614 says:
> >
> > *  File buffer_pages ONLY refer to non-contiguous (on disk layout)
> >    file contents.
> >
> > *  blockdev buffer_pages refer to single-block or contiguous (on
> >    disk layout) portions of block.
> >
> > My question is:  what happens with non-fragmented medium-size files
> > that do not contain "disk holes" or non-adjacent block submissions?
> 
> Here's my understanding:
> 1. when you access a block device raw, e.g. using "dd" on /dev/sda1,
> you use the block buffer cache ("dd" with direct I/O, i.e. O_DIRECT,
> largely bypasses the cache instead)

> 2. when you deal with files using the read()/write() facilities of the
> filesystem (thus via VFS), you use the file page cache...

This makes sense.  Looking through LXR at the do_generic_file_read()
function (actually do_generic_mapping_read() ), the address_space used
is that of the file, not that of the device.

Maybe dd also goes through this same path, since you directly specify
the device file to read from.

The other read path (the bread() function) seems to be used when looking
up metadata (inodes, superblocks), which is not requested by the
user-space read() call.
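
A rough user-space check of the split discussed above, assuming it
holds: buffered reads through a block device node are cached in the
address_space of the block device inode, while reads of a regular file
are cached in the address_space of the file inode.  The function name
and the returned labels are made up for the illustration, and the script
only inspects the file type; it does not look inside the kernel.

```python
import os
import stat
import sys


def expected_mapping(mode):
    """Guess whose address_space caches buffered reads, from an st_mode value."""
    if stat.S_ISBLK(mode):
        return "block device inode (bdev)"
    if stat.S_ISREG(mode):
        return "file inode"
    return "neither (not a regular file or block device)"


if __name__ == "__main__":
    # Usage: python3 mapping.py /dev/sda1 /etc/hostname
    for path in sys.argv[1:]:
        print(path, "->", expected_mapping(os.stat(path).st_mode))
```

Under that assumption, "dd if=/dev/sda1" would land in the first case
and "cat somefile" in the second.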


> 
> to experiment with it, simply start "top" and examine which field
> increases when you do "dd", cat, etc....

Hmm, this is not clear to me.  I would like to check which
address_space object I am using (the block device's or the file's), so
I guess I need deeper tools (maybe ftrace?) to see it.

> 
> I hope I helped you instead of confusing you :D
> 

Thanks, you have helped.  On my side I continue (re)reading :).



-- 

      (O-O)
---oOO-(_)-OOo-----------------------------------------------------
 Miguel TELLERIA DE ESTEBAN               http://www.mtelleria.com
 Email: miguel at mtelleria.com           Tel GSM:  +34 650 801098
                                          Tel Fix:  +34 942 280174

 Miembro de http://www.linuca.org    Membre du http://www.bxlug.be
 ¿Usuario captivo o libre?    http://www.obtengalinux.org/windows/
 Free or  captive user?        http://www.getgnulinux.org/windows/
-------------------------------------------------------------------



* buffer page concepts in the page cache
  2011-01-25 19:10   ` Miguel Telleria de Esteban
@ 2011-01-26  4:19     ` Mulyadi Santosa
  0 siblings, 0 replies; 4+ messages in thread
From: Mulyadi Santosa @ 2011-01-26  4:19 UTC (permalink / raw)
  To: kernelnewbies

Hi Miguel :)

On Wed, Jan 26, 2011 at 02:10, Miguel Telleria de Esteban
<miguel@mtelleria.com> wrote:
> Hmm, this is not clear to me.  I would like to check which
> address_space object I am using (the block device's or the file's), so
> I guess I need deeper tools (maybe ftrace?) to see it.

Oh you mean function tracing? Alright then, maybe
ftrace...specifically the function tracer could help you....
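
A minimal sketch of how the function tracer might be set up for this,
under several assumptions: tracefs is reachable at
/sys/kernel/debug/tracing (on some systems it is /sys/kernel/tracing),
the script runs as root, and the candidate function names below are
only guesses that vary between kernel versions.  That is why they are
first checked against available_filter_functions.  The helper names and
the "--apply" flag are made up for the example.

```python
import os
import sys

TRACEFS = "/sys/kernel/debug/tracing"  # assumed mount point

# Assumed candidates; which of them exist depends on the kernel version.
CANDIDATES = ["generic_file_aio_read", "generic_file_read_iter",
              "filemap_read", "blkdev_read_iter", "__bread"]


def pick_functions(available_text, candidates):
    """Keep only the candidates that the running kernel can actually trace."""
    available = {line.split()[0]
                 for line in available_text.splitlines() if line.strip()}
    return [name for name in candidates if name in available]


def build_plan(functions):
    """Return the (file, value) writes that enable the function tracer."""
    return [
        ("tracing_on", "0"),
        ("current_tracer", "function"),
        ("set_ftrace_filter", " ".join(functions)),
        ("tracing_on", "1"),
    ]


def apply_plan(plan, root=TRACEFS):
    """Write each value into the matching tracefs control file."""
    for name, value in plan:
        with open(os.path.join(root, name), "w") as f:
            f.write(value + "\n")


if __name__ == "__main__":
    try:
        with open(os.path.join(TRACEFS, "available_filter_functions")) as f:
            functions = pick_functions(f.read(), CANDIDATES)
    except OSError:
        functions = []  # no tracefs or no permission
    plan = build_plan(functions)
    for name, value in plan:
        print("echo %s > %s/%s" % (value, TRACEFS, name))
    # Only touch the tracer when explicitly asked to and when a filter exists.
    if functions and "--apply" in sys.argv:
        apply_plan(plan)
```

By default the script only prints the writes it would perform.  After
running it with "--apply" and then doing a "dd" or "cat", the "trace"
file in the same directory should show which of the filtered functions
were hit.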

OK, happy hacking :)

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
