linux-cifs.vger.kernel.org archive mirror
* Classification of reads within a filesystem
@ 2021-07-21 17:08 Shyam Prasad N
  2021-07-21 17:52 ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: Shyam Prasad N @ 2021-07-21 17:08 UTC (permalink / raw)
  To: Matthew Wilcox, David Howells, Steve French, CIFS

Hi Matthew,

I had a query about a filesystem's ability to differentiate readaheads
from actual reads.
David Howells suggested that you may be able to answer this one. If
not, please feel free to add the right people to this email.

Consider a scenario where a user/application issues a readahead/fadvise
for large data ranges in advance (informing the kernel that it intends to
read these data ranges soon). Depending on how much data these calls
cover, they could keep the network quite busy for a network filesystem
(or the disk for a block filesystem).
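
For concreteness, a minimal userspace sketch of the kind of hint meant here, assuming a POSIX system. The helper name hint_will_need() is hypothetical; posix_fadvise() and POSIX_FADV_WILLNEED are the standard interface.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: tell the kernel that [offset, offset + len) of fd
 * will be read soon, so it may start populating the page cache now.
 * Returns 0 on success or a positive errno value (posix_fadvise() reports
 * errors through its return value, not through errno).
 */
int hint_will_need(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len >= 0);
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

An application that calls this over many large ranges up front is the case where the resulting reads could crowd out regular reads.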

I see some value in filesystems being able to differentiate these
readahead reads from regular buffered reads issued by users. The
filesystem could then choose to throttle the readahead reads, so that
a specified share of bandwidth remains available for regular reads.
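
A rough sketch of what such throttling could look like, assuming the filesystem were told the class of each read. This is a plain userspace model of a token bucket, not kernel code; every name in it (read_class, ra_throttle, ra_throttle_admit) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification the filesystem would need to receive. */
enum read_class { READ_DEMAND, READ_AHEAD };

/* Token bucket that limits only readahead traffic. */
struct ra_throttle {
    uint64_t bytes_per_ms;  /* refill rate of the readahead budget */
    uint64_t burst_bytes;   /* maximum budget that can accumulate */
    uint64_t tokens;        /* budget currently available */
    uint64_t last_ms;       /* time of the last refill */
};

void ra_throttle_init(struct ra_throttle *t, uint64_t bytes_per_ms,
                      uint64_t burst_bytes, uint64_t now_ms)
{
    assert(bytes_per_ms > 0);
    t->bytes_per_ms = bytes_per_ms;
    t->burst_bytes = burst_bytes;
    t->tokens = burst_bytes;    /* start with a full bucket */
    t->last_ms = now_ms;
}

/* Returns true if the read may be issued now, false if it should wait. */
bool ra_throttle_admit(struct ra_throttle *t, enum read_class cls,
                       uint64_t bytes, uint64_t now_ms)
{
    if (cls == READ_DEMAND)
        return true;            /* regular reads are never throttled */

    /* Refill for the elapsed time, saturating at burst_bytes. */
    uint64_t elapsed_ms = now_ms - t->last_ms;
    uint64_t missing = t->burst_bytes - t->tokens;
    t->last_ms = now_ms;
    if (elapsed_ms >= missing / t->bytes_per_ms + 1)
        t->tokens = t->burst_bytes;
    else
        t->tokens += elapsed_ms * t->bytes_per_ms;

    if (bytes > t->tokens)
        return false;           /* readahead budget exhausted */
    t->tokens -= bytes;
    return true;
}
```

The bandwidth not covered by the readahead budget is what stays available for regular reads.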

I wanted to get your opinion on this, and on whether this can already
be done in the VFS ->readahead and ->readpage calls in the filesystems.

-- 
Regards,
Shyam


* Re: Classification of reads within a filesystem
  2021-07-21 17:08 Classification of reads within a filesystem Shyam Prasad N
@ 2021-07-21 17:52 ` Matthew Wilcox
  2021-07-22  4:14   ` Christoph Hellwig
  2021-07-23 10:37   ` Shyam Prasad N
  0 siblings, 2 replies; 4+ messages in thread
From: Matthew Wilcox @ 2021-07-21 17:52 UTC (permalink / raw)
  To: Shyam Prasad N; +Cc: David Howells, Steve French, CIFS

On Wed, Jul 21, 2021 at 10:38:59PM +0530, Shyam Prasad N wrote:
> Consider a scenario where a user/application issues a readahead/fadvise
> for large data ranges in advance (informing the kernel that it intends to
> read these data ranges soon). Depending on how much data these calls
> cover, they could keep the network quite busy for a network filesystem
> (or the disk for a block filesystem).
> 
> I see some value in filesystems being able to differentiate these
> readahead reads from regular buffered reads issued by users. The
> filesystem could then choose to throttle the readahead reads, so that
> a specified share of bandwidth remains available for regular reads.
> 
> I wanted to get your opinion on this, and on whether this can already
> be done in the VFS ->readahead and ->readpage calls in the filesystems.

This is something I have an interest in, but haven't had time to pursue.
The readahead code gets this information because the page cache
calls page_cache_sync_ra() if it needs this page right now, and calls
page_cache_async_ra() if it thinks it will need the page in the future.

ondemand_readahead() currently gets a true/false parameter
(hit_readahead_marker), although my folio patches change it to pass in
a folio or NULL.  That is then *not* passed to the filesystem, but it
could be information passed in the ractl.
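
A rough userspace model of that idea, purely as a sketch. The struct below is a simplified stand-in and not the kernel's struct readahead_control; the async_hint field and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a readahead request. */
struct ractl_model {
    unsigned long index;    /* first page of the request */
    unsigned int nr_pages;  /* number of pages requested */
    bool async_hint;        /* hypothetical: true if no one is waiting yet */
};

typedef void (*fs_readahead_fn)(const struct ractl_model *req);

/* Models the "need this page right now" path. */
void model_sync_ra(fs_readahead_fn fs, unsigned long index, unsigned int nr)
{
    struct ractl_model req = { .index = index, .nr_pages = nr,
                               .async_hint = false };
    assert(fs != NULL);
    fs(&req);
}

/* Models the "will probably need this page later" path. */
void model_async_ra(fs_readahead_fn fs, unsigned long index, unsigned int nr)
{
    struct ractl_model req = { .index = index, .nr_pages = nr,
                               .async_hint = true };
    assert(fs != NULL);
    fs(&req);
}

/* Example filesystem hook: accounts pages by urgency. */
struct demo_fs_stats { unsigned int urgent_pages, background_pages; };
struct demo_fs_stats demo_stats;

void demo_fs_readahead(const struct ractl_model *req)
{
    if (req->async_hint)
        demo_stats.background_pages += req->nr_pages; /* could be throttled */
    else
        demo_stats.urgent_pages += req->nr_pages;     /* issue immediately */
}
```

With the hint available, a filesystem could queue the two kinds of request differently instead of treating them alike.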

There's also some tidying-up to be done around faulting.  Currently
fault-around doesn't have a way to express "read me all the pages around
page N".  Instead it just assumes that pages N-R/2 to N+R/2 are the
right ones to fetch when it should be left up to the filesystem or the
readahead code to determine what window of pages to fetch.
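
To illustrate the fixed window described above, a small sketch of the arithmetic. The names are hypothetical, and clamping the window at the start and end of the file is an assumption of this sketch.

```c
#include <assert.h>

struct page_window {
    unsigned long start;    /* first page in the window */
    unsigned long nr;       /* number of pages in the window */
};

/*
 * Window of up to r pages centred on page n: it starts at n - r/2
 * (clamped to page 0) and is cut off at the end of the file.
 */
struct page_window window_around(unsigned long n, unsigned long r,
                                 unsigned long file_pages)
{
    struct page_window w;

    assert(r >= 1 && n < file_pages);
    w.start = n > r / 2 ? n - r / 2 : 0;
    w.nr = r;
    if (w.nr > file_pages - w.start)
        w.nr = file_pages - w.start;
    return w;
}
```

The window depends only on n and r, so nothing the filesystem knows about its own layout can influence which pages are fetched.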

Another thing I have an interest in doing, but have not had the opportunity
to pursue, is making ->readpage synchronous.  The current MM code always
calls ->readahead first and only calls ->readpage if ->readahead fails.
That means that all the async ->readpage work is actually wrong; we
want to return the best error possible from ->readpage, even if that
means sleeping.
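
As a sketch of that ordering (a userspace model with hypothetical names, not the actual MM code): readahead is attempted first and cannot report an error, and the synchronous fallback is what produces the error the caller sees.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct page_model { bool uptodate; };

/* Simplified stand-ins for the two address_space operations. */
struct fs_ops_model {
    void (*readahead)(struct page_model *pg);  /* best effort, no error */
    int (*readpage)(struct page_model *pg);    /* 0 or negative errno */
};

int read_one_page(const struct fs_ops_model *ops, struct page_model *pg)
{
    assert(ops != NULL && pg != NULL && ops->readpage != NULL);

    if (ops->readahead)
        ops->readahead(pg);     /* try readahead first */
    if (pg->uptodate)
        return 0;               /* readahead already filled the page */

    /* Fallback: wait for the result and report the real error. */
    return ops->readpage(pg);
}

/* Demo hooks for exercising the model. */
void demo_readahead_noop(struct page_model *pg) { (void)pg; }
int demo_readpage_ok(struct page_model *pg) { pg->uptodate = true; return 0; }
int demo_readpage_eio(struct page_model *pg) { (void)pg; return -EIO; }
```

Because the fallback only runs after readahead has already failed, starting the read and returning early gains nothing there, while sleeping lets it return a precise error.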

Oh ... except for swap.  For NFS only, swap calls ->readpage, so it really
wants ->readpage to be async so it can kick off multiple pages and
then wait for the one it actually needs.  That gets into a conversation
about how much we really care about swap-over-NFS, whether swap should
be using ->readpage or ->direct_IO, and whether swap should use the
file readahead code or its own virtual address based readahead code.
Most of those discussions are outside my area of expertise.


* Re: Classification of reads within a filesystem
  2021-07-21 17:52 ` Matthew Wilcox
@ 2021-07-22  4:14   ` Christoph Hellwig
  2021-07-23 10:37   ` Shyam Prasad N
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2021-07-22  4:14 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Shyam Prasad N, David Howells, Steve French, CIFS

On Wed, Jul 21, 2021 at 06:52:05PM +0100, Matthew Wilcox wrote:
> Oh ... except for swap.  For NFS only, swap calls ->readpage, so it really
> wants ->readpage to be async so it can kick off multiple pages and
> then wait for the one it actually needs.  That gets into a conversation
> about how much we really care about swap-over-NFS, whether swap should
> be using ->readpage or ->direct_IO, and whether swap should use the
> file readahead code or its own virtual address based readahead code.
> Most of those discussions are outside my area of expertise.

It really should be using direct I/O.  I think one issue back in the
day was the odd locking requirements for swap, but that's something we
could overcome.


* Re: Classification of reads within a filesystem
  2021-07-21 17:52 ` Matthew Wilcox
  2021-07-22  4:14   ` Christoph Hellwig
@ 2021-07-23 10:37   ` Shyam Prasad N
  1 sibling, 0 replies; 4+ messages in thread
From: Shyam Prasad N @ 2021-07-23 10:37 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: David Howells, Steve French, CIFS

On Wed, Jul 21, 2021 at 11:22 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jul 21, 2021 at 10:38:59PM +0530, Shyam Prasad N wrote:
> > Consider a scenario where a user/application issues a readahead/fadvise
> > for large data ranges in advance (informing the kernel that it intends to
> > read these data ranges soon). Depending on how much data these calls
> > cover, they could keep the network quite busy for a network filesystem
> > (or the disk for a block filesystem).
> >
> > I see some value in filesystems being able to differentiate these
> > readahead reads from regular buffered reads issued by users. The
> > filesystem could then choose to throttle the readahead reads, so that
> > a specified share of bandwidth remains available for regular reads.
> >
> > I wanted to get your opinion on this, and on whether this can already
> > be done in the VFS ->readahead and ->readpage calls in the filesystems.
>
> This is something I have an interest in, but haven't had time to pursue.
> The readahead code gets this information because the page cache
> calls page_cache_sync_ra() if it needs this page right now, and calls
> page_cache_async_ra() if it thinks it will need the page in the future.
>
> ondemand_readahead() currently gets a true/false parameter
> (hit_readahead_marker), although my folio patches change it to pass in
> a folio or NULL.  That is then *not* passed to the filesystem, but it
> could be information passed in the ractl.
>


Hi Matthew,

I don't yet know whether this would be useful in other scenarios.
But for the above scenario (of eagerly calling readahead), I thought
that a filesystem could use this info for throttling; it doesn't get
that info today.
I was also thinking that there could potentially be other
classifications apart from sync vs async, for example the process IO
priority.
Today, I see the process IO priority used only by the block layer, and
not in the VFS or the individual filesystems.
Do you think this is another piece of information that could/should
trickle down to individual filesystems?
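
For reference, a small sketch of how the process IO priority can be read from userspace on Linux. glibc provides no wrapper for ioprio_get(), so the raw syscall is used; the EX_ constants mirror the kernel's values and the helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of kernel constants, prefixed to avoid name clashes. */
#define EX_IOPRIO_WHO_PROCESS 1
#define EX_IOPRIO_CLASS_SHIFT 13

/* Scheduling class: 0 = none, 1 = realtime, 2 = best-effort, 3 = idle. */
int ioprio_class_of(int ioprio)
{
    assert(ioprio >= 0);
    return ioprio >> EX_IOPRIO_CLASS_SHIFT;
}

/* Priority level within the class (lower value means higher priority). */
int ioprio_level_of(int ioprio)
{
    assert(ioprio >= 0);
    return ioprio & ((1 << EX_IOPRIO_CLASS_SHIFT) - 1);
}

/* IO priority of the calling process, or -1 on error (errno is set). */
int current_ioprio(void)
{
    return (int)syscall(SYS_ioprio_get, EX_IOPRIO_WHO_PROCESS, 0);
}
```

The class and level obtained this way are the kind of information that could be carried down with a read request.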

CCing fsdevel as well to get more input on this.

-- 
Regards,
Shyam

