From: Shyam Prasad N <nspmangalore@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>,
	Steve French <smfrench@gmail.com>,
	CIFS <linux-cifs@vger.kernel.org>
Subject: Re: Classification of reads within a filesystem
Date: Fri, 23 Jul 2021 16:07:17 +0530
Message-ID: <CANT5p=rCCoP3ScU80giZmGvM225e4u_W4hqB892vpNhj2J=auw@mail.gmail.com>
In-Reply-To: <YPhexTyuuE0/Wxf5@casper.infradead.org>

On Wed, Jul 21, 2021 at 11:22 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jul 21, 2021 at 10:38:59PM +0530, Shyam Prasad N wrote:
> > Consider a scenario where a user/application issues a readahead/fadvise
> > for large data ranges in advance (informing the kernel that they intend
> > to read these data ranges soon). Depending on how much data these
> > calls cover, it could keep the network quite busy for a network
> > filesystem (or the disk for a block filesystem).
> >
> > I see some value if filesystems have the ability to differentiate the
> > reads from regular buffered reads by users. In such cases, the
> > filesystem can choose to throttle the readahead reads, so that there's
> > a specified bandwidth that's still available for regular reads.
> >
> > I wanted to get your opinions on this, and on whether this can already
> > be done in the VFS ->readahead and ->readpage calls in the filesystems.
>
> This is something I have an interest in, but haven't had time to pursue.
> The readahead code gets this information because the page cache
> calls page_cache_sync_ra() if it needs this page right now, and calls
> page_cache_async_ra() if it thinks it will need the page in the future.
>
> ondemand_readahead() currently gets a true/false parameter
> (hit_readahead_marker), although my folio patches change it to pass in
> a folio or NULL.  That is then *not* passed to the filesystem, but it
> could be information passed in the ractl.
>


Hi Matthew,

I don't yet know whether this would be useful in other scenarios.
But for the above scenario (eagerly calling readahead), I thought
the filesystem could use this information for throttling; it doesn't
receive that information today.
I was also thinking that there could be other classifications apart
from sync vs. async, for example the process I/O priority.
Today, I don't see the process I/O priority being used by the block
layer, nor in the VFS or the individual filesystems.
Do you think this is another piece of information that could/should
trickle down to individual filesystems?
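
To make the throttling idea concrete, here is a rough userspace sketch
of what a filesystem might do if it were told the class of each read.
Everything in it is hypothetical: enum read_class, struct ra_throttle
and read_admit() are names invented for illustration, not kernel APIs,
and a token bucket is just one possible policy. Sync reads always pass;
async (readahead) reads are admitted only while the bucket has bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read classes; filesystems do not receive this today. */
enum read_class {
	READ_SYNC,	/* page needed right now (cf. page_cache_sync_ra()) */
	READ_ASYNC,	/* speculative readahead (cf. page_cache_async_ra()) */
};

/* Token bucket that caps the bandwidth spent on speculative reads. */
struct ra_throttle {
	uint64_t capacity;	/* maximum burst, in bytes */
	uint64_t tokens;	/* bytes currently available */
	uint64_t rate_per_sec;	/* refill rate, in bytes per second */
	uint64_t last_ns;	/* timestamp of the last refill */
};

static void ra_throttle_init(struct ra_throttle *t, uint64_t capacity,
			     uint64_t rate_per_sec, uint64_t now_ns)
{
	assert(t != NULL);
	t->capacity = capacity;
	t->tokens = capacity;	/* start with a full bucket */
	t->rate_per_sec = rate_per_sec;
	t->last_ns = now_ns;
}

/* Add the tokens earned since the last refill, up to the capacity. */
static void ra_throttle_refill(struct ra_throttle *t, uint64_t now_ns)
{
	const uint64_t ns_per_sec = 1000000000ULL;
	uint64_t elapsed, add;

	if (now_ns <= t->last_ns)
		return;
	elapsed = now_ns - t->last_ns;
	add = (elapsed / ns_per_sec) * t->rate_per_sec +
	      (elapsed % ns_per_sec) * t->rate_per_sec / ns_per_sec;
	if (add == 0)
		return;		/* keep last_ns so the time still accumulates */
	t->tokens = (t->tokens + add > t->capacity) ? t->capacity
						    : t->tokens + add;
	t->last_ns = now_ns;
}

/* Returns true if a read of len bytes may be issued now. */
static bool read_admit(struct ra_throttle *t, enum read_class cls,
		       uint64_t len, uint64_t now_ns)
{
	assert(t != NULL);
	if (cls == READ_SYNC)
		return true;	/* regular reads are never throttled */
	ra_throttle_refill(t, now_ns);
	if (t->tokens < len)
		return false;	/* caller would defer or drop the readahead */
	t->tokens -= len;
	return true;
}
```

A caller would pass a monotonic timestamp as now_ns and, when
read_admit() returns false for an async read, either queue it or skip
it. A process I/O priority class could be folded in as a further input
to the same decision, but I have left that out of the sketch.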

CCing fsdevel as well to get more input on this.

> There's also some tidying-up to be done around faulting.  Currently
> fault-around doesn't have a way to express "read me all the pages around
> page N".  Instead it just assumes that pages N-R/2 to N+R/2 are the
> right ones to fetch when it should be left up to the filesystem or the
> readahead code to determine what window of pages to fetch.
>
> Another thing I have an interest in doing but have not had the opportunity
> to pursue is making ->readpage synchronous.  The current MM code always
> calls ->readahead first and only calls ->readpage if ->readahead fails.
> That means that all the async ->readpage work is actually wrong; we
> want to return the best error possible from ->readpage, even if that
> means sleeping.
>
> Oh ... except for swap.  For NFS only, it calls ->readpage, so it really
> wants ->readpage to be async so it can kick off multiple pages and
> then wait for the one it actually needs.  That gets into a conversation
> about how much we really care about swap-over-NFS, whether swap should
> be using ->readpage or ->direct_IO, and whether swap should use the
> file readahead code or its own virtual address based readahead code.
> Most of those discussions are outside my area of expertise.
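
For reference, the fixed fault-around window described in the quoted
text (pages N-R/2 to N+R/2) can be sketched as below. This follows the
description in the quote only; it is not the kernel's actual arithmetic,
and struct page_window and fixed_window() are hypothetical names. The
window is clamped to the file's page range.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of page indices. */
struct page_window {
	uint64_t first;
	uint64_t last;
};

/*
 * Window of roughly r pages centred on page n, clamped to
 * [0, nr_pages - 1]. Requires nr_pages > 0 and n < nr_pages.
 */
static struct page_window fixed_window(uint64_t n, uint64_t r,
				       uint64_t nr_pages)
{
	struct page_window w;
	uint64_t half = r / 2;

	assert(nr_pages > 0 && n < nr_pages);
	w.first = (n > half) ? n - half : 0;
	w.last = (nr_pages - 1 - n > half) ? n + half : nr_pages - 1;
	return w;
}
```

The point in the quoted text is that this choice is made without asking
the filesystem or the readahead code, which may know a better window.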



-- 
Regards,
Shyam

Thread overview (4 messages):
2021-07-21 17:08 Classification of reads within a filesystem Shyam Prasad N
2021-07-21 17:52 ` Matthew Wilcox
2021-07-22  4:14   ` Christoph Hellwig
2021-07-23 10:37   ` Shyam Prasad N [this message]
