From: Brendan Gregg <>
To: riteshh <>
Cc: "Matthew Wilcox (Oracle)" <>,
	David Howells <>,
Subject: Re: [PATCH 0/3] readahead improvements
Date: Thu, 8 Apr 2021 23:15:51 +1000
In-Reply-To: <20210408105705.exod2cvtvnr4467o@riteshh-domain>

On Thu, Apr 8, 2021 at 8:57 PM riteshh <> wrote:
> On 21/04/07 09:18PM, Matthew Wilcox (Oracle) wrote:
> > As requested, fix up readahead_expand() so as to not confuse the ondemand
> > algorithm.  Also make the documentation slightly better.  Dave, could you
> > put in some debug and check this actually works?  I don't generally test
> > with any filesystems that use readahead_expand(), but printing (index,
> > nr_to_read, lookahead_size) in page_cache_ra_unbounded() would let a human
> > (such as your good self) determine whether it's working approximately
> > as designed.
> Hello,
> Sorry about the silly question here, since I don't know much about the
> readahead algorithm code path.
> 1. Do we know of a way to measure efficiency of readahead in Linux?
> 2. And if there is any way to validate readahead is working correctly and as
>    intended in Linux?

I created a bpftrace tool for measuring readahead efficiency for my
LSFMM 2019 keynote, where it showed the age of readahead pages when
they were finally used:

If they were mostly of a young age, one might conclude that readahead
is not only working, but could be tuned higher. Mostly of an old age,
and one might conclude readahead was tuned too high, and was reading
too many pages (that were later used unrelated to the original

I think my tool is just the start. What else should we measure for
understanding readahead efficiency? Code it today as a bpftrace tool
(and share it)!
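
As a rough illustration of the age-based reading described above, here is a
minimal Python sketch. It is not the bpftrace tool: the function names and
the sample ages are hypothetical, and it assumes you already have, for each
readahead page, the time in milliseconds between its readahead and its first
use. It only shows the power-of-two bucketing that makes "mostly young" versus
"mostly old" visible.

```python
from collections import Counter


def log2_bucket(age_ms):
    """Return the lower bound of the power-of-two bucket holding age_ms.

    Ages below 1 ms fall into bucket 0; otherwise the bucket is the largest
    power of two that is <= age_ms (e.g. 900 -> 512).
    """
    if age_ms < 1:
        return 0
    return 1 << (int(age_ms).bit_length() - 1)


def age_histogram(ages_ms):
    """Count page ages per power-of-two bucket, sorted by bucket."""
    counts = Counter(log2_bucket(age) for age in ages_ms)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample: ages (ms) of readahead pages at first use.
    sample_ages_ms = [0, 1, 3, 5, 900, 1500, 4000]
    for bucket, count in age_histogram(sample_ages_ms).items():
        print(f">= {bucket:>5} ms: {'@' * count} ({count})")
```

A histogram weighted toward the small buckets would match the "young"
case; one weighted toward the large buckets would match the "old" case.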

> Like is there anything designed already to measure above two things?
> If not, are there any stats which can be collected and later should be parsed to
> say how efficient readahead is working in different use cases and also can
> verify if it's working correctly?
> I guess, we can already do point 1 from below. What about point 2 & 3?
> 1. Turn on/off the readahead and measure file reads timings for different
>    patterns. - I guess this is already doable.
> 2. Collecting runtime histogram showing how readahead window is
>    increasing/decreasing based on changing read patterns. And collecting how
>    much IOs it takes to increase/decrease the readahead size.
>    Are there any tracepoints needed to be enabled for this?
> 3. I guess it won't be possible w/o a way to also measure page cache
>    efficiency. Like in case of a memory pressure, if the page which was read
>    using readahead is thrown out only to re-read it again.
>    So a way to measure page cache efficiency also will be required.
> Any idea from others on this?
> I do see below page[1] by Brendan showing some ways to measure page cache
> efficiency using cachestat. But there are also some problems mentioned in the
> conclusion section, which I am not sure of what is the latest state of that.
> Also it doesn't discuss readahead efficiency measurement much.
> [1]:

Coincidentally, during the same LSFMMBPF keynote I showed cachestat
and described it as a "sandcastle," as kernel changes easily wash it
away. The MM folk discussed the various issues in measuring this
accurately: while cachestat worked for my workloads, I think there's a
lot more work to do to make it a robust tool for all workloads. I
still think it should be /proc metrics instead, as I commonly want a
page cache hit ratio metric (whereas many of my other tracing tools
are more niche, and can stay as tracing tools).

I don't think there's a video of the talk, but there was a writeup:

People keep porting my cachestat tool and building other things upon
it, but aren't updating the code, which is getting a bit annoying.
You're all assuming I solved it. But in my original Ftrace cachestat
code I thought I made it clear that it was a proof of concept for

# cachestat - show Linux page cache hit/miss statistics.
#             Uses Linux ftrace.
# This is a proof of concept using Linux ftrace capabilities on older kernels,
# and works by using function profiling for in-kernel counters. Specifically,
# four kernel functions are traced:
# mark_page_accessed() for measuring cache accesses
# mark_buffer_dirty() for measuring cache writes
# add_to_page_cache_lru() for measuring page additions
# account_page_dirtied() for measuring page dirties
# It is possible that these functions have been renamed (or are different
# logically) for your kernel version, and this script will not work as-is.
# This script was written on Linux 3.13. This script is a sandcastle: the
# kernel may wash some away, and you'll need to rebuild.
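
To show how four such function-call counts could turn into a hit ratio, here
is a minimal Python sketch. The arithmetic is an assumption for illustration
and is not quoted from the script header above: it treats accesses minus
buffer dirties as total reads, and page additions minus page dirties as
misses. The function and parameter names are hypothetical, chosen to mirror
the traced kernel functions.

```python
def cache_hit_ratio(mark_page_accessed, mark_buffer_dirty,
                    add_to_page_cache_lru, account_page_dirtied):
    """Estimate (hits, misses, hit ratio %) from four call counts.

    Assumed model, for illustration only:
      total  = cache accesses - cache writes
      misses = page additions - page dirties
      hits   = total - misses
    Negative intermediate values are clamped to zero, since the counters
    are sampled independently and can briefly disagree.
    """
    total = max(mark_page_accessed - mark_buffer_dirty, 0)
    misses = max(add_to_page_cache_lru - account_page_dirtied, 0)
    hits = max(total - misses, 0)
    ratio = 100.0 * hits / total if total else 0.0
    return hits, misses, ratio


if __name__ == "__main__":
    # Hypothetical counts for a one-second interval.
    hits, misses, ratio = cache_hit_ratio(1000, 100, 200, 50)
    print(f"hits={hits} misses={misses} ratio={ratio:.1f}%")
```

The sketch depends entirely on those four functions keeping their meaning,
which is exactly the sandcastle problem: if the kernel renames or repurposes
one of them, the arithmetic silently goes wrong.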


Brendan Gregg, Senior Performance Architect, Netflix

