From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RESEND] block: annotate refault stalls from IO submission
Date: Tue, 13 Aug 2019 13:46:25 -0400
Message-ID: <20190813174625.GA21982@cmpxchg.org>
In-Reply-To: <20190809221248.GK7689@dread.disaster.area>

On Sat, Aug 10, 2019 at 08:12:48AM +1000, Dave Chinner wrote:
> On Thu, Aug 08, 2019 at 03:03:00PM -0400, Johannes Weiner wrote:
> > psi tracks the time tasks wait for refaulting pages to become
> > uptodate, but it does not track the time spent submitting the IO. The
> > submission part can be significant if backing storage is contended or
> > when cgroup throttling (io.latency) is in effect - a lot of time is
> 
> Or the wbt is throttling.
> 
> > spent in submit_bio(). In that case, we underreport memory pressure.
> > 
> > Annotate submit_bio() to account submission time as memory stall when
> > the bio is reading userspace workingset pages.
> 
> Patch looks fine to me, but it raises another question w.r.t. IO
> stalls and reclaim pressure feedback to the vm: how do we make use
> of the pressure stall infrastructure to track inode cache pressure
> and stalls?
> 
> With the congestion_wait() and wait_iff_congested() being entirely
> non-functional for block devices since 5.0, there is no IO load
> based feedback going into memory reclaim from shrinkers that might
> require IO to free objects before they can be reclaimed. This is
> directly analogous to page reclaim writing back dirty pages from
> the LRU, and as I understand it one of the things PSI is supposed
> to be tracking.
>
> Lots of workloads create inode cache pressure and often it can
> dominate the time spent in memory reclaim, so it would seem to me
> that having PSI only track/calculate pressure and stalls from LRU
> pages misses a fair chunk of the memory pressure and reclaim stalls
> that can be occurring.

psi already tracks the entire reclaim operation. So if reclaim calls
into the shrinker and the shrinker scans inodes, initiates IO, or even
waits on IO, that time is accounted for as memory pressure stalling.

If you can think of asynchronous events that are initiated from
reclaim but cause indirect stalls in other contexts, contexts which
can clearly link the stall back to reclaim activity, we can annotate
them using psi_memstall_enter() / psi_memstall_leave().
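
To illustrate the annotation pattern, here is a minimal userspace
model, not kernel code. Only the names psi_memstall_enter() and
psi_memstall_leave() and their flags argument come from the kernel;
the fake clock, the global counters and wait_on_reclaim_side_effect()
are assumptions made up for this sketch. It also models nesting on
the assumption that an inner section leaves the accounting to the
outer one.

```c
#include <assert.h>

/* Toy state: everything here is a stand-in, not the kernel's psi state. */
static unsigned long long fake_clock;     /* toy monotonic clock, in ticks */
static unsigned long long memstall_total; /* ticks accounted as memory stall */
static unsigned long long stall_start;    /* when the current stall began */
static int in_memstall;                   /* models the per-task stall state */

/* Userspace stand-in with the same shape as the kernel helper. */
static void psi_memstall_enter(unsigned long *flags)
{
	*flags = in_memstall;
	if (*flags)
		return;		/* nested: the outer section is already counting */
	in_memstall = 1;
	stall_start = fake_clock;
}

static void psi_memstall_leave(unsigned long *flags)
{
	if (*flags)
		return;		/* nested: let the outer section finish */
	assert(in_memstall);
	in_memstall = 0;
	memstall_total += fake_clock - stall_start;
}

/* Hypothetical wait in another context that can be traced back to reclaim. */
static void wait_on_reclaim_side_effect(unsigned long long ticks)
{
	unsigned long pflags;

	psi_memstall_enter(&pflags);
	fake_clock += ticks;	/* stand-in for the actual blocking wait */
	psi_memstall_leave(&pflags);
}
```

Advancing fake_clock outside of an annotated section leaves
memstall_total untouched; only the time spent between enter and leave
is counted, and a nested section does not count it twice.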

In that vein, what would be great to have is a distinction between
read stalls on dentries/inodes that have never been touched before
versus those that have been recently reclaimed - analogous to cold
page faults vs refaults.

It would help psi, sure, but more importantly it would help us better
balance pressure between filesystem metadata and the data pages. We
would be able to tell the difference between a `find /' and actual
thrashing, where hot inodes are getting kicked out and reloaded
repeatedly - and we could backfeed that pressure to the LRU pages to
allow the metadata caches to grow as needed.
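
As a purely conceptual sketch of that distinction (a userspace toy,
and not a proposal for where the kernel would keep this state): a
small FIFO history of recently evicted inode numbers is enough to
classify a read as cold or as a refault. The struct, the function
names and the history size are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHADOW_SLOTS 8	/* hypothetical size of the eviction history */

struct shadow_table {
	unsigned long ino[SHADOW_SLOTS];  /* recently evicted inodes, 0 = empty */
	size_t next;                      /* next slot to overwrite (FIFO) */
	unsigned long cold_reads;         /* reads of never/long-ago seen inodes */
	unsigned long refaults;           /* reads of recently evicted inodes */
};

/* Remember that an inode was just reclaimed. */
static void note_eviction(struct shadow_table *t, unsigned long ino)
{
	assert(ino != 0);	/* 0 marks an empty slot in this toy */
	t->ino[t->next] = ino;
	t->next = (t->next + 1) % SHADOW_SLOTS;
}

/* Classify a read from disk; returns true if it is a refault. */
static bool note_read(struct shadow_table *t, unsigned long ino)
{
	for (size_t i = 0; i < SHADOW_SLOTS; i++) {
		if (t->ino[i] == ino) {
			t->ino[i] = 0;	/* inode is resident again */
			t->refaults++;
			return true;
		}
	}
	t->cold_reads++;
	return false;
}
```

A find-style scan only ever bumps cold_reads, while an inode that is
evicted and read back within the history window bumps refaults; the
second counter is the signal that would justify growing the metadata
caches.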

For example, it could make sense to swap out a couple of completely
unused anonymous pages if it means we could hold the metadata
workingset fully in memory. But right now we cannot do that, because
we cannot risk swapping just because somebody runs find /.

I have semi-seriously talked to Josef about this before, but it wasn't
quite obvious where we could track non-residency or eviction
information for inodes, dentries etc. Maybe you have an idea?
