linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] psi: annotate refault stalls from IO submission
Date: Mon, 22 Jul 2019 19:35:27 -0400	[thread overview]
Message-ID: <20190722233527.GA21594@cmpxchg.org> (raw)
In-Reply-To: <20190722152607.dd175a9d517a5f6af06a8bdc@linux-foundation.org>

On Mon, Jul 22, 2019 at 03:26:07PM -0700, Andrew Morton wrote:
> On Mon, 22 Jul 2019 16:13:37 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > psi tracks the time tasks wait for refaulting pages to become
> > uptodate, but it does not track the time spent submitting the IO. The
> > submission part can be significant if backing storage is contended or
> > when cgroup throttling (io.latency) is in effect - a lot of time is
> > spent in submit_bio(). In that case, we underreport memory pressure.
> 
> It's a somewhat broad patch.  How significant is this problem in the
> real world?  Can we be confident that the end-user benefit is worth the
> code changes?

The error scales with how aggressively IO is throttled compared to the
device's capability.

For example, we have system maintenance software throttled down pretty
hard on IO compared to the workload. When memory is contended, the
system software starts thrashing cache, but since the backing device
is actually pretty fast, the majority of "io time" is from injected
throttling delays during submit_bio().

As a result we barely see memory pressure, when the reality is that
there is almost no progress due to the thrashing and we should be
killing misbehaving stuff.

> > Annotate the submit_bio() paths (or the indirection through readpage)
> > for refaults and swapin to get proper psi coverage of delays there.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  fs/btrfs/extent_io.c | 14 ++++++++++++--
> >  fs/ext4/readpage.c   |  9 +++++++++
> >  fs/f2fs/data.c       |  8 ++++++++
> >  fs/mpage.c           |  9 +++++++++
> >  mm/filemap.c         | 20 ++++++++++++++++++++
> >  mm/page_io.c         | 11 ++++++++---
> >  mm/readahead.c       | 24 +++++++++++++++++++++++-
> 
> We touch three filesystems.  Why these three?  Are all other
> filesystems OK or will they need work as well?

These are the ones that I found open-coding add_to_page_cache_lru()
followed by submit_bio() instead of going through generic code like
mpage, use read_cache_pages(), implement ->readpage only.

> > @@ -2753,11 +2763,14 @@ static struct page *do_read_cache_page(struct address_space *mapping,
> >  				void *data,
> >  				gfp_t gfp)
> >  {
> > +	bool refault = false;
> >  	struct page *page;
> >  	int err;
> >  repeat:
> >  	page = find_get_page(mapping, index);
> >  	if (!page) {
> > +		unsigned long pflags;
> > +
> 
> That was a bit odd.  This?
> 
> --- a/mm/filemap.c~psi-annotate-refault-stalls-from-io-submission-fix
> +++ a/mm/filemap.c
> @@ -2815,12 +2815,12 @@ static struct page *do_read_cache_page(s
>  				void *data,
>  				gfp_t gfp)
>  {
> -	bool refault = false;
>  	struct page *page;
>  	int err;
>  repeat:
>  	page = find_get_page(mapping, index);
>  	if (!page) {
> +		bool refault = false;
>  		unsigned long pflags;
>  
>  		page = __page_cache_alloc(gfp);
> _
> 

It's so that when we jump to 'filler:' from outside the branch, the
'refault' variable is initialized from the first time through:

	bool refault = false;
	struct page *page;

	page = find_get_page(mapping, index);
	if (!page) {
	   	__page_cache_alloc()
		add_to_page_cache_lru()
		refault = PageWorkingset(page);
filler:
		if (refault)
			psi_memstall_enter(&pflags);

		readpage()

		if (refault)
			psi_memstall_leave(&pflags);
	}
	lock_page()
	if (PageUptodate())
		goto out;
	goto filler;

  reply	other threads:[~2019-07-22 23:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-22 20:13 [PATCH] psi: annotate refault stalls from IO submission Johannes Weiner
2019-07-22 22:26 ` Andrew Morton
2019-07-22 23:35   ` Johannes Weiner [this message]
2019-07-23  0:02 ` Dave Chinner
2019-07-23 19:04   ` Johannes Weiner
2019-07-23 19:34     ` Jens Axboe
2019-07-23 20:42       ` Johannes Weiner
2019-07-23 22:10       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190722233527.GA21594@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).