linux-kernel.vger.kernel.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: "Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco)" 
	<rruslich@cisco.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Taras Kondratiuk <takondra@cisco.com>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, xe-linux-external@cisco.com,
	linux-kernel@vger.kernel.org
Subject: Re: Detecting page cache trashing state
Date: Mon, 27 Nov 2017 11:18:35 +0900
Message-ID: <20171127021835.GA27255@bbox>
In-Reply-To: <bfbfaaa1-2b12-f26f-218a-ff6804f47eae@cisco.com>

Hello,

On Mon, Nov 20, 2017 at 09:40:56PM +0200, Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco) wrote:
> Hi Johannes,
> 
> I tested with your patches but the situation is still mostly the same.
> 
> I spent some time debugging and found that the problem is specific to
> squashfs (and probably some other filesystems too).
> The point is that the I/O wait for squashfs reads happens inside the
> squashfs readpage() callback.
> Here is a backtrace of page fault handling to illustrate this:
> 
>  1)               |  handle_mm_fault() {
>  1)               |    filemap_fault() {
>  1)               |      __do_page_cache_readahead()
>  1)               |        add_to_page_cache_lru()
>  1)               |        squashfs_readpage() {
>  1)               |          squashfs_readpage_block() {
>  1)               |            squashfs_get_datablock() {
>  1)               |              squashfs_cache_get() {
>  1)               |                squashfs_read_data() {
>  1)               |                  ll_rw_block() {
>  1)               |                    submit_bh_wbc.isra.42()
>  1)               |                  __wait_on_buffer() {
>  1)               |                    io_schedule() {
>  ------------------------------------------
>  0)   kworker-79   =>    <idle>-0
>  ------------------------------------------
>  0)   0.382 us    |  blk_complete_request();
>  0)               |  blk_done_softirq() {
>  0)               |    blk_update_request() {
>  0)               |      end_buffer_read_sync()
>  0) + 38.559 us   |    }
>  0) + 48.367 us   |  }
>  ------------------------------------------
>  0)   kworker-79   =>  memhog-781
>  ------------------------------------------
>  0) ! 278.848 us  |                    }
>  0) ! 279.612 us  |                  }
>  0)               |                  squashfs_decompress() {
>  0) # 4919.082 us |                    squashfs_xz_uncompress();
>  0) # 4919.864 us |                  }
>  0) # 5479.212 us |                } /* squashfs_read_data */
>  0) # 5479.749 us |              } /* squashfs_cache_get */
>  0) # 5480.177 us |            } /* squashfs_get_datablock */
>  0)               |            squashfs_copy_cache() {
>  0)   0.057 us    |              unlock_page();
>  0) ! 142.773 us  |            }
>  0) # 5624.113 us |          } /* squashfs_readpage_block */
>  0) # 5628.814 us |        } /* squashfs_readpage */
>  0) # 5665.097 us |      } /* __do_page_cache_readahead */
>  0) # 5667.437 us |    } /* filemap_fault */
>  0) # 5672.880 us |  } /* handle_mm_fault */
> 
> As you can see, squashfs_read_data() schedules I/O via ll_rw_block() and
> then waits for it to finish inside wait_on_buffer().
> After that the read buffer is decompressed and the page is unlocked
> inside the squashfs_readpage() handler.
> 
> Thus, by the time filemap_fault() calls lock_page_or_retry(), the page
> is already uptodate and unlocked, so wait_on_page_bit() is not called at
> all and the time spent on read/decompress is not accounted.

A weakness of the current approach is that it relies on the page lock.
That means it cannot work with synchronous devices like DAX, zram and
so on, I think.
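
To make the weakness concrete, here is a small userspace model (not kernel
code) of accounting that is hooked only into the wait on a locked page. All
names prefixed with lk_ are hypothetical and exist only for this
illustration; the clock is simulated so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these are kernel APIs. */
struct lk_page {
	bool locked;     /* models the page lock */
	bool workingset; /* models PageWorkingset() */
};

static uint64_t lk_clock_us;     /* simulated wall clock */
static uint64_t lk_accounted_us; /* delay seen by lock-based accounting */

/* Accounting lives here only: it runs if the page is still locked. */
static void lk_wait_on_page_locked(struct lk_page *p, uint64_t pending_us)
{
	if (!p->locked)
		return; /* nothing to wait for, nothing accounted */
	lk_clock_us += pending_us;
	if (p->workingset)
		lk_accounted_us += pending_us;
	p->locked = false;
}

/* Asynchronous read: submit the I/O and return with the page locked. */
static uint64_t lk_readpage_async(struct lk_page *p, uint64_t cost_us)
{
	p->locked = true;
	return cost_us; /* still pending when the caller gets control back */
}

/* Synchronous read: the whole cost is paid before returning. */
static uint64_t lk_readpage_sync(struct lk_page *p, uint64_t cost_us)
{
	p->locked = true;
	lk_clock_us += cost_us; /* read and decompress happen right here */
	p->locked = false;      /* page is unlocked before returning */
	return 0;
}

/* Fault path: issue the read, then wait on the page lock if needed. */
static void lk_fault(struct lk_page *p, bool sync, uint64_t cost_us)
{
	uint64_t pending = sync ? lk_readpage_sync(p, cost_us)
				: lk_readpage_async(p, cost_us);

	lk_wait_on_page_locked(p, pending);
}
```

In this model, a 300 us asynchronous refault followed by a 5000 us
synchronous refault advances lk_clock_us by 5300 but leaves
lk_accounted_us at 300: the synchronous cost never passes through the
lock wait.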

Johannes, can we add memdelay_enter() to every fault handler's prologue?
Then in the epilogue we can check whether the faulted page is workingset.
If it was, we can accumulate the time spent.
It would work with synchronous devices, especially zram, without hacking
individual FSes like squashfs.
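
A minimal userspace sketch of that prologue/epilogue idea follows. It is
only a model under stated assumptions: the model_ names, the simulated
clock and the delay_stats structure are hypothetical, and the real
memdelay_enter()/memdelay_leave() interfaces from the patch set are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; none of these are kernel APIs. */
struct model_page {
	bool workingset; /* models PageWorkingset(): a recently evicted page */
};

struct delay_stats {
	uint64_t refault_delay_us; /* time accumulated on workingset refaults */
};

/* Simulated monotonic clock so the example is deterministic. */
static uint64_t model_clock_us;

static uint64_t model_now_us(void)
{
	return model_clock_us;
}

/* Stand-in for a synchronous read that never sleeps on the page lock. */
static void model_sync_read(struct model_page *page, uint64_t cost_us)
{
	(void)page;
	model_clock_us += cost_us;
}

/* Fault handler: time everything, decide in the epilogue what to count. */
static void model_handle_fault(struct delay_stats *stats,
			       struct model_page *page, uint64_t cost_us)
{
	uint64_t start = model_now_us(); /* prologue: start the timer */

	model_sync_read(page, cost_us);  /* body: may be fully synchronous */

	/* epilogue: only refaults of workingset pages count as pressure */
	if (page->workingset)
		stats->refault_delay_us += model_now_us() - start;
}
```

Because the timer brackets the whole handler, the cost is captured whether
or not the page lock is ever contended; faults on pages that are not
workingset are timed but not accumulated.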

I think the page fault handler/kswapd/direct reclaim would cover most
cases of *real* memory pressure, whereas [un]lock_page() and friends would
cover too much; for example, FSes can easily call them without any memory
pressure.

> 
> I tried applying a quick workaround as a test:
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index c4ca702..5e2be2b 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -126,9 +126,21 @@ static int read_pages(struct address_space *mapping, struct file *filp,
> 
>      for (page_idx = 0; page_idx < nr_pages; page_idx++) {
>          struct page *page = lru_to_page(pages);
> +        bool refault = false;
> +        unsigned long mdflags;
> +
>          list_del(&page->lru);
> -        if (!add_to_page_cache_lru(page, mapping, page->index, gfp))
> +        if (!add_to_page_cache_lru(page, mapping, page->index, gfp)) {
> +            if (!PageUptodate(page) && PageWorkingset(page)) {
> +                memdelay_enter(&mdflags);
> +                refault = true;
> +            }
> +
>              mapping->a_ops->readpage(filp, page);
> +
> +            if (refault)
> +                memdelay_leave(&mdflags);
> +        }
>          put_page(page);
