All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	Brendan Gregg <bgregg@netflix.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christian Hansen <chansen3@cisco.com>,
	dancol@google.com, fmayer@google.com,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	kernel-team@android.com, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Mike Rapoport <rppt@linux.ibm.com>,
	minchan@kernel.org, namhyung@google.com, paulmck@linux.ibm.com,
	Robin Murphy <robin.murphy@arm.com>, Roman Gushchin <guro@fb.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	surenb@google.com, Thomas Gleixner <tglx@linutronix.de>,
	tkjos@google.com, Vladimir Davydov <vdavydov.dev@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>
Subject: Re: [PATCH v4 4/5] page_idle: Drain all LRU pagevec before idle tracking
Date: Tue, 6 Aug 2019 13:44:02 +0200	[thread overview]
Message-ID: <20190806114402.GX11812@dhcp22.suse.cz> (raw)
In-Reply-To: <20190806111921.GB117316@google.com>

On Tue 06-08-19 07:19:21, Joel Fernandes wrote:
> On Tue, Aug 06, 2019 at 12:51:49PM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 06:45:54, Joel Fernandes wrote:
> > > On Tue, Aug 06, 2019 at 10:43:57AM +0200, Michal Hocko wrote:
> > > > On Mon 05-08-19 13:04:50, Joel Fernandes (Google) wrote:
> > > > > During idle tracking, we see that sometimes faulted anon pages are in
> > > > > pagevec but are not drained to LRU. Idle tracking considers pages only
> > > > > on LRU. Drain all CPU's LRU before starting idle tracking.
> > > > 
> > > > Please expand on why does this matter enough to introduce a potentially
> > > > expensinve draining which has to schedule a work on each CPU and wait
> > > > for them to finish.
> > > 
> > > Sure, I can expand. I am able to find multiple issues involving this. One
> > > issue looks like idle tracking is completely broken. It shows up in my
> > > testing as if a page that is marked as idle is always "accessed" -- because
> > > it was never marked as idle (due to not draining of pagevec).
> > > 
> > > The other issue shows up as a failure in my "swap test", with the following
> > > sequence:
> > > 1. Allocate some pages
> > > 2. Write to them
> > > 3. Mark them as idle                                    <--- fails
> > > 4. Introduce some memory pressure to induce swapping.
> > > 5. Check the swap bit I introduced in this series.      <--- fails to set idle
> > >                                                              bit in swap PTE.
> > > 
> > > Draining the pagevec in advance fixes both of these issues.
> > 
> > This belongs to the changelog.
> 
> Sure, will add.
> 
> 
> > > This operation even if expensive is only done once during the access of the
> > > page_idle file. Did you have a better fix in mind?
> > 
> > Can we set the idle bit also for non-lru pages as long as they are
> > reachable via pte?
> 
> Not at the moment with the current page idle tracking code. PageLRU(page)
> flag is checked in page_idle_get_page().

yes, I am aware of the current code. I strongly suspect that the PageLRU
check was there to not mark arbitrary page looked up by pfn with the
idle bit because that would be unexpected. But I might be easily wrong
here.

> Even if we could set it for non-LRU, the idle bit (page flag) would not be
> cleared if page is not on LRU because page-reclaim code (page_referenced() I
> believe) would not clear it.

Yes, it is either reclaim when checking references as you say but also
mark_page_accessed. I believe the later might still have the page on the
pcp LRU add cache. Maybe I am missing something something but it seems
that there is nothing fundamentally requiring the user mapped page to be
on the LRU list when seting the idle bit.

That being said, your big hammer approach will work more reliable but if
you do not feel like changing the underlying PageLRU assumption then
document that draining should be removed longterm.
-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	Brendan Gregg <bgregg@netflix.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christian Hansen <chansen3@cisco.com>,
	dancol@google.com, fmayer@google.com,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	kernel-team@android.com, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Mike Rapoport <rppt@linux.ibm.com>,
	minchan@kernel.org, namhyung@google.com, paulmck@linux.ibm.com,
	Robin Murphy <robin.murphy@arm.com>, Roman Gushchin <guro@fb.com>,
	Stephen Rothwell <sfr@canb.a>
Subject: Re: [PATCH v4 4/5] page_idle: Drain all LRU pagevec before idle tracking
Date: Tue, 6 Aug 2019 13:44:02 +0200	[thread overview]
Message-ID: <20190806114402.GX11812@dhcp22.suse.cz> (raw)
In-Reply-To: <20190806111921.GB117316@google.com>

On Tue 06-08-19 07:19:21, Joel Fernandes wrote:
> On Tue, Aug 06, 2019 at 12:51:49PM +0200, Michal Hocko wrote:
> > On Tue 06-08-19 06:45:54, Joel Fernandes wrote:
> > > On Tue, Aug 06, 2019 at 10:43:57AM +0200, Michal Hocko wrote:
> > > > On Mon 05-08-19 13:04:50, Joel Fernandes (Google) wrote:
> > > > > During idle tracking, we see that sometimes faulted anon pages are in
> > > > > pagevec but are not drained to LRU. Idle tracking considers pages only
> > > > > on LRU. Drain all CPU's LRU before starting idle tracking.
> > > > 
> > > > Please expand on why does this matter enough to introduce a potentially
> > > > expensinve draining which has to schedule a work on each CPU and wait
> > > > for them to finish.
> > > 
> > > Sure, I can expand. I am able to find multiple issues involving this. One
> > > issue looks like idle tracking is completely broken. It shows up in my
> > > testing as if a page that is marked as idle is always "accessed" -- because
> > > it was never marked as idle (due to not draining of pagevec).
> > > 
> > > The other issue shows up as a failure in my "swap test", with the following
> > > sequence:
> > > 1. Allocate some pages
> > > 2. Write to them
> > > 3. Mark them as idle                                    <--- fails
> > > 4. Introduce some memory pressure to induce swapping.
> > > 5. Check the swap bit I introduced in this series.      <--- fails to set idle
> > >                                                              bit in swap PTE.
> > > 
> > > Draining the pagevec in advance fixes both of these issues.
> > 
> > This belongs to the changelog.
> 
> Sure, will add.
> 
> 
> > > This operation even if expensive is only done once during the access of the
> > > page_idle file. Did you have a better fix in mind?
> > 
> > Can we set the idle bit also for non-lru pages as long as they are
> > reachable via pte?
> 
> Not at the moment with the current page idle tracking code. PageLRU(page)
> flag is checked in page_idle_get_page().

yes, I am aware of the current code. I strongly suspect that the PageLRU
check was there to not mark arbitrary page looked up by pfn with the
idle bit because that would be unexpected. But I might be easily wrong
here.

> Even if we could set it for non-LRU, the idle bit (page flag) would not be
> cleared if page is not on LRU because page-reclaim code (page_referenced() I
> believe) would not clear it.

Yes, it is either reclaim when checking references as you say but also
mark_page_accessed. I believe the later might still have the page on the
pcp LRU add cache. Maybe I am missing something something but it seems
that there is nothing fundamentally requiring the user mapped page to be
on the LRU list when seting the idle bit.

That being said, your big hammer approach will work more reliable but if
you do not feel like changing the underlying PageLRU assumption then
document that draining should be removed longterm.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2019-08-06 11:44 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 17:04 [PATCH v4 1/5] mm/page_idle: Add per-pid idle page tracking using virtual indexing Joel Fernandes (Google)
2019-08-05 17:04 ` Joel Fernandes (Google)
2019-08-05 17:04 ` [PATCH v4 2/5] [RFC] x86: Add support for idle bit in swap PTE Joel Fernandes (Google)
2019-08-05 17:04   ` Joel Fernandes (Google)
2019-08-05 17:04 ` [PATCH v4 3/5] [RFC] arm64: " Joel Fernandes (Google)
2019-08-05 17:04   ` Joel Fernandes (Google)
2019-08-06  8:42   ` Michal Hocko
2019-08-06  8:42     ` Michal Hocko
2019-08-06 10:36     ` Joel Fernandes
2019-08-06 10:36       ` Joel Fernandes
2019-08-06 10:47       ` Michal Hocko
2019-08-06 10:47         ` Michal Hocko
2019-08-06 11:07         ` Minchan Kim
2019-08-06 11:07           ` Minchan Kim
2019-08-06 11:14           ` Michal Hocko
2019-08-06 11:14             ` Michal Hocko
2019-08-06 11:26             ` Joel Fernandes
2019-08-06 11:26               ` Joel Fernandes
2019-08-06 11:14         ` Joel Fernandes
2019-08-06 11:14           ` Joel Fernandes
2019-08-06 11:57           ` Michal Hocko
2019-08-06 11:57             ` Michal Hocko
2019-08-06 13:43             ` Joel Fernandes
2019-08-06 13:43               ` Joel Fernandes
2019-08-06 14:09               ` Michal Hocko
2019-08-06 14:09                 ` Michal Hocko
2019-08-06 14:47             ` Minchan Kim
2019-08-06 14:47               ` Minchan Kim
2019-08-06 15:20               ` Joel Fernandes
2019-08-06 15:20                 ` Joel Fernandes
2019-08-05 17:04 ` [PATCH v4 4/5] page_idle: Drain all LRU pagevec before idle tracking Joel Fernandes (Google)
2019-08-05 17:04   ` Joel Fernandes (Google)
2019-08-06  8:43   ` Michal Hocko
2019-08-06  8:43     ` Michal Hocko
2019-08-06 10:45     ` Joel Fernandes
2019-08-06 10:45       ` Joel Fernandes
2019-08-06 10:51       ` Michal Hocko
2019-08-06 10:51         ` Michal Hocko
2019-08-06 11:19         ` Joel Fernandes
2019-08-06 11:19           ` Joel Fernandes
2019-08-06 11:44           ` Michal Hocko [this message]
2019-08-06 11:44             ` Michal Hocko
2019-08-06 13:48             ` Joel Fernandes
2019-08-06 13:48               ` Joel Fernandes
2019-08-05 17:04 ` [PATCH v4 5/5] doc: Update documentation for page_idle virtual address indexing Joel Fernandes (Google)
2019-08-05 17:04   ` Joel Fernandes (Google)
2019-08-06  8:56 ` [PATCH v4 1/5] mm/page_idle: Add per-pid idle page tracking using virtual indexing Michal Hocko
2019-08-06  8:56   ` Michal Hocko
2019-08-06 10:47   ` Joel Fernandes
2019-08-06 10:47     ` Joel Fernandes
2019-08-06 22:19 ` Andrew Morton
2019-08-06 22:19   ` Andrew Morton
2019-08-07 10:00   ` Joel Fernandes
2019-08-07 10:00     ` Joel Fernandes
2019-08-07 20:01     ` Andrew Morton
2019-08-07 20:01       ` Andrew Morton
2019-08-07 20:44       ` Joel Fernandes
2019-08-07 20:44         ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190806114402.GX11812@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bgregg@netflix.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=chansen3@cisco.com \
    --cc=corbet@lwn.net \
    --cc=dancol@google.com \
    --cc=fmayer@google.com \
    --cc=guro@fb.com \
    --cc=hpa@zytor.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=kernel-team@android.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@google.com \
    --cc=paulmck@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=rppt@linux.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=tkjos@google.com \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.