linux-kernel.vger.kernel.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Shaohua Li <shli@fb.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kernel-team@fb.com, hughd@google.com, riel@redhat.com,
	mgorman@techsingularity.net, akpm@linux-foundation.org
Subject: Re: [PATCH V5 6/6] proc: show MADV_FREE pages info in smaps
Date: Thu, 2 Mar 2017 16:39:47 +0900
Message-ID: <20170302073947.GA32690@bbox>
In-Reply-To: <20170301185735.GA24905@dhcp22.suse.cz>

On Wed, Mar 01, 2017 at 07:57:35PM +0100, Michal Hocko wrote:
> On Wed 01-03-17 13:31:49, Johannes Weiner wrote:
> > On Wed, Mar 01, 2017 at 02:36:24PM +0100, Michal Hocko wrote:
> > > On Fri 24-02-17 13:31:49, Shaohua Li wrote:
> > > > Show MADV_FREE pages info for each vma in smaps. The interface is for
> > > > diagnosis or monitoring purposes; userspace can use it to understand
> > > > what happens in the application. Since userspace can dirty MADV_FREE
> > > > pages without the kernel noticing, this interface is the only place we
> > > > can get accurate accounting info about MADV_FREE pages.
> > > 
> > > I have just gotten around to testing this patchset and noticed
> > > something that was a bit surprising:
> > > 
> > > madvise(mmap(len), len, MADV_FREE)
> > > Size:             102400 kB
> > > Rss:              102400 kB
> > > Pss:              102400 kB
> > > Shared_Clean:          0 kB
> > > Shared_Dirty:          0 kB
> > > Private_Clean:    102400 kB
> > > Private_Dirty:         0 kB
> > > Referenced:            0 kB
> > > Anonymous:        102400 kB
> > > LazyFree:         102368 kB
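
A minimal userspace sketch that produces an smaps layout like the one quoted
above; the actual test program is not shown in the thread, so the details
below (memset to populate the pages, the 100 MB size, a libc that defines
MADV_FREE) are assumptions, not Michal's test:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 100 << 20;			/* 100 MB -> Size: 102400 kB */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	memset(p, 1, len);			/* fault all pages in -> Rss: 102400 kB */
	madvise(p, len, MADV_FREE);		/* mark them lazily freeable */

	/* the LazyFree line for this vma can now be read from /proc/self/smaps */
	getchar();				/* keep the mapping alive while inspecting */
	return 0;
}
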
> > > 
> > > It took me some time to realize that LazyFree is not accurate because
> > > there are still pages on the per-cpu lru_lazyfree_pvecs. I believe this
> > > is an implementation detail which shouldn't be visible to userspace.
> > > Should we simply drain the pagevec? A crude way would be to simply call
> > > lru_add_drain_all() after we are done with the given range. We could also
> > > make this specific to lru_lazyfree_pvecs, but I am not sure that is worth
> > > the additional code.
> > > ---
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index dc5927c812d3..d2c318db16c9 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -474,7 +474,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > >  	madvise_free_page_range(&tlb, vma, start, end);
> > >  	mmu_notifier_invalidate_range_end(mm, start, end);
> > >  	tlb_finish_mmu(&tlb, start, end);
> > > -
> > > +	lru_add_drain_all();
> > 
> > A full drain on all CPUs is very expensive and IMO not justified for
> > some per-cpu fuzz factor in the stats. I'd take hampering the stats
> > over hampering the syscall any day; only a subset of MADV_FREE users
> > will look at the stats.
> > 
> > And while the aggregate error can be large on machines with many CPUs
> > (notably the machines on which you absolutely don't want to send IPIs
> > to all cores each time a thread madvises some pages!),
> 
> I am not sure I understand. Where would we trigger IPIs?
> lru_add_drain_all relies on workqueues.
> 
> > the pages of a
> > single process are not likely to be spread out across more than a few
> > CPUs.
> 
> Then we can simply flush only lru_lazyfree_pvecs, which should reduce
> the unrelated noise from the other pagevecs.
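
A minimal sketch of what such a targeted drain might look like, assuming the
per-cpu lru_lazyfree_pvecs added by this series and the lru_lazyfree_fn move
callback that drains it, placed in mm/swap.c next to the existing pagevec
helpers; the function names below are illustrative, not the actual patch:

static void lru_lazyfree_drain_cpu(struct work_struct *dummy)
{
	struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);

	if (pagevec_count(pvec))
		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
	put_cpu_var(lru_lazyfree_pvecs);
}

/*
 * Drain only the lazyfree pagevecs on all CPUs.  Like lru_add_drain_all(),
 * this is driven by workqueues (schedule_on_each_cpu), not IPIs.
 */
void lru_lazyfree_drain_all(void)
{
	schedule_on_each_cpu(lru_lazyfree_drain_cpu);
}

madvise_free_single_vma() could then call lru_lazyfree_drain_all() instead of
the full lru_add_drain_all() from the diff above.
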
> 
> > The error when reading a specific smaps should be completely ok.
> > 
> > In numbers: even if your process is madvising from 16 different CPUs,
> > the error in its smaps file will peak at 896K in the worst case. That
> > level of concurrency tends to come with much bigger memory quantities
> > for that amount of error to matter.
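
For reference, the 896K figure follows from the pagevec geometry at the time,
assuming PAGEVEC_SIZE of 14 and 4 KiB pages:

    14 pages/pagevec * 4 KiB/page = 56 KiB of not-yet-drained pages per CPU
    16 CPUs * 56 KiB/CPU          = 896 KiB worst-case LazyFree error

The 32 kB gap in the smaps output above (102400 kB - 102368 kB) would likewise
be 8 pages still sitting in one partially filled pagevec.
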
> 
> It is still unexpected behavior IMHO, and an implementation detail
> which leaks to userspace.
>  
> > IMO this is a non-issue.
> 
> I will not insist if there is a general consensus on this and it is a
> documented behavior, though. 

Even with that draining we cannot guarantee accuracy, because madvise_free
can easily miss some of the pages under several conditions.
First of all, userspace can never know how many of the pages are actually
mapped there at the moment. As well, a page in the range can be swapped
out or under migration, can fail trylock_page, and so on.

Thread overview: 44+ messages
2017-02-24 21:31 [PATCH V5 0/6] mm: fix some MADV_FREE issues Shaohua Li
2017-02-24 21:31 ` [PATCH V5 1/6] mm: delete unnecessary TTU_* flags Shaohua Li
2017-02-27 13:48   ` Michal Hocko
2017-02-24 21:31 ` [PATCH V5 2/6] mm: don't assume anonymous pages have SwapBacked flag Shaohua Li
2017-02-27  6:48   ` Hillf Danton
2017-02-27 14:35   ` Michal Hocko
2017-02-27 16:10     ` Shaohua Li
2017-02-27 16:28       ` Michal Hocko
2017-02-24 21:31 ` [PATCH V5 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list Shaohua Li
2017-02-27  6:28   ` Minchan Kim
2017-02-27 16:13     ` Shaohua Li
2017-02-27 16:30       ` Michal Hocko
2017-02-28  2:53       ` Minchan Kim
2017-02-27 14:53   ` Michal Hocko
2017-02-27 17:15   ` Johannes Weiner
2017-02-28  3:19   ` Hillf Danton
2017-02-24 21:31 ` [PATCH V5 4/6] mm: reclaim MADV_FREE pages Shaohua Li
2017-02-27  6:33   ` Minchan Kim
2017-02-27 16:19     ` Shaohua Li
2017-02-27 16:32       ` Michal Hocko
2017-02-28  5:02       ` Minchan Kim
2017-02-27 15:05   ` Michal Hocko
2017-02-27 17:21   ` Johannes Weiner
2017-02-28  3:21   ` Hillf Danton
2017-02-24 21:31 ` [PATCH V5 5/6] mm: enable MADV_FREE for swapless system Shaohua Li
2017-02-27 15:06   ` Michal Hocko
2017-02-28  3:22   ` Hillf Danton
2017-02-28  5:02   ` Minchan Kim
2017-02-24 21:31 ` [PATCH V5 6/6] proc: show MADV_FREE pages info in smaps Shaohua Li
2017-02-27 15:06   ` Michal Hocko
2017-02-28  3:23   ` Hillf Danton
2017-03-01 13:36   ` Michal Hocko
2017-03-01 17:37     ` Shaohua Li
2017-03-01 17:49       ` Michal Hocko
2017-03-01 18:18         ` Shaohua Li
2017-03-01 18:31     ` Johannes Weiner
2017-03-01 18:57       ` Michal Hocko
2017-03-02  7:39         ` Minchan Kim [this message]
2017-03-02 14:01         ` Johannes Weiner
2017-03-02 16:30           ` Michal Hocko
2017-03-04  0:10             ` Andrew Morton
2017-03-07 10:05               ` Michal Hocko
2017-03-07 22:43                 ` Andrew Morton
2017-03-08  5:36                   ` Minchan Kim
