From: Johannes Weiner <hannes@cmpxchg.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Shaohua Li <shli@fb.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kernel-team@fb.com, minchan@kernel.org, hughd@google.com, riel@redhat.com, mgorman@techsingularity.net, akpm@linux-foundation.org
Subject: Re: [PATCH V5 6/6] proc: show MADV_FREE pages info in smaps
Date: Thu, 2 Mar 2017 09:01:01 -0500	[thread overview]
Message-ID: <20170302140101.GA16021@cmpxchg.org> (raw)
In-Reply-To: <20170301185735.GA24905@dhcp22.suse.cz>

On Wed, Mar 01, 2017 at 07:57:35PM +0100, Michal Hocko wrote:
> On Wed 01-03-17 13:31:49, Johannes Weiner wrote:
> > On Wed, Mar 01, 2017 at 02:36:24PM +0100, Michal Hocko wrote:
> > > @@ -474,7 +474,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> > >  	madvise_free_page_range(&tlb, vma, start, end);
> > >  	mmu_notifier_invalidate_range_end(mm, start, end);
> > >  	tlb_finish_mmu(&tlb, start, end);
> > > -
> > > +	lru_add_drain_all();
> > 
> > A full drain on all CPUs is very expensive and IMO not justified for
> > some per-cpu fuzz factor in the stats. I'd take hampering the stats
> > over hampering the syscall any day; only a subset of MADV_FREE users
> > will look at the stats.
> > 
> > And while the aggregate error can be large on machines with many CPUs
> > (notably the machines on which you absolutely don't want to send IPIs
> > to all cores each time a thread madvises some pages!),
> 
> I am not sure I understand. Where would we trigger IPIs?
> lru_add_drain_all relies on workqueus.

Brainfart on my end, s,IPIs,sync work items,. That doesn't change my
point, though. These things are expensive, and we had scalability
issues with them in the past. See for example 4dd72b4a47a5 ("mm:
fadvise: avoid expensive remote LRU cache draining after
FADV_DONTNEED").

> > the pages of a
> > single process are not likely to be spread out across more than a few
> > CPUs.
> 
> Then we can simply only flushe lru_lazyfree_pvecs which should reduce
> the unrelated noise from other pagevecs.

The problem isn't flushing other pagevecs once we're already scheduled
on a CPU, the problem is scheduling work on all cpus and then waiting
for completion.

> > The error when reading a specific smaps should be completely ok.
> > 
> > In numbers: even if your process is madvising from 16 different CPUs,
> > the error in its smaps file will peak at 896K in the worst case. That
> > level of concurrency tends to come with much bigger memory quantities
> > for that amount of error to matter.
> 
> It is still an unexpected behavior IMHO and an implementation detail
> which leaks to the userspace.

We have per-cpu fuzz in every single vmstat counter. Look at
calculate_normal_threshold() in vmstat.c and the sample thresholds for
when per-cpu deltas are flushed. In the vast majority of machines, the
per-cpu error in these counters is much higher than what we get with
pagevecs holding back a few pages.

It's not that I think you're wrong: it *is* an implementation detail.
But we take a bit of incoherency from batching all over the place, so
it's a little odd to take a stand over this particular instance of it
- whether demanding that it'd be fixed, or be documented, which would
only suggest to users that this is special when it really isn't etc.
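[Editorial note: the 896K worst case quoted above is one full pagevec held back per CPU. A back-of-the-envelope sketch, assuming PAGEVEC_SIZE of 14 and 4 KiB base pages as in kernels of that era:]

```python
# Worst-case smaps undercount from per-CPU lazyfree pagevecs:
# each CPU can hold up to one full pagevec of pages that are not
# yet accounted on the LRU, so the error is bounded by
# nr_cpus * PAGEVEC_SIZE * page size.
PAGEVEC_SIZE = 14    # pages buffered per CPU before the pagevec drains
PAGE_SIZE_KB = 4     # 4 KiB base pages

def worst_case_error_kb(nr_cpus):
    """Upper bound, in KiB, on pages invisible to smaps stats."""
    return nr_cpus * PAGEVEC_SIZE * PAGE_SIZE_KB

print(worst_case_error_kb(16))  # 896, the figure cited in the mail
```

That is, the error scales linearly with the number of CPUs the process actually madvises from, not with the size of the region.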