linux-mm.kvack.org archive mirror
* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
From: Johannes Weiner @ 2020-11-05 16:01 UTC
  To: Shakeel Butt
  Cc: Hugh Dickins, Jerome Glisse, Andrew Morton, Vlastimil Babka,
	linux-mm, linux-kernel

On Wed, Nov 04, 2020 at 03:19:28PM -0800, Shakeel Butt wrote:
> Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> semantic v2"), the code that checks the secondary MMU's page table
> access bit has been broken for !(TTU_IGNORE_ACCESS), because the
> page is unmapped from the secondary MMU's page table before the
> check. This specifically affects secondary MMUs, such as KVM, that
> unmap the memory in mmu_notifier_invalidate_range_start().
> 
> However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS),
> i.e. the only caller that does not pass TTU_IGNORE_ACCESS, and it
> explicitly performs the page table access check before trying to
> unmap the page. So, at worst, the reclaim will miss accesses in a
> very short window if we remove the page table access check from the
> unmapping code.

We also miss accesses that happen right after the unmap :-) Seems
completely fine to make page_referenced() the time of last call.
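
For reference, a condensed sketch of the reclaim ordering this relies
on, paraphrasing 2020-era mm/vmscan.c (locking, stats and error
handling elided; reclaim_one_page() is an invented wrapper name for
illustration, not a real kernel function):

	/*
	 * page_check_references() is the last point at which an access
	 * can rescue the page; any access after try_to_unmap() begins
	 * is missed with or without this patch.
	 */
	static bool reclaim_one_page(struct page *page,
				     struct scan_control *sc)
	{
		switch (page_check_references(page, sc)) {
		case PAGEREF_ACTIVATE:		/* move back to active list */
		case PAGEREF_KEEP:		/* keep on inactive list */
			return false;
		case PAGEREF_RECLAIM:
		case PAGEREF_RECLAIM_CLEAN:
			break;			/* go ahead and unmap */
		}
		if (page_mapped(page) && !try_to_unmap(page, TTU_BATCH_FLUSH))
			return false;	/* unmap failed, keep the page */
		return true;		/* proceed to pageout()/free */
	}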

> There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> reclaim. During memcg reclaim, page_referenced() only accounts
> accesses from processes in the same memcg as the target page, but
> the unmapping code considers accesses from all processes, decreasing
> the effectiveness of memcg reclaim.
> 
> The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
> code.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>



* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
From: Hugh Dickins @ 2020-11-06  3:00 UTC
  To: Shakeel Butt
  Cc: Hugh Dickins, Jerome Glisse, Johannes Weiner, Andrew Morton,
	Vlastimil Babka, Michal Hocko, linux-mm, linux-kernel

I don't know why this was addressed to me in particular (easy to imagine
I've made a mod at some time that bears on this, but I haven't found it);
but have spent longer considering the patch than I should have done -
apologies to everyone else I should be replying to.

On Wed, 4 Nov 2020, Shakeel Butt wrote:

> Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> semantic v2"), the code that checks the secondary MMU's page table
> access bit has been broken for !(TTU_IGNORE_ACCESS), because the
> page is unmapped from the secondary MMU's page table before the
> check. This specifically affects secondary MMUs, such as KVM, that
> unmap the memory in mmu_notifier_invalidate_range_start().

Well, "broken" seems a bit unfair to 369ea8242c0f. It put a warning
mmu_notifier_invalidate_range_start() at the beginning, and matching
mmu_notifier_invalidate_range_end() at the end of try_to_unmap_one();
with its mmu_notifier_invalidate_range() exactly where the
mmu_notifier_invalidate_page() was before (I think the story gets
more complicated later).  Yes, if notifiee takes invalidate_range_start()
as signal to invalidate all their own range, then that will sometimes
cause them unnecessary invalidations.
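
For readers following along, the shape that 369ea8242c0f gave
try_to_unmap_one(), heavily simplified (2020-era names; range setup,
TLB flushing and the rest of the rmap walk elided - a sketch, not the
verbatim source):

	mmu_notifier_invalidate_range_start(&range);
					/* KVM-style notifiers zap their
					 * secondary page tables here */
	while (page_vma_mapped_walk(&pvmw)) {
		if (!(flags & TTU_IGNORE_ACCESS) &&
		    ptep_clear_flush_young_notify(vma, address, pvmw.pte)) {
			/* "young" - but a notifier that unmapped
			 * everything at *_start has no secondary
			 * access bits left to report by now */
			ret = false;
			page_vma_mapped_walk_done(&pvmw);
			break;
		}
		/* nuke the pte, then notify, exactly where
		 * mmu_notifier_invalidate_page() used to be: */
		mmu_notifier_invalidate_range(mm, address,
					      address + PAGE_SIZE);
	}
	mmu_notifier_invalidate_range_end(&range);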

Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
case meeting a VM_LOCKED vma and setting PageMlocked where that had
been missed earlier (and page_check_references() has intentionally but
confusingly marked this case as PAGEREF_RECLAIM, not to reclaim the page,
but to reach the try_to_unmap_one() which will recognize and fix it up -
historically easier to do there than in page_referenced_one()).
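
The bit of page_check_references() in question reads roughly like
this (paraphrased from 2020-era mm/vmscan.c):

	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
					  &vm_flags);
	/*
	 * Mlock lost the isolation race with us: PAGEREF_RECLAIM here
	 * does not mean "reclaim the page", it means "send it to
	 * try_to_unmap_one()", which will spot the VM_LOCKED vma,
	 * mlock the page, fail the unmap, and so move the page to the
	 * unevictable list.
	 */
	if (vm_flags & VM_LOCKED)
		return PAGEREF_RECLAIM;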

But I think mmu_notifier is a diversion from what needs thinking about.

> 
> However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS),
> i.e. the only caller that does not pass TTU_IGNORE_ACCESS, and it
> explicitly performs the page table access check before trying to
> unmap the page. So, at worst, the reclaim will miss accesses in a
> very short window if we remove the page table access check from the
> unmapping code.

I agree with you and Johannes that the short race window when the page
might be re-referenced is no issue at all: the functional issue is the
one in your next paragraph.  If that's agreed by memcg guys, great,
then this patch is a nice observation and a welcome cleanup.

> 
> There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> reclaim. During memcg reclaim, page_referenced() only accounts
> accesses from processes in the same memcg as the target page, but
> the unmapping code considers accesses from all processes, decreasing
> the effectiveness of memcg reclaim.

Are you sure it was unintended?

Since the dawn of memcg reclaim, it has been the case that a recent
reference in a "foreign" vma has rescued that page from being reclaimed:
now you propose to change that.  I expect some workflows will benefit
and others be disadvantaged.  I have no objection myself to the change,
but I do think it needs to be better highlighted here, and explicitly
agreed by those more familiar with memcg reclaim.

Hugh

> 
> The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
> code.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  include/linux/rmap.h |  1 -
>  mm/huge_memory.c     |  2 +-
>  mm/memory-failure.c  |  2 +-
>  mm/memory_hotplug.c  |  2 +-
>  mm/migrate.c         |  8 +++-----
>  mm/rmap.c            |  9 ---------
>  mm/vmscan.c          | 14 +++++---------
>  7 files changed, 11 insertions(+), 27 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 3a6adfa70fb0..70085ca1a3fc 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -91,7 +91,6 @@ enum ttu_flags {
>  
>  	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
>  	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
> -	TTU_IGNORE_ACCESS	= 0x10,	/* don't age */
>  	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
>  	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
>  					 * and caller guarantees they will
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 08a183f6c3ab..8b235b4abf73 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2324,7 +2324,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>  
>  static void unmap_page(struct page *page)
>  {
> -	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
> +	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
>  		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
>  	bool unmap_success;
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c0bb186bba62..b6d6d5cdb435 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -989,7 +989,7 @@ static int get_hwpoison_page(struct page *page)
>  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  				  int flags, struct page **hpagep)
>  {
> -	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
> +	enum ttu_flags ttu = TTU_IGNORE_MLOCK;
>  	struct address_space *mapping;
>  	LIST_HEAD(tokill);
>  	bool unmap_success = true;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8c1b7182bb08..968e6916d297 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1303,7 +1303,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			if (WARN_ON(PageLRU(page)))
>  				isolate_lru_page(page);
>  			if (page_mapped(page))
> -				try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);
> +				try_to_unmap(page, TTU_IGNORE_MLOCK);
>  			continue;
>  		}
>  
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c1585ec29827..e434d49fd428 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1122,8 +1122,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  		/* Establish migration ptes */
>  		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>  				page);
> -		try_to_unmap(page,
> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> +		try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK);
>  		page_was_mapped = 1;
>  	}
>  
> @@ -1339,8 +1338,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  			goto unlock_put_anon;
>  
>  		try_to_unmap(hpage,
> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
> -			TTU_RMAP_LOCKED);
> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_RMAP_LOCKED);
>  		page_was_mapped = 1;
>  		/*
>  		 * Leave mapping locked until after subsequent call to
> @@ -2684,7 +2682,7 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
>   */
>  static void migrate_vma_unmap(struct migrate_vma *migrate)
>  {
> -	int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
> +	int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK;
>  	const unsigned long npages = migrate->npages;
>  	const unsigned long start = migrate->start;
>  	unsigned long addr, i, restore = 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b84945d655c..6cd9d4512117 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1536,15 +1536,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  			goto discard;
>  		}
>  
> -		if (!(flags & TTU_IGNORE_ACCESS)) {
> -			if (ptep_clear_flush_young_notify(vma, address,
> -						pvmw.pte)) {
> -				ret = false;
> -				page_vma_mapped_walk_done(&pvmw);
> -				break;
> -			}
> -		}
> -
>  		/* Nuke the page table entry. */
>  		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		if (should_defer_flush(mm, flags)) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a8611dce7a95..e71b563cda7b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1072,7 +1072,6 @@ static void page_check_dirty_writeback(struct page *page,
>  static unsigned int shrink_page_list(struct list_head *page_list,
>  				     struct pglist_data *pgdat,
>  				     struct scan_control *sc,
> -				     enum ttu_flags ttu_flags,
>  				     struct reclaim_stat *stat,
>  				     bool ignore_references)
>  {
> @@ -1297,7 +1296,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  		 * processes. Try to unmap it here.
>  		 */
>  		if (page_mapped(page)) {
> -			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
> +			enum ttu_flags flags = TTU_BATCH_FLUSH;
>  			bool was_swapbacked = PageSwapBacked(page);
>  
>  			if (unlikely(PageTransHuge(page)))
> @@ -1514,7 +1513,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  	}
>  
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> -			TTU_IGNORE_ACCESS, &stat, true);
> +					&stat, true);
>  	list_splice(&clean_pages, page_list);
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
>  			    -(long)nr_reclaimed);
> @@ -1958,8 +1957,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	if (nr_taken == 0)
>  		return 0;
>  
> -	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
> -				&stat, false);
> +	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false);
>  
>  	spin_lock_irq(&pgdat->lru_lock);
>  
> @@ -2131,8 +2129,7 @@ unsigned long reclaim_pages(struct list_head *page_list)
>  
>  		nr_reclaimed += shrink_page_list(&node_page_list,
>  						NODE_DATA(nid),
> -						&sc, 0,
> -						&dummy_stat, false);
> +						&sc, &dummy_stat, false);
>  		while (!list_empty(&node_page_list)) {
>  			page = lru_to_page(&node_page_list);
>  			list_del(&page->lru);
> @@ -2145,8 +2142,7 @@ unsigned long reclaim_pages(struct list_head *page_list)
>  	if (!list_empty(&node_page_list)) {
>  		nr_reclaimed += shrink_page_list(&node_page_list,
>  						NODE_DATA(nid),
> -						&sc, 0,
> -						&dummy_stat, false);
> +						&sc, &dummy_stat, false);
>  		while (!list_empty(&node_page_list)) {
>  			page = lru_to_page(&node_page_list);
>  			list_del(&page->lru);
> -- 
> 2.29.1.341.ge80a0c044ae-goog



* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
From: Shakeel Butt @ 2020-11-06 15:09 UTC
  To: Hugh Dickins
  Cc: Jerome Glisse, Johannes Weiner, Andrew Morton, Vlastimil Babka,
	Michal Hocko, Linux MM, LKML, Balbir Singh

On Thu, Nov 5, 2020 at 7:00 PM Hugh Dickins <hughd@google.com> wrote:
>
> I don't know why this was addressed to me in particular (easy to imagine
> I've made a mod at some time that bears on this, but I haven't found it);
> but have spent longer considering the patch than I should have done -
> apologies to everyone else I should be replying to.
>

I really appreciate your insights and historical anecdotes. I always
learn something new.

> On Wed, 4 Nov 2020, Shakeel Butt wrote:
>
> > Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> > semantic v2"), the code that checks the secondary MMU's page table
> > access bit has been broken for !(TTU_IGNORE_ACCESS), because the
> > page is unmapped from the secondary MMU's page table before the
> > check. This specifically affects secondary MMUs, such as KVM, that
> > unmap the memory in mmu_notifier_invalidate_range_start().
>
> Well, "broken" seems a bit unfair to 369ea8242c0f. It put a warning
> mmu_notifier_invalidate_range_start() at the beginning, and matching
> mmu_notifier_invalidate_range_end() at the end of try_to_unmap_one();
> with its mmu_notifier_invalidate_range() exactly where the
> mmu_notifier_invalidate_page() was before (I think the story gets
> more complicated later).  Yes, if notifiee takes invalidate_range_start()
> as signal to invalidate all their own range, then that will sometimes
> cause them unnecessary invalidations.
>
> Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
> case meeting a VM_LOCKED vma and setting PageMlocked where that had
> been missed earlier (and page_check_references() has intentionally but
> confusingly marked this case as PAGEREF_RECLAIM, not to reclaim the page,
> but to reach the try_to_unmap_one() which will recognize and fix it up -
> historically easier to do there than in page_referenced_one()).
>
> But I think mmu_notifier is a diversion from what needs thinking about.
>
> >
> > However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS),
> > i.e. the only caller that does not pass TTU_IGNORE_ACCESS, and it
> > explicitly performs the page table access check before trying to
> > unmap the page. So, at worst, the reclaim will miss accesses in a
> > very short window if we remove the page table access check from the
> > unmapping code.
>
> I agree with you and Johannes that the short race window when the page
> might be re-referenced is no issue at all: the functional issue is the
> one in your next paragraph.  If that's agreed by memcg guys, great,
> then this patch is a nice observation and a welcome cleanup.
>
> >
> > There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> > reclaim. During memcg reclaim, page_referenced() only accounts
> > accesses from processes in the same memcg as the target page, but
> > the unmapping code considers accesses from all processes, decreasing
> > the effectiveness of memcg reclaim.
>
> Are you sure it was unintended?
>
> Since the dawn of memcg reclaim, it has been the case that a recent
> reference in a "foreign" vma has rescued that page from being reclaimed:
> now you propose to change that.  I expect some workflows will benefit
> and others be disadvantaged.  I have no objection myself to the change,
> but I do think it needs to be better highlighted here, and explicitly
> agreed by those more familiar with memcg reclaim.

The reason I said unintended was bed7161a519a2 ("Memory
controller: make page_referenced() cgroup aware"). From that commit
message, the intention seems to have been that memcg reclaim should
not be influenced by foreign accesses, but it missed making
try_to_unmap_one() memcg aware.
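
The asymmetry is visible in mm/rmap.c: when memcg reclaim passes a
target memcg to page_referenced(), foreign vmas are filtered out
(sketched below from the 2020-era source), while the young check this
patch removes from try_to_unmap_one() applied no such filter:

	/*
	 * Sketched from 2020-era mm/rmap.c: vmas whose mm belongs to
	 * a different memcg are skipped, so their accesses are never
	 * counted by page_referenced() during memcg reclaim ...
	 */
	static bool invalid_page_referenced_vma(struct vm_area_struct *vma,
						void *arg)
	{
		struct page_referenced_arg *pra = arg;
		struct mem_cgroup *memcg = pra->memcg;

		if (!mm_match_cgroup(vma->vm_mm, memcg))
			return true;	/* foreign vma: don't count it */

		return false;
	}
	/*
	 * ... whereas the (now removed) ptep_clear_flush_young_notify()
	 * in try_to_unmap_one() tested the young bit of every mapping,
	 * foreign or not, and let any of them veto the reclaim.
	 */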

I agree with you that this is a behavior change, and we should
explicitly agree not to let memcg reclaim be influenced by foreign
accesses.



* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
From: Hugh Dickins @ 2020-11-11  7:50 UTC
  To: Shakeel Butt
  Cc: Hugh Dickins, Jerome Glisse, Johannes Weiner, Andrew Morton,
	Vlastimil Babka, Michal Hocko, Linux MM, LKML, Balbir Singh

On Fri, 6 Nov 2020, Shakeel Butt wrote:
> On Thu, Nov 5, 2020 at 7:00 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I don't know why this was addressed to me in particular (easy to imagine
> > I've made a mod at some time that bears on this, but I haven't found it);
> > but have spent longer considering the patch than I should have done -
> > apologies to everyone else I should be replying to.
> >
> 
> I really appreciate your insights and historical anecdotes. I always
> learn something new.

:)

> 
> > On Wed, 4 Nov 2020, Shakeel Butt wrote:
> >
> > > Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> > > semantic v2"), the code that checks the secondary MMU's page table
> > > access bit has been broken for !(TTU_IGNORE_ACCESS), because the
> > > page is unmapped from the secondary MMU's page table before the
> > > check. This specifically affects secondary MMUs, such as KVM, that
> > > unmap the memory in mmu_notifier_invalidate_range_start().
> >
> > Well, "broken" seems a bit unfair to 369ea8242c0f. It put a warning
> > mmu_notifier_invalidate_range_start() at the beginning, and matching
> > mmu_notifier_invalidate_range_end() at the end of try_to_unmap_one();
> > with its mmu_notifier_invalidate_range() exactly where the
> > mmu_notifier_invalidate_page() was before (I think the story gets
> > more complicated later).  Yes, if notifiee takes invalidate_range_start()
> > as signal to invalidate all their own range, then that will sometimes
> > cause them unnecessary invalidations.
> >
> > Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
> > case meeting a VM_LOCKED vma and setting PageMlocked where that had
> > been missed earlier (and page_check_references() has intentionally but
> > confusingly marked this case as PAGEREF_RECLAIM, not to reclaim the page,
> > but to reach the try_to_unmap_one() which will recognize and fix it up -
> > historically easier to do there than in page_referenced_one()).
> >
> > But I think mmu_notifier is a diversion from what needs thinking about.
> >
> > >
> > > However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS),
> > > i.e. the only caller that does not pass TTU_IGNORE_ACCESS, and it
> > > explicitly performs the page table access check before trying to
> > > unmap the page. So, at worst, the reclaim will miss accesses in a
> > > very short window if we remove the page table access check from the
> > > unmapping code.
> >
> > I agree with you and Johannes that the short race window when the page
> > might be re-referenced is no issue at all: the functional issue is the
> > one in your next paragraph.  If that's agreed by memcg guys, great,
> > then this patch is a nice observation and a welcome cleanup.
> >
> > >
> > > There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> > > reclaim. During memcg reclaim, page_referenced() only accounts
> > > accesses from processes in the same memcg as the target page, but
> > > the unmapping code considers accesses from all processes, decreasing
> > > the effectiveness of memcg reclaim.
> >
> > Are you sure it was unintended?
> >
> > Since the dawn of memcg reclaim, it has been the case that a recent
> > reference in a "foreign" vma has rescued that page from being reclaimed:
> > now you propose to change that.  I expect some workflows will benefit
> > and others be disadvantaged.  I have no objection myself to the change,
> > but I do think it needs to be better highlighted here, and explicitly
> > agreed by those more familiar with memcg reclaim.
> 
> The reason I said unintended was bed7161a519a2 ("Memory
> controller: make page_referenced() cgroup aware"). From that commit
> message, the intention seems to have been that memcg reclaim should
> not be influenced by foreign accesses, but it missed making
> try_to_unmap_one() memcg aware.

Oooh, that's a good reference (much better than the mmu_notifier one
you cited in the patch).  Yes, I agree Balbir was explicit about the
intention then, and you're simply fixing it up.

> 
> I agree with you that this is a behavior change, and we should
> explicitly agree not to let memcg reclaim be influenced by foreign
> accesses.

I've not seen anyone else protesting, and Johannes and Andrew are
happy with this: so no more protest from me; let's proceed with the
nice cleanup, and hope no regression surfaces.

Hugh


