* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
  [not found] <20201104231928.1494083-1-shakeelb@google.com>
@ 2020-11-05 16:01 ` Johannes Weiner
  2020-11-06  3:00 ` Hugh Dickins
  1 sibling, 0 replies; 4+ messages in thread

From: Johannes Weiner @ 2020-11-05 16:01 UTC (permalink / raw)
To: Shakeel Butt
Cc: Hugh Dickins, Jerome Glisse, Andrew Morton, Vlastimil Babka,
    linux-mm, linux-kernel

On Wed, Nov 04, 2020 at 03:19:28PM -0800, Shakeel Butt wrote:
> Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> semantic v2"), the code that checks the secondary MMU's page table
> access bit has been broken for !(TTU_IGNORE_ACCESS), because the page
> is unmapped from the secondary MMU's page table before the check --
> specifically for those secondary MMUs, such as kvm, which unmap the
> memory in mmu_notifier_invalidate_range_start().
>
> However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS), i.e.
> the absence of TTU_IGNORE_ACCESS, and it explicitly performs the page
> table access check before trying to unmap the page. So, at worst,
> reclaim will miss accesses in a very short window if we remove the
> page table access check from the unmapping code.

We also miss accesses that happen right after the unmap :-) Seems
completely fine to make page_referenced() the time of last call.

> There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> reclaim. During memcg reclaim, page_referenced() only accounts the
> accesses from processes in the same memcg as the target page, but the
> unmapping code considers accesses from all processes, decreasing the
> effectiveness of memcg reclaim.
>
> The simplest solution is to always assume TTU_IGNORE_ACCESS in the
> unmapping code.
>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
  [not found] <20201104231928.1494083-1-shakeelb@google.com>
  2020-11-05 16:01 ` [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS Johannes Weiner
@ 2020-11-06  3:00 ` Hugh Dickins
  2020-11-06 15:09   ` Shakeel Butt
  1 sibling, 1 reply; 4+ messages in thread

From: Hugh Dickins @ 2020-11-06 3:00 UTC (permalink / raw)
To: Shakeel Butt
Cc: Hugh Dickins, Jerome Glisse, Johannes Weiner, Andrew Morton,
    Vlastimil Babka, Michal Hocko, linux-mm, linux-kernel

I don't know why this was addressed to me in particular (it's easy to
imagine I've made a mod at some time that bears on this, but I haven't
found it); but I have spent longer considering the patch than I should
have done - apologies to everyone else I should be replying to.

On Wed, 4 Nov 2020, Shakeel Butt wrote:

> Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> semantic v2"), the code that checks the secondary MMU's page table
> access bit has been broken for !(TTU_IGNORE_ACCESS), because the page
> is unmapped from the secondary MMU's page table before the check --
> specifically for those secondary MMUs, such as kvm, which unmap the
> memory in mmu_notifier_invalidate_range_start().

Well, "broken" seems a bit unfair to 369ea8242c0f. It put
mmu_notifier_invalidate_range_start() at the beginning, and the
matching mmu_notifier_invalidate_range_end() at the end of
try_to_unmap_one(); with its mmu_notifier_invalidate_range() exactly
where the mmu_notifier_invalidate_page() was before (I think the story
gets more complicated later). Yes, if the notifiee takes
invalidate_range_start() as a signal to invalidate all of their own
range, then that will sometimes cause them unnecessary invalidations.
Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
case meeting a VM_LOCKED vma and setting PageMlocked where that had
been missed earlier (page_check_references() has intentionally but
confusingly marked this case as PAGEREF_RECLAIM - not to reclaim the
page, but to reach the try_to_unmap_one() which will recognize and fix
it up - historically easier to do there than in page_referenced_one()).

But I think mmu_notifier is a diversion from what needs thinking about.

> However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS), i.e.
> the absence of TTU_IGNORE_ACCESS, and it explicitly performs the page
> table access check before trying to unmap the page. So, at worst,
> reclaim will miss accesses in a very short window if we remove the
> page table access check from the unmapping code.

I agree with you and Johannes that the short race window when the page
might be re-referenced is no issue at all: the functional issue is the
one in your next paragraph. If that's agreed by the memcg guys, great:
then this patch is a nice observation and a welcome cleanup.

> There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
> reclaim. During memcg reclaim, page_referenced() only accounts the
> accesses from processes in the same memcg as the target page, but the
> unmapping code considers accesses from all processes, decreasing the
> effectiveness of memcg reclaim.

Are you sure it was unintended?

Since the dawn of memcg reclaim, it has been the case that a recent
reference in a "foreign" vma has rescued that page from being
reclaimed: now you propose to change that. I expect some workflows will
benefit and others will be disadvantaged. I have no objection myself to
the change, but I do think it needs to be better highlighted here, and
explicitly agreed by those more familiar with memcg reclaim.

Hugh

> The simplest solution is to always assume TTU_IGNORE_ACCESS in the
> unmapping code.
>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  include/linux/rmap.h |  1 -
>  mm/huge_memory.c     |  2 +-
>  mm/memory-failure.c  |  2 +-
>  mm/memory_hotplug.c  |  2 +-
>  mm/migrate.c         |  8 +++-----
>  mm/rmap.c            |  9 ---------
>  mm/vmscan.c          | 14 +++++---------
>  7 files changed, 11 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 3a6adfa70fb0..70085ca1a3fc 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -91,7 +91,6 @@ enum ttu_flags {
>
>  	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
>  	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
> -	TTU_IGNORE_ACCESS	= 0x10,	/* don't age */
>  	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
>  	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
>  					 * and caller guarantees they will
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 08a183f6c3ab..8b235b4abf73 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2324,7 +2324,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>
>  static void unmap_page(struct page *page)
>  {
> -	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
> +	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
>  		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
>  	bool unmap_success;
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index c0bb186bba62..b6d6d5cdb435 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -989,7 +989,7 @@ static int get_hwpoison_page(struct page *page)
>  static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  				   int flags, struct page **hpagep)
>  {
> -	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
> +	enum ttu_flags ttu = TTU_IGNORE_MLOCK;
>  	struct address_space *mapping;
>  	LIST_HEAD(tokill);
>  	bool unmap_success = true;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8c1b7182bb08..968e6916d297 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1303,7 +1303,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			if (WARN_ON(PageLRU(page)))
>  				isolate_lru_page(page);
>  			if (page_mapped(page))
> -				try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);
> +				try_to_unmap(page, TTU_IGNORE_MLOCK);
>  			continue;
>  		}
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c1585ec29827..e434d49fd428 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1122,8 +1122,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  		/* Establish migration ptes */
>  		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
>  				page);
> -		try_to_unmap(page,
> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> +		try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK);
>  		page_was_mapped = 1;
>  	}
>
> @@ -1339,8 +1338,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		goto unlock_put_anon;
>
>  	try_to_unmap(hpage,
> -		TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
> -		TTU_RMAP_LOCKED);
> +		TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_RMAP_LOCKED);
>  	page_was_mapped = 1;
>  	/*
>  	 * Leave mapping locked until after subsequent call to
> @@ -2684,7 +2682,7 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
>   */
>  static void migrate_vma_unmap(struct migrate_vma *migrate)
>  {
> -	int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
> +	int flags = TTU_MIGRATION | TTU_IGNORE_MLOCK;
>  	const unsigned long npages = migrate->npages;
>  	const unsigned long start = migrate->start;
>  	unsigned long addr, i, restore = 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 1b84945d655c..6cd9d4512117 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1536,15 +1536,6 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  			goto discard;
>  		}
>
> -		if (!(flags & TTU_IGNORE_ACCESS)) {
> -			if (ptep_clear_flush_young_notify(vma, address,
> -						pvmw.pte)) {
> -				ret = false;
> -				page_vma_mapped_walk_done(&pvmw);
> -				break;
> -			}
> -		}
> -
>  		/* Nuke the page table entry. */
>  		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		if (should_defer_flush(mm, flags)) {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a8611dce7a95..e71b563cda7b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1072,7 +1072,6 @@ static void page_check_dirty_writeback(struct page *page,
>  static unsigned int shrink_page_list(struct list_head *page_list,
>  				     struct pglist_data *pgdat,
>  				     struct scan_control *sc,
> -				     enum ttu_flags ttu_flags,
>  				     struct reclaim_stat *stat,
>  				     bool ignore_references)
>  {
> @@ -1297,7 +1296,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>  		 * processes. Try to unmap it here.
>  		 */
>  		if (page_mapped(page)) {
> -			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
> +			enum ttu_flags flags = TTU_BATCH_FLUSH;
>  			bool was_swapbacked = PageSwapBacked(page);
>
>  			if (unlikely(PageTransHuge(page)))
> @@ -1514,7 +1513,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  	}
>
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> -					TTU_IGNORE_ACCESS, &stat, true);
> +					&stat, true);
>  	list_splice(&clean_pages, page_list);
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
>  			    -(long)nr_reclaimed);
> @@ -1958,8 +1957,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	if (nr_taken == 0)
>  		return 0;
>
> -	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
> -					&stat, false);
> +	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false);
>
>  	spin_lock_irq(&pgdat->lru_lock);
>
> @@ -2131,8 +2129,7 @@ unsigned long reclaim_pages(struct list_head *page_list)
>
>  			nr_reclaimed += shrink_page_list(&node_page_list,
>  							 NODE_DATA(nid),
> -							 &sc, 0,
> -							 &dummy_stat, false);
> +							 &sc, &dummy_stat, false);
>  			while (!list_empty(&node_page_list)) {
>  				page = lru_to_page(&node_page_list);
>  				list_del(&page->lru);
> @@ -2145,8 +2142,7 @@ unsigned long reclaim_pages(struct list_head *page_list)
>  	if (!list_empty(&node_page_list)) {
>  		nr_reclaimed += shrink_page_list(&node_page_list,
>  						 NODE_DATA(nid),
> -						 &sc, 0,
> -						 &dummy_stat, false);
> +						 &sc, &dummy_stat, false);
>  		while (!list_empty(&node_page_list)) {
>  			page = lru_to_page(&node_page_list);
>  			list_del(&page->lru);
> --
> 2.29.1.341.ge80a0c044ae-goog

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
  2020-11-06  3:00 ` Hugh Dickins
@ 2020-11-06 15:09 ` Shakeel Butt
  2020-11-11  7:50   ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread

From: Shakeel Butt @ 2020-11-06 15:09 UTC (permalink / raw)
To: Hugh Dickins
Cc: Jerome Glisse, Johannes Weiner, Andrew Morton, Vlastimil Babka,
    Michal Hocko, Linux MM, LKML, Balbir Singh

On Thu, Nov 5, 2020 at 7:00 PM Hugh Dickins <hughd@google.com> wrote:
>
> I don't know why this was addressed to me in particular (it's easy to
> imagine I've made a mod at some time that bears on this, but I haven't
> found it); but I have spent longer considering the patch than I should
> have done - apologies to everyone else I should be replying to.

I really appreciate your insights and historical anecdotes. I always
learn something new.

> On Wed, 4 Nov 2020, Shakeel Butt wrote:
>
> > Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> > semantic v2"), the code that checks the secondary MMU's page table
> > access bit has been broken for !(TTU_IGNORE_ACCESS), because the page
> > is unmapped from the secondary MMU's page table before the check --
> > specifically for those secondary MMUs, such as kvm, which unmap the
> > memory in mmu_notifier_invalidate_range_start().
>
> Well, "broken" seems a bit unfair to 369ea8242c0f. It put
> mmu_notifier_invalidate_range_start() at the beginning, and the
> matching mmu_notifier_invalidate_range_end() at the end of
> try_to_unmap_one(); with its mmu_notifier_invalidate_range() exactly
> where the mmu_notifier_invalidate_page() was before (I think the story
> gets more complicated later). Yes, if the notifiee takes
> invalidate_range_start() as a signal to invalidate all of their own
> range, then that will sometimes cause them unnecessary invalidations.
> > Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK > case meeting a VM_LOCKED vma and setting PageMlocked where that had > been missed earlier (and page_check_references() has intentionally but > confusingly marked this case as PAGEREF_RECLAIM, not to reclaim the page, > but to reach the try_to_unmap_one() which will recognize and fix it up - > historically easier to do there than in page_referenced_one()). > > But I think mmu_notifier is a diversion from what needs thinking about. > > > > > However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the > > absence of TTU_IGNORE_ACCESS and it explicitly performs the page table > > access check before trying to unmap the page. So, at worst the reclaim > > will miss accesses in a very short window if we remove page table access > > check in unmapping code. > > I agree with you and Johannes that the short race window when the page > might be re-referenced is no issue at all: the functional issue is the > one in your next paragraph. If that's agreed by memcg guys, great, > then this patch is a nice observation and a welcome cleanup. > > > > > There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg > > reclaim. From memcg reclaim the page_referenced() only account the > > accesses from the processes which are in the same memcg of the target > > page but the unmapping code is considering accesses from all the > > processes, so, decreasing the effectiveness of memcg reclaim. > > Are you sure it was unintended? > > Since the dawn of memcg reclaim, it has been the case that a recent > reference in a "foreign" vma has rescued that page from being reclaimed: > now you propose to change that. I expect some workflows will benefit > and others be disadvantaged. I have no objection myself to the change, > but I do think it needs to be better highlighted here, and explicitly > agreed by those more familiar with memcg reclaim. 
> > I have no objection myself to the change, but I do think it needs to
> > be better highlighted here, and explicitly agreed by those more
> > familiar with memcg reclaim.

The reason I said unintended was due to commit bed7161a519a2 ("Memory
controller: make page_referenced() cgroup aware"). From the commit
message, it seems the intention was to not be influenced by foreign
accesses during memcg reclaim, but it missed making try_to_unmap_one()
memcg aware.

I agree with you that this is a behavior change, and that we have to
explicitly agree to not let memcg reclaim be influenced by foreign
accesses.

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH] mm/rmap: always do TTU_IGNORE_ACCESS
  2020-11-06 15:09 ` Shakeel Butt
@ 2020-11-11  7:50 ` Hugh Dickins
  0 siblings, 0 replies; 4+ messages in thread

From: Hugh Dickins @ 2020-11-11 7:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Hugh Dickins, Jerome Glisse, Johannes Weiner, Andrew Morton,
    Vlastimil Babka, Michal Hocko, Linux MM, LKML, Balbir Singh

On Fri, 6 Nov 2020, Shakeel Butt wrote:
> On Thu, Nov 5, 2020 at 7:00 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > I don't know why this was addressed to me in particular (it's easy to
> > imagine I've made a mod at some time that bears on this, but I haven't
> > found it); but I have spent longer considering the patch than I should
> > have done - apologies to everyone else I should be replying to.
>
> I really appreciate your insights and historical anecdotes. I always
> learn something new.

:)

> > On Wed, 4 Nov 2020, Shakeel Butt wrote:
> >
> > > Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier
> > > semantic v2"), the code that checks the secondary MMU's page table
> > > access bit has been broken for !(TTU_IGNORE_ACCESS), because the
> > > page is unmapped from the secondary MMU's page table before the
> > > check -- specifically for those secondary MMUs, such as kvm, which
> > > unmap the memory in mmu_notifier_invalidate_range_start().
> >
> > Well, "broken" seems a bit unfair to 369ea8242c0f. It put
> > mmu_notifier_invalidate_range_start() at the beginning, and the
> > matching mmu_notifier_invalidate_range_end() at the end of
> > try_to_unmap_one(); with its mmu_notifier_invalidate_range() exactly
> > where the mmu_notifier_invalidate_page() was before (I think the
> > story gets more complicated later). Yes, if the notifiee takes
> > invalidate_range_start() as a signal to invalidate all of their own
> > range, then that will sometimes cause them unnecessary invalidations.
> >
> > Not just for !TTU_IGNORE_ACCESS: there's also the !TTU_IGNORE_MLOCK
> > case meeting a VM_LOCKED vma and setting PageMlocked where that had
> > been missed earlier (page_check_references() has intentionally but
> > confusingly marked this case as PAGEREF_RECLAIM - not to reclaim the
> > page, but to reach the try_to_unmap_one() which will recognize and
> > fix it up - historically easier to do there than in
> > page_referenced_one()).
> >
> > But I think mmu_notifier is a diversion from what needs thinking
> > about.
> >
> > > However, memory reclaim is the only user of !(TTU_IGNORE_ACCESS),
> > > i.e. the absence of TTU_IGNORE_ACCESS, and it explicitly performs
> > > the page table access check before trying to unmap the page. So, at
> > > worst, reclaim will miss accesses in a very short window if we
> > > remove the page table access check from the unmapping code.
> >
> > I agree with you and Johannes that the short race window when the
> > page might be re-referenced is no issue at all: the functional issue
> > is the one in your next paragraph. If that's agreed by the memcg
> > guys, great: then this patch is a nice observation and a welcome
> > cleanup.
> >
> > > There is an unintended consequence of !(TTU_IGNORE_ACCESS) for
> > > memcg reclaim. During memcg reclaim, page_referenced() only
> > > accounts the accesses from processes in the same memcg as the
> > > target page, but the unmapping code considers accesses from all
> > > processes, decreasing the effectiveness of memcg reclaim.
> >
> > Are you sure it was unintended?
> >
> > Since the dawn of memcg reclaim, it has been the case that a recent
> > reference in a "foreign" vma has rescued that page from being
> > reclaimed: now you propose to change that. I expect some workflows
> > will benefit and others will be disadvantaged. I have no objection
> > myself to the change, but I do think it needs to be better
> > highlighted here, and explicitly agreed by those more familiar with
> > memcg reclaim.
>
> The reason I said unintended was due to commit bed7161a519a2 ("Memory
> controller: make page_referenced() cgroup aware"). From the commit
> message, it seems the intention was to not be influenced by foreign
> accesses during memcg reclaim, but it missed making try_to_unmap_one()
> memcg aware.

Oooh, that's a good reference (much better than the mmu_notifier one
you cited in the patch). Yes, I agree Balbir was explicit about the
intention then, and you're simply fixing it up.

> I agree with you that this is a behavior change, and that we have to
> explicitly agree to not let memcg reclaim be influenced by foreign
> accesses.

I've not seen anyone else protesting, and Johannes and Andrew are happy
with this: so no more protest from me. Let's proceed with the nice
cleanup, and hope no regression surfaces.

Hugh

^ permalink raw reply	[flat|nested] 4+ messages in thread