From: Alistair Popple <apopple@nvidia.com>
To: Peter Xu <peterx@redhat.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Yang Shi <shy828301@gmail.com>, Vlastimil Babka <vbabka@suse.cz>,
	Hugh Dickins <hughd@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH RFC v2 1/2] mm: Don't skip swap entry even if zap_details specified
Date: Tue, 11 Jan 2022 18:40:43 +1100
Message-ID: <4711362.BPgp0156Pq@nvdebian>
In-Reply-To: <849f1e44-d35e-b8c6-c7c3-a73941028ba7@redhat.com>

On Monday, 10 January 2022 7:37:15 PM AEDT David Hildenbrand wrote:
> On 15.11.21 14:49, Peter Xu wrote:
> > This check has existed since the first git commit of the Linux repository, but
> > at that time there was no page migration yet, so I think it was okay.
> > 
> > With page migration enabled, it should logically be possible that we zap some
> > shmem pages during migration.  When that happens, IIUC the old code could leave
> > the MM_SHMEMPAGES RSS counter accounted wrongly, because we zap the ptes
> > without decreasing the counters for the migration entries.  I have no unit test
> > to prove it, though, as I don't know an easy way to trigger this condition.
> > 
> > Besides, the optimization itself is IMHO already confusing, in a few respects:
> > 
> >   - The wording "skip swap entries" is confusing, because we're not skipping all
> >     swap entries - we handle device private/exclusive pages before that.
> 
> I think one part of the confusion is "swap vs non-swap" entries.
> For !pte_none() && !pte_present() we can have
> 
> * swap entry
> * non-swap entry
> ** device exclusive entry
> ** device private entry
> ** HWpoison entry
> ** migration entry
> 
> So the comment claims to skip "swap entries" but also skips HWpoison and
> migration entries, and I think that's the confusing part.
> Both only apply to PageAnon().

I must be missing something, but why do these only apply to PageAnon()?
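
For my own reference, this is roughly how I read the tail of zap_pte_range()
for the !pte_none() && !pte_present() case (paraphrased from memory of a
recent tree, so names and details may well differ from the code being
patched):

	entry = pte_to_swp_entry(ptent);

	/*
	 * Device private/exclusive entries are handled before the skip,
	 * so their rss counters and page refcounts do get fixed up.
	 */
	if (is_device_private_entry(entry) ||
	    is_device_exclusive_entry(entry)) {
		/* ... rss[mm_counter(page)]--, put_page(page), ... */
		continue;
	}

	/*
	 * The check being discussed: with details != NULL we bail out
	 * here, so real swap entries, migration entries and hwpoison
	 * entries are all left in place.
	 */
	if (unlikely(details))
		continue;

	if (!non_swap_entry(entry))
		rss[MM_SWAPENTS]--;		/* real swap entry */
	else if (is_migration_entry(entry))
		rss[mm_counter(pfn_swap_entry_to_page(entry))]--;

	if (unlikely(!free_swap_and_cache(entry)))
		print_bad_pte(vma, addr, ptent, NULL);
	pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);

If migration entries can be installed for shmem pages as well, then bailing
out at that `details` check looks like exactly where the MM_SHMEMPAGES
accounting Peter describes would go wrong.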

> IIUC, the only way we could get details != NULL is via unmap_mapping_page()+unmap_mapping_pages().
> 
> I do wonder if any of the callers really cares about PageAnon() pages where this would be relevant.
> 
> Am I wrong, or is unmap_mapping_pages() never called with "even_cows == true", so we can remove
> that parameter:

Except that unmap_mapping_range() takes `even_cows` as a parameter and passes
that to unmap_mapping_pages(), and from what I can tell there are callers of
unmap_mapping_range() that set `even_cows = true`.
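
For example, truncate_pagecache() is one such caller; this is a sketch from
memory of mm/truncate.c, so it may not match the exact tree under discussion:

void truncate_pagecache(struct inode *inode, loff_t newsize)
{
	struct address_space *mapping = inode->i_mapping;
	loff_t holebegin = round_up(newsize, PAGE_SIZE);

	/*
	 * The first call is mostly an optimisation; COW pages created
	 * after it, but before truncate_inode_pages() completes, still
	 * have to be zapped, hence the second call and hence even_cows.
	 */
	unmap_mapping_range(mapping, holebegin, 0, 1);
	truncate_inode_pages(mapping, newsize);
	unmap_mapping_range(mapping, holebegin, 0, 1);
}

And as the grep below shows, unmap_mapping_range() simply forwards that flag
via unmap_mapping_pages(mapping, hba, hlen, even_cows).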

> git grep -C2 unmap_mapping_pages
> fs/afs/callback.c-      struct afs_vnode *vnode = container_of(work, struct afs_vnode, cb_work);
> fs/afs/callback.c-
> fs/afs/callback.c:      unmap_mapping_pages(vnode->vfs_inode.i_mapping, 0, 0, false);
> fs/afs/callback.c-}
> fs/afs/callback.c-
> --
> fs/dax.c-               if (dax_is_zero_entry(entry)) {
> fs/dax.c-                       xas_unlock_irq(xas);
> fs/dax.c:                       unmap_mapping_pages(mapping,
> fs/dax.c-                                       xas->xa_index & ~PG_PMD_COLOUR,
> fs/dax.c-                                       PG_PMD_NR, false);
> --
> fs/dax.c-        * get_user_pages() slow path.  The slow path is protected by
> fs/dax.c-        * pte_lock() and pmd_lock(). New references are not taken without
> fs/dax.c:        * holding those locks, and unmap_mapping_pages() will not zero the
> fs/dax.c-        * pte or pmd without holding the respective lock, so we are
> fs/dax.c-        * guaranteed to either see new references or prevent new
> fs/dax.c-        * references from being established.
> fs/dax.c-        */
> fs/dax.c:       unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0);
> fs/dax.c-
> fs/dax.c-       xas_lock_irq(&xas);
> --
> fs/dax.c-               /* we are replacing a zero page with block mapping */
> fs/dax.c-               if (dax_is_pmd_entry(entry))
> fs/dax.c:                       unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR,
> fs/dax.c-                                       PG_PMD_NR, false);
> fs/dax.c-               else /* pte entry */
> fs/dax.c:                       unmap_mapping_pages(mapping, index, 1, false);
> fs/dax.c-       }
> fs/dax.c-
> --
> include/linux/mm.h-                         bool *unlocked);
> include/linux/mm.h-void unmap_mapping_page(struct page *page);
> include/linux/mm.h:void unmap_mapping_pages(struct address_space *mapping,
> include/linux/mm.h-             pgoff_t start, pgoff_t nr, bool even_cows);
> include/linux/mm.h-void unmap_mapping_range(struct address_space *mapping,
> --
> include/linux/mm.h-}
> include/linux/mm.h-static inline void unmap_mapping_page(struct page *page) { }
> include/linux/mm.h:static inline void unmap_mapping_pages(struct address_space *mapping,
> include/linux/mm.h-             pgoff_t start, pgoff_t nr, bool even_cows) { }
> include/linux/mm.h-static inline void unmap_mapping_range(struct address_space *mapping,
> --
> mm/khugepaged.c-
> mm/khugepaged.c-                if (page_mapped(page))
> mm/khugepaged.c:                        unmap_mapping_pages(mapping, index, 1, false);
> mm/khugepaged.c-
> mm/khugepaged.c-                xas_lock_irq(&xas);
> --
> mm/memory.c- * Unmap this page from any userspace process which still has it mmaped.
> mm/memory.c- * Typically, for efficiency, the range of nearby pages has already been
> mm/memory.c: * unmapped by unmap_mapping_pages() or unmap_mapping_range().  But once
> mm/memory.c- * truncation or invalidation holds the lock on a page, it may find that
> mm/memory.c- * the page has been remapped again: and then uses unmap_mapping_page()
> --
> mm/memory.c-
> mm/memory.c-/**
> mm/memory.c: * unmap_mapping_pages() - Unmap pages from processes.
> mm/memory.c- * @mapping: The address space containing pages to be unmapped.
> mm/memory.c- * @start: Index of first page to be unmapped.
> --
> mm/memory.c- * cache.
> mm/memory.c- */
> mm/memory.c:void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
> mm/memory.c-            pgoff_t nr, bool even_cows)
> mm/memory.c-{
> --
> mm/memory.c-    i_mmap_unlock_write(mapping);
> mm/memory.c-}
> mm/memory.c:EXPORT_SYMBOL_GPL(unmap_mapping_pages);
> mm/memory.c-
> mm/memory.c-/**
> --
> mm/memory.c-    }
> mm/memory.c-
> mm/memory.c:    unmap_mapping_pages(mapping, hba, hlen, even_cows);
> mm/memory.c-}
> mm/memory.c-EXPORT_SYMBOL(unmap_mapping_range);
> --
> mm/truncate.c-                           * zap the rest of the file in one hit.
> mm/truncate.c-                           */
> mm/truncate.c:                          unmap_mapping_pages(mapping, index,
> mm/truncate.c-                                          (1 + end - index), false);
> mm/truncate.c-                          did_range_unmap = 1;
> --
> mm/truncate.c-   */
> mm/truncate.c-  if (dax_mapping(mapping)) {
> mm/truncate.c:          unmap_mapping_pages(mapping, start, end - start + 1, false);
> mm/truncate.c-  }
> mm/truncate.c-out: