All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v1 10/15] mm/page-flags: reuse PG_slab as PG_anon_exclusive for PageAnon() pages
Date: Sat, 12 Mar 2022 09:26:45 +0100	[thread overview]
Message-ID: <2bbe2f37-37af-664d-181f-034917b6bb93@redhat.com> (raw)
In-Reply-To: <Yiu400H9JNtC03Co@casper.infradead.org>

On 11.03.22 22:02, Matthew Wilcox wrote:
> On Fri, Mar 11, 2022 at 07:46:39PM +0100, David Hildenbrand wrote:
>> I'm currently testing with the following. My tests so far with a swapfile on
>> all different kinds of weird filesystems (excluding networking fs, though)
>> revealed no surprises so far:
> 
> I like this a lot better than reusing PG_swap.  Thanks!
> 
> I'm somewhat reluctant to introduce a new flag that can be set on tail
> pages.  Do we lose much if it's always set only on the head page?
After spending one month on getting THP to work without PF_ANY, I can
say with confidence that the whole thing won't fly with THP when not
tracking it on the minimum-mapping granularity. For a PTE-mapped THP,
that's on the subpage level.

The next patch in the series documents some details. Intuitively, if we
could replace the pageflag by a PTE/PMD bit, we'd get roughly the same
result.

> 
>> +++ b/include/linux/page-flags.h
>> @@ -142,6 +142,60 @@ enum pageflags {
>>  
>>  	PG_readahead = PG_reclaim,
>>  
>> +	/*
>> +	 * Depending on the way an anonymous folio can be mapped into a page
>> +	 * table (e.g., single PMD/PUD/CONT of the head page vs. PTE-mapped
>> +	 * THP), PG_anon_exclusive may be set only for the head page or for
>> +	 * subpages of an anonymous folio.
>> +	 *
>> +	 * PG_anon_exclusive is *usually* only expressive in combination with a
>> +	 * page table entry. Depending on the page table entry type it might
>> +	 * store the following information:
>> +	 *
>> +	 *	Is what's mapped via this page table entry exclusive to the
>> +	 *	single process and can be mapped writable without further
>> +	 *	checks? If not, it might be shared and we might have to COW.
>> +	 *
>> +	 * For now, we only expect PTE-mapped THPs to make use of
>> +	 * PG_anon_exclusive in subpages. For other anonymous compound
>> +	 * folios (i.e., hugetlb), only the head page is logically mapped and
>> +	 * holds this information.
>> +	 *
>> +	 * For example, an exclusive, PMD-mapped THP only has PG_anon_exclusive
>> +	 * set on the head page. When replacing the PMD by a page table full
>> +	 * of PTEs, PG_anon_exclusive, if set on the head page, will be set on
>> +	 * all tail pages accordingly. Note that converting from a PTE-mapping
>> +	 * to a PMD mapping using the same compound page is currently not
>> +	 * possible and consequently doesn't require care.
>> +	 *
>> +	 * If GUP wants to take a reliable pin (FOLL_PIN) on an anonymous page,
>> +	 * it should only pin if the relevant PG_anon_bit is set. In that case,
>> +	 * the pin will be fully reliable and stay consistent with the pages
>> +	 * mapped into the page table, as the bit cannot get cleared (e.g., by
>> +	 * fork(), KSM) while the page is pinned. For anonymous pages that
>> +	 * are mapped R/W, PG_anon_exclusive can be assumed to always be set
>> +	 * because such pages cannot possibly be shared.
>> +	 *
>> +	 * The page table lock protecting the page table entry is the primary
>> +	 * synchronization mechanism for PG_anon_exclusive; GUP-fast that does
>> +	 * not take the PT lock needs special care when trying to clear the
>> +	 * flag.
>> +	 *
>> +	 * Page table entry types and PG_anon_exclusive:
>> +	 * * Present: PG_anon_exclusive applies.
>> +	 * * Swap: the information is lost. PG_anon_exclusive was cleared.
>> +	 * * Migration: the entry holds this information instead.
>> +	 *		PG_anon_exclusive was cleared.
>> +	 * * Device private: PG_anon_exclusive applies.
>> +	 * * Device exclusive: PG_anon_exclusive applies.
>> +	 * * HW Poison: PG_anon_exclusive is stale and not changed.
>> +	 *
>> +	 * If the page may be pinned (FOLL_PIN), clearing PG_anon_exclusive is
>> +	 * not allowed and the flag will stick around until the page is freed
>> +	 * and folio->mapping is cleared.
>> +	 */
> 
> ... I also don't think this is the right place for this comment.  Not
> sure where it should go.

I went for "rather have some documentation at a sub-optimal place then
no documentation at all". I'm thinking about writing a proper
documentation once everything is in place, and moving some details from
there into that document then.

> 
>> +static __always_inline void SetPageAnonExclusive(struct page *page)
>> +{
>> +	VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page);
> 
> hm.  seems to me like we should have a PageAnonNotKsm which just
> does
> 	return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
> 			PAGE_MAPPING_ANON;
> because that's "a bit" inefficient.  OK, that's just a VM_BUG_ON,
> but we have other users in real code:
> 
> mm/migrate.c:   if (PageAnon(page) && !PageKsm(page))
> mm/page_idle.c: need_lock = !PageAnon(page) || PageKsm(page);
> mm/rmap.c:      if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
> 

I'm wondering if the compiler won't be able to optimize that. Having
that said, I can look into adding that outside of this series.

Thanks!

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2022-03-12  8:27 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-08 14:14 [PATCH v1 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 02/15] mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 03/15] mm/memory: slightly simplify copy_present_pte() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 04/15] mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 05/15] mm/rmap: convert RMAP flags to a proper distinct rmap_t type David Hildenbrand
2022-03-08 17:15   ` Nadav Amit
2022-03-08 17:30     ` David Hildenbrand
2022-03-08 18:09     ` Linus Torvalds
2022-03-08 18:24       ` Nadav Amit
2022-03-08 18:42         ` Linus Torvalds
2022-03-08 14:14 ` [PATCH v1 06/15] mm/rmap: remove do_page_add_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 07/15] mm/rmap: pass rmap flags to hugepage_add_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 08/15] mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 09/15] mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 10/15] mm/page-flags: reuse PG_slab as PG_anon_exclusive for PageAnon() pages David Hildenbrand
2022-03-09 15:47   ` Matthew Wilcox
2022-03-09 16:57     ` David Hildenbrand
2022-03-09 17:40       ` Matthew Wilcox
2022-03-09 18:00         ` David Hildenbrand
2022-03-11 18:46   ` David Hildenbrand
2022-03-11 19:22     ` Linus Torvalds
2022-03-11 19:36       ` David Hildenbrand
2022-03-11 19:49         ` Linus Torvalds
2022-03-11 21:11         ` Matthew Wilcox
2022-03-12  8:11           ` David Hildenbrand
2022-03-11 21:02     ` Matthew Wilcox
2022-03-12  8:26       ` David Hildenbrand [this message]
2022-03-08 14:14 ` [PATCH v1 11/15] mm: remember exclusively mapped anonymous pages with PG_anon_exclusive David Hildenbrand
2022-03-11 18:52   ` David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 12/15] mm/gup: disallow follow_page(FOLL_PIN) David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 13/15] mm: support GUP-triggered unsharing of anonymous pages David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 14/15] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page David Hildenbrand
2022-03-08 14:14 ` [PATCH v1 15/15] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning David Hildenbrand
2022-03-08 14:19 ` [PATCH v1 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-08 21:22 ` Linus Torvalds
2022-03-09  8:00   ` David Hildenbrand
2022-03-10 11:13     ` Oded Gabbay
2022-03-10 11:57       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2bbe2f37-37af-664d-181f-034917b6bb93@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ddutile@redhat.com \
    --cc=guro@fb.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=namit@vmware.com \
    --cc=oded.gabbay@gmail.com \
    --cc=oleg@redhat.com \
    --cc=pedrodemargomes@gmail.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.