linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Khalid Aziz <khalid.aziz@oracle.com>, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed
Date: Thu, 31 Mar 2022 15:46:19 +0200	[thread overview]
Message-ID: <2eaa2667-219c-1bac-8f50-0020fa42e54e@redhat.com> (raw)
In-Reply-To: <689c199f-9954-524b-7a2d-40f186e87b34@oracle.com>

> Hi David,

Hi,

> 
> arch_remap_one() sounds reasonable. Would you like to add that in your patch series, or would you like me to create a 
> separate patch for adding this on top of your patch series?

Let's handle it separately, because I think there is still a lot to be
clarified. I also feel like the terminology ("arch_unmap_one()") is a
little too generic, if we really only care about anonymous pages (do we?).

> 
>>
>>>
>>>>
>>>> arch_unmap_one() calls adi_save_tags(), which allocates memory.
>>>> adi_restore_tags()->del_tag_store() reverts that operation and ends up
>>>> freeing memory conditionally; However, it's only
>>>> called from arch_do_swap_page().
>>>>
>>>>
>>>> Here is why I have to scratch my head:
>>>>
>>>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
>>>> for mm/swapfile.c:unuse_pte(), aren't we missing something?
>>>
>>> My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called
>>> upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from
>>> swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the
>>> page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this
>>> mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags.
>>
>> unuse_pte() will replace a swap pte directly by a proper, present pte,
>> just like do_swap_page() would. You won't end up in do_swap_page()
>> anymore and arch_do_swap_page() won't be called, because there is no
>> swap PTE anymore.
>>
>>>
>>>>
>>>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
>>>> restore+free after migration succeeded or failed, aren't we missing something?
>>>
>>> try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes
>>> __handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally
>>> result in a call to arch_do_swap_page() which restores the tags.
>>
>> Migration PTEs are restore via mm/migrate.c:remove_migration_ptes().
>> arch_do_swap_page() won't be called.
>>
>> What you mention is if someone accesses the migration PTE while
>> migration is active and the migration PTEs have not been removed yet.
>> While we'll end up in do_swap_page(), we'll do a migration_entry_wait(),
>> followed by an effective immediate "return 0;". arch_do_swap_page()
>> won't get called.
>>
>>
>> So in essence, I think this doesn't work as expected yet. In the best
>> case we don't immediately free memory. In the worst case we lose the tags.
>>
> 
> I see what you mean. I can work on fixing these issues up.

Ideally, I think we'd handle migration differently: migrate the tags
directly instead of temporarily storing them

I also wonder, how to handle migration of THP (which uses migration
PMDs) and THP splitting (which also uses migration PTEs); maybe they
don't apply on sparc?

Further, I wonder if you have to handle zapping of swap/migration PTEs,
and if you'd also have to cleanup+freeup any allcoated tag memory?
E.g., zap_pte_range() can simply zap swap and migration entries, wouldn't

Last but not least, I do wonder if mremap() -- e.g., move_pte() -- and
fork() -- e.g., copy_pte_range() -- are handled properly, where we can
end up moving/copying swap+migration PTEs around.

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-03-31 13:46 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-15 10:47 [PATCH v2 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed David Hildenbrand
2022-03-29 13:59   ` David Hildenbrand
2022-03-29 20:42     ` Khalid Aziz
2022-03-29 20:55       ` David Hildenbrand
2022-03-30 17:04         ` Khalid Aziz
2022-03-31 13:46           ` David Hildenbrand [this message]
2022-03-15 10:47 ` [PATCH v2 02/15] mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 03/15] mm/memory: slightly simplify copy_present_pte() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 04/15] mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() David Hildenbrand
2022-03-16 20:02   ` Yang Shi
2022-03-17  9:00     ` David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 05/15] mm/rmap: convert RMAP flags to a proper distinct rmap_t type David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 06/15] mm/rmap: remove do_page_add_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 07/15] mm/rmap: pass rmap flags to hugepage_add_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 08/15] mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 09/15] mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 10/15] mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 11/15] mm: remember exclusively mapped anonymous pages with PG_anon_exclusive David Hildenbrand
2022-03-16 21:23   ` Yang Shi
2022-03-17  9:06     ` David Hildenbrand
2022-03-18 20:29       ` Yang Shi
2022-03-19 10:21         ` David Hildenbrand
2022-03-19 10:50           ` David Hildenbrand
2022-03-21 20:56             ` Yang Shi
2022-03-22  9:41               ` David Hildenbrand
2022-03-21 20:51           ` Yang Shi
2022-03-15 10:47 ` [PATCH v2 12/15] mm/gup: disallow follow_page(FOLL_PIN) David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 13/15] mm: support GUP-triggered unsharing of anonymous pages David Hildenbrand
2022-03-18 23:30   ` Jason Gunthorpe
2022-03-21 16:15     ` David Hildenbrand
2022-03-21 16:18       ` Jason Gunthorpe
2022-03-21 16:24         ` David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 14/15] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 15/15] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning David Hildenbrand
2022-03-18 23:35   ` Jason Gunthorpe
2022-03-19 10:22     ` David Hildenbrand
2022-03-18 23:36 ` [PATCH v2 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2eaa2667-219c-1bac-8f50-0020fa42e54e@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ddutile@redhat.com \
    --cc=guro@fb.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=khalid.aziz@oracle.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=namit@vmware.com \
    --cc=oded.gabbay@gmail.com \
    --cc=oleg@redhat.com \
    --cc=pedrodemargomes@gmail.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).