All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Khalid Aziz <khalid.aziz@oracle.com>, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed
Date: Thu, 31 Mar 2022 15:46:19 +0200	[thread overview]
Message-ID: <2eaa2667-219c-1bac-8f50-0020fa42e54e@redhat.com> (raw)
In-Reply-To: <689c199f-9954-524b-7a2d-40f186e87b34@oracle.com>

> Hi David,

Hi,

> 
> arch_remap_one() sounds reasonable. Would you like to add that in your patch series, or would you like me to create a 
> separate patch for adding this on top of your patch series?

Let's handle it separately, because I think there is still a lot to be
clarified. I also feel like the terminology ("arch_unmap_one()") is a
little too generic, if we really only care about anonymous pages (do we?).

> 
>>
>>>
>>>>
>>>> arch_unmap_one() calls adi_save_tags(), which allocates memory.
>>>> adi_restore_tags()->del_tag_store() reverts that operation and ends up
>>>> freeing memory conditionally; However, it's only
>>>> called from arch_do_swap_page().
>>>>
>>>>
>>>> Here is why I have to scratch my head:
>>>>
>>>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
>>>> for mm/swapfile.c:unuse_pte(), aren't we missing something?
>>>
>>> My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called
>>> upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from
>>> swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the
>>> page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this
>>> mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags.
>>
>> unuse_pte() will replace a swap pte directly by a proper, present pte,
>> just like do_swap_page() would. You won't end up in do_swap_page()
>> anymore and arch_do_swap_page() won't be called, because there is no
>> swap PTE anymore.
>>
>>>
>>>>
>>>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
>>>> restore+free after migration succeeded or failed, aren't we missing something?
>>>
>>> try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes
>>> __handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally
>>> result in a call to arch_do_swap_page() which restores the tags.
>>
>> Migration PTEs are restore via mm/migrate.c:remove_migration_ptes().
>> arch_do_swap_page() won't be called.
>>
>> What you mention is if someone accesses the migration PTE while
>> migration is active and the migration PTEs have not been removed yet.
>> While we'll end up in do_swap_page(), we'll do a migration_entry_wait(),
>> followed by an effective immediate "return 0;". arch_do_swap_page()
>> won't get called.
>>
>>
>> So in essence, I think this doesn't work as expected yet. In the best
>> case we don't immediately free memory. In the worst case we lose the tags.
>>
> 
> I see what you mean. I can work on fixing these issues up.

Ideally, I think we'd handle migration differently: migrate the tags
directly instead of temporarily storing them

I also wonder, how to handle migration of THP (which uses migration
PMDs) and THP splitting (which also uses migration PTEs); maybe they
don't apply on sparc?

Further, I wonder if you have to handle zapping of swap/migration PTEs,
and if you'd also have to cleanup+freeup any allcoated tag memory?
E.g., zap_pte_range() can simply zap swap and migration entries, wouldn't

Last but not least, I do wonder if mremap() -- e.g., move_pte() -- and
fork() -- e.g., copy_pte_range() -- are handled properly, where we can
end up moving/copying swap+migration PTEs around.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2022-03-31 13:46 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-15 10:47 [PATCH v2 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 01/15] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed David Hildenbrand
2022-03-29 13:59   ` David Hildenbrand
2022-03-29 20:42     ` Khalid Aziz
2022-03-29 20:55       ` David Hildenbrand
2022-03-30 17:04         ` Khalid Aziz
2022-03-31 13:46           ` David Hildenbrand [this message]
2022-03-15 10:47 ` [PATCH v2 02/15] mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 03/15] mm/memory: slightly simplify copy_present_pte() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 04/15] mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() David Hildenbrand
2022-03-16 20:02   ` Yang Shi
2022-03-17  9:00     ` David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 05/15] mm/rmap: convert RMAP flags to a proper distinct rmap_t type David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 06/15] mm/rmap: remove do_page_add_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 07/15] mm/rmap: pass rmap flags to hugepage_add_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 08/15] mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 09/15] mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 10/15] mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 11/15] mm: remember exclusively mapped anonymous pages with PG_anon_exclusive David Hildenbrand
2022-03-16 21:23   ` Yang Shi
2022-03-17  9:06     ` David Hildenbrand
2022-03-18 20:29       ` Yang Shi
2022-03-19 10:21         ` David Hildenbrand
2022-03-19 10:50           ` David Hildenbrand
2022-03-21 20:56             ` Yang Shi
2022-03-22  9:41               ` David Hildenbrand
2022-03-21 20:51           ` Yang Shi
2022-03-15 10:47 ` [PATCH v2 12/15] mm/gup: disallow follow_page(FOLL_PIN) David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 13/15] mm: support GUP-triggered unsharing of anonymous pages David Hildenbrand
2022-03-18 23:30   ` Jason Gunthorpe
2022-03-21 16:15     ` David Hildenbrand
2022-03-21 16:18       ` Jason Gunthorpe
2022-03-21 16:24         ` David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 14/15] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page David Hildenbrand
2022-03-15 10:47 ` [PATCH v2 15/15] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning David Hildenbrand
2022-03-18 23:35   ` Jason Gunthorpe
2022-03-19 10:22     ` David Hildenbrand
2022-03-18 23:36 ` [PATCH v2 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2eaa2667-219c-1bac-8f50-0020fa42e54e@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ddutile@redhat.com \
    --cc=guro@fb.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=khalid.aziz@oracle.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=namit@vmware.com \
    --cc=oded.gabbay@gmail.com \
    --cc=oleg@redhat.com \
    --cc=pedrodemargomes@gmail.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.