From: Barry Song <21cnbao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	Steven Price <steven.price@arm.com>,
	akpm@linux-foundation.org, catalin.marinas@arm.com,
	will@kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, mhocko@suse.com, shy828301@gmail.com,
	v-songbaohua@oppo.com, wangkefeng.wang@huawei.com,
	willy@infradead.org, xiang@kernel.org, ying.huang@intel.com,
	yuzhao@google.com
Subject: Re: [RFC V3 PATCH] arm64: mm: swap: save and restore mte tags for large folios
Date: Fri, 8 Dec 2023 13:00:15 +1300	[thread overview]
Message-ID: <CAGsJ_4zLhmOPwjwuC4Sk=ZkWdxvpDsU5gE6PfWoxXH3WBwB_hQ@mail.gmail.com> (raw)
In-Reply-To: <1dcd6985-aa29-4df7-a7cb-ef57ae658861@redhat.com>

On Thu, Dec 7, 2023 at 11:04 PM David Hildenbrand <david@redhat.com> wrote:
>
> >>
> >>> not per-folio? I'm also not sure what it buys us - instead of reading a per-page
> >>> flag we now have to read 128 bytes of tag for each page and check it's zero.
> >>
> >> My point is, if that is the corner case, we might not care about that.
> >
> > Hi David,
>
> Hi!
>
> > my understanding is that this is NOT a corner case. Rather, it is
> > really a common case.
>
> If it happens with < 1% of all large folios on swapout/swapin, it's not
> the common case, even if some of the scenarios you point out below can
> and will happen.
>

Fair enough. If we define "corner case" by the percentage of folios that
end up with partial MTE tags set or partially invalidated, I agree this is
a corner case. I had been thinking of a corner case as something that can
only rarely happen.

> >
> > 1. A large folio can be partially unmapped while it sits in the
> > swapcache and after it has been swapped out; in both cases its tags
> > can be partially invalidated. I don't think this is a corner case:
> > as long as userspace still works at the granularity of base pages,
> > this is always going to happen. For example, a userspace libc such
> > as jemalloc can identify PAGESIZE and use madvise(DONTNEED) to
> > return memory to the kernel; heap management still works at the
> > granularity of the base page.
> >
> > 2. mprotect on part of a large folio, as Steven pointed out.
> >
> > 3. Long term, we are working on swapping in large folios as a
> > whole[1], just like we swap them out as a whole. For PTEs that are
> > still contiguous swap entries - that is, not unmapped by userspace
> > after the large folio was swapped out to the swap device - we have a
> > chance to swap in a whole large folio and restore its tags without
> > the early exit. But there is still a good chance we fall back to a
> > base page if we fail to allocate a large folio; in that case,
> > do_swap_page() still works at the granularity of the base page and
> > will call swap_free(entry), so the tags of that particular page can
> > be invalidated as a result.
>
> I don't immediately see how that relates. You get a fresh small folio
> and simply load that tag from the internal datastructure. No messing
> with large folios required, because you don't have a large folio. So no
> considerations about large folio batch MTE tag restore apply.

Right. I was thinking of the original large folio being partially
swapped in, and forgot that the newly allocated page is itself a folio
with only one page :-)

Indeed, in that case we are still restoring MTE tags for the whole
folio, which happens to have just one page.
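
Just to make that concrete, here is a rough sketch of the shape I have
in mind for the folio-level hook. The per-page helper name matches the
proposal further down in this mail; the body is illustrative only, not
the actual arm64 implementation. An order-0 folio, i.e. the fallback
case above, simply degenerates to a single iteration:

/*
 * Illustrative sketch only: restore tags for every page of a folio by
 * calling a per-page helper. For an order-0 folio (the do_swap_page()
 * fallback above), folio_nr_pages() == 1, so this is exactly the
 * single-page restore.
 */
static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);

	for (i = 0; i < nr; i++) {
		/* swap entries of a large folio are contiguous at swap-out */
		swp_entry_t e = swp_entry(swp_type(entry),
					  swp_offset(entry) + i);

		__arch_swap_restore(e, folio_page(folio, i));
	}
}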

>
> >
> > 4. Too many early exits might also hurt performance.
> >
> >
> > So I am thinking that in the future we need two helpers:
> >
> > 1. void __arch_swap_restore(swp_entry_t entry, struct page *page);
> > This is always needed to support page-level tag restore.
> >
> > 2. void arch_swap_restore(swp_entry_t entry, struct folio *folio);
> > This can be used when we are able to swap in a whole folio, provided
> > two conditions are met:
> > (a) the PTEs are still contiguous swap entries, just as when the
> > large folio was swapped out;
> > (b) we succeed in allocating a large folio in do_swap_page().
> >
> > For the moment we only need 1; we will add 2 in the swap-in large
> > folio series.
> >
> > What do you think?
>
> I agree that it's better to keep it simple for now.
>
> --
> Cheers,
>
> David / dhildenb
>
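
And just to illustrate the shape of helper 1 above, a rough per-page
sketch. The tag-lookup and page-flag helpers here are placeholders for
whatever the architecture uses to stash tags at swap-out time, not
necessarily the existing arm64 functions:

/*
 * Illustrative sketch only: restore the tags saved for one swap entry
 * into one page. mte_saved_tags_lookup() and page_set_mte_tagged() are
 * placeholder names; the real lookup structure and flag handling live
 * in the arm64 MTE swap code.
 */
static inline void __arch_swap_restore(swp_entry_t entry, struct page *page)
{
	void *tags;

	if (!system_supports_mte())
		return;

	tags = mte_saved_tags_lookup(entry);	/* placeholder lookup */
	if (!tags)
		return;		/* nothing was saved for this entry */

	mte_restore_page_tags(page_address(page), tags);
	page_set_mte_tagged(page);		/* placeholder flag update */
}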

Thanks
Barry
