linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads
Date: Fri, 4 Dec 2020 11:50:37 -0500	[thread overview]
Message-ID: <20201204165037.GN108496@xz-x1> (raw)
In-Reply-To: <alpine.LSU.2.11.2012032053440.15141@eggly.anvils>

On Thu, Dec 03, 2020 at 09:59:50PM -0800, Hugh Dickins wrote:
> On Thu, 3 Dec 2020, Andrea Arcangeli wrote:
> > On Thu, Dec 03, 2020 at 09:30:51PM -0500, Peter Xu wrote:
> > > I'm just afraid there's no space left for a migration entry, because migration
> > > entries fills in the pfn information into swp offset field rather than a real
> > > offset (please refer to make_migration_entry())?  I assume PFN can use any bit.
> > > Or did I miss anything?
> > > 
> > > I went back to see the original proposal from Hugh:
> > > 
> > >   IIUC you only need a single value, no need to carve out another whole
> > >   swp_type: could probably be swp_offset 0 of any swp_type other than 0.
> > > 
> > > Hugh/Andrea, sorry if this is a stupid swap question: could you help explain
> > > why swp_offset=0 won't be used by any swap device?  I believe it's correct,
> > > it's just that I failed to figure out the reason myself. :(
> > > 
> 
> It's because swp_offset 0 is the offset of the swap header, and if we
> ever used that when allocating swap, then the swap header would get
> overwritten, and that swap area become unrecognizable next time.
> 
> But I said it would be usable for UFFD with any swp_type other than 0,
> because a swap entry of type 0, offset 0 is simply 0, which looks just
> like no swap entry at all, and there are (or were: I might not be
> up-to-date) benign races where a swap entry might get passed down but
> then found to be 0, and that was understandable and permitted (yes,
> I still see the "if (!entry.val) goto out;" in __swap_info_get()).
> 
> And that might be related to pte_none() being 0 on most architectures
> (not s390 IIRC): we need to distinguish none from swap.  Though that
> all gets complicated by the way the swp_entry is munged before being
> put into a pte, and the x86 swap munging got more complicated when
> L1TF was revealed (and accompanied by prot none munging too) -
> search git log of v4.19 for x86/speculation/l1tf if you need to.

My thanks to both of you for explaining the details.

> 
> > 
> > Hugh may want to review if I got it wrong, but there's basically three
> > ways.
> > 
> > swp_type would mean adding one more reserved value in addition of
> > SWP_MIGRATION_READ and SWP_MIGRATION_WRITE (kind of increasing
> > SWP_MIGRATION_NUM to 3).
> 
> I'm not very keen on actually using any of the SWP_MIGRATION defines,
> partly because in principle UFFD should not depend on CONFIG_MIGRATION,
> partly because the uffd_wp entry would not behave anything like a
> migration entry (whose pfn should always indicate a locked page).
> 
> swp_offset 0 of swp_type 1 perhaps?
> 
> > 
> > swp_offset = 0 works in combination of SWP_MIGRATION_WRITE and
> > SWP_MIGRATION_READ if we enforce pfn 0 is never used by the kernel
> > (I'd feel safer with pfn value -1UL truncated to the bits of the swp
> > offset, since the swp_entry format is common code).
> > 
> > The bit I was suggesting is just one more bit like _PAGE_SWP_UFFD_WP
> > from the pte, one that cannot ever be set in any swp entry today. I
> > assume it can't be _PAGE_SWP_UFFD_WP since that already can be set but
> > you may want to verify it...
> 
> I don't see why you would need another bit for this.
> 
> The code that checks non-present non-none entries in page table,
> for whether they are actually swap or migration entries or whatever,
> would now also check for swp_offset 0 of swp_type 1 and go off to
> the UFFD WP processing if so.
> 
> I didn't pay much attention to below, it seemed over-complicated.
> And I don't think Peter's PROT_NONE alternative was unworkable,
> but would have to be more careful about pfn and L1TF than shown.
> And I am more comfortable to focus on the swap-like direction,
> than think in two directions at once - never my strength!

Yes, I think both of them may work, but I'll follow your advise on using swap
entries, assuming easier and cleaner than _PAGE_PROTNONE.  For example, current
pte_present() does make more sense to return false for such an uffd-wp reserved
pte.  Then I won't make _PAGE_PROTNONE even more complicated too.

So I guess I'll start with type==1 && offset==0.

(PS: I still think "swp_entry(0, _UFFD_SWP_UFFD_WP) && !vma_is_anonymous(vma)"
 could also be a good candidate comparing to "swp_entry(1, 0)" considering
 type==1 here is kind of randomly chosen from all the other numbers except 0;
 but maybe that's not extremely important - the major logic should be the same)

Thanks!

-- 
Peter Xu


  reply	other threads:[~2020-12-04 16:52 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-30 23:06 [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu
2020-12-01  9:30 ` David Hildenbrand
2020-12-01 12:59 ` Matthew Wilcox
2020-12-01 22:30   ` Peter Xu
2020-12-02  0:02     ` Andrea Arcangeli
2020-12-02 22:37       ` Hugh Dickins
2020-12-02 23:41         ` Peter Xu
2020-12-03  5:36           ` Hugh Dickins
2020-12-03 18:02             ` Peter Xu
2020-12-03 19:44               ` Andrea Arcangeli
2020-12-04  2:30                 ` Peter Xu
2020-12-04  4:10                   ` Andrea Arcangeli
2020-12-04  5:59                     ` Hugh Dickins
2020-12-04 16:50                       ` Peter Xu [this message]
2020-12-04 18:12                     ` Andrea Arcangeli
2020-12-04 19:23                       ` Peter Xu
2020-12-04 19:37                         ` Andrea Arcangeli
2020-12-04 20:21                           ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201204165037.GN108496@xz-x1 \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).