All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads
Date: Fri, 4 Dec 2020 11:50:37 -0500	[thread overview]
Message-ID: <20201204165037.GN108496@xz-x1> (raw)
In-Reply-To: <alpine.LSU.2.11.2012032053440.15141@eggly.anvils>

On Thu, Dec 03, 2020 at 09:59:50PM -0800, Hugh Dickins wrote:
> On Thu, 3 Dec 2020, Andrea Arcangeli wrote:
> > On Thu, Dec 03, 2020 at 09:30:51PM -0500, Peter Xu wrote:
> > > I'm just afraid there's no space left for a migration entry, because migration
> > > entries fills in the pfn information into swp offset field rather than a real
> > > offset (please refer to make_migration_entry())?  I assume PFN can use any bit.
> > > Or did I miss anything?
> > > 
> > > I went back to see the original proposal from Hugh:
> > > 
> > >   IIUC you only need a single value, no need to carve out another whole
> > >   swp_type: could probably be swp_offset 0 of any swp_type other than 0.
> > > 
> > > Hugh/Andrea, sorry if this is a stupid swap question: could you help explain
> > > why swp_offset=0 won't be used by any swap device?  I believe it's correct,
> > > it's just that I failed to figure out the reason myself. :(
> > > 
> 
> It's because swp_offset 0 is the offset of the swap header, and if we
> ever used that when allocating swap, then the swap header would get
> overwritten, and that swap area become unrecognizable next time.
> 
> But I said it would be usable for UFFD with any swp_type other than 0,
> because a swap entry of type 0, offset 0 is simply 0, which looks just
> like no swap entry at all, and there are (or were: I might not be
> up-to-date) benign races where a swap entry might get passed down but
> then found to be 0, and that was understandable and permitted (yes,
> I still see the "if (!entry.val) goto out;" in __swap_info_get()).
> 
> And that might be related to pte_none() being 0 on most architectures
> (not s390 IIRC): we need to distinguish none from swap.  Though that
> all gets complicated by the way the swp_entry is munged before being
> put into a pte, and the x86 swap munging got more complicated when
> L1TF was revealed (and accompanied by prot none munging too) -
> search git log of v4.19 for x86/speculation/l1tf if you need to.

My thanks to both of you for explaining the details.

> 
> > 
> > Hugh may want to review if I got it wrong, but there's basically three
> > ways.
> > 
> > swp_type would mean adding one more reserved value in addition of
> > SWP_MIGRATION_READ and SWP_MIGRATION_WRITE (kind of increasing
> > SWP_MIGRATION_NUM to 3).
> 
> I'm not very keen on actually using any of the SWP_MIGRATION defines,
> partly because in principle UFFD should not depend on CONFIG_MIGRATION,
> partly because the uffd_wp entry would not behave anything like a
> migration entry (whose pfn should always indicate a locked page).
> 
> swp_offset 0 of swp_type 1 perhaps?
> 
> > 
> > swp_offset = 0 works in combination of SWP_MIGRATION_WRITE and
> > SWP_MIGRATION_READ if we enforce pfn 0 is never used by the kernel
> > (I'd feel safer with pfn value -1UL truncated to the bits of the swp
> > offset, since the swp_entry format is common code).
> > 
> > The bit I was suggesting is just one more bit like _PAGE_SWP_UFFD_WP
> > from the pte, one that cannot ever be set in any swp entry today. I
> > assume it can't be _PAGE_SWP_UFFD_WP since that already can be set but
> > you may want to verify it...
> 
> I don't see why you would need another bit for this.
> 
> The code that checks non-present non-none entries in page table,
> for whether they are actually swap or migration entries or whatever,
> would now also check for swp_offset 0 of swp_type 1 and go off to
> the UFFD WP processing if so.
> 
> I didn't pay much attention to below, it seemed over-complicated.
> And I don't think Peter's PROT_NONE alternative was unworkable,
> but would have to be more careful about pfn and L1TF than shown.
> And I am more comfortable to focus on the swap-like direction,
> than think in two directions at once - never my strength!

Yes, I think both of them may work, but I'll follow your advise on using swap
entries, assuming easier and cleaner than _PAGE_PROTNONE.  For example, current
pte_present() does make more sense to return false for such an uffd-wp reserved
pte.  Then I won't make _PAGE_PROTNONE even more complicated too.

So I guess I'll start with type==1 && offset==0.

(PS: I still think "swp_entry(0, _UFFD_SWP_UFFD_WP) && !vma_is_anonymous(vma)"
 could also be a good candidate comparing to "swp_entry(1, 0)" considering
 type==1 here is kind of randomly chosen from all the other numbers except 0;
 but maybe that's not extremely important - the major logic should be the same)

Thanks!

-- 
Peter Xu


  reply	other threads:[~2020-12-04 16:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-30 23:06 [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu
2020-12-01  9:30 ` David Hildenbrand
2020-12-01 12:59 ` Matthew Wilcox
2020-12-01 22:30   ` Peter Xu
2020-12-02  0:02     ` Andrea Arcangeli
2020-12-02 22:37       ` Hugh Dickins
2020-12-02 22:37         ` Hugh Dickins
2020-12-02 23:41         ` Peter Xu
2020-12-03  5:36           ` Hugh Dickins
2020-12-03  5:36             ` Hugh Dickins
2020-12-03 18:02             ` Peter Xu
2020-12-03 19:44               ` Andrea Arcangeli
2020-12-04  2:30                 ` Peter Xu
2020-12-04  4:10                   ` Andrea Arcangeli
2020-12-04  5:59                     ` Hugh Dickins
2020-12-04  5:59                       ` Hugh Dickins
2020-12-04 16:50                       ` Peter Xu [this message]
2020-12-04 18:12                     ` Andrea Arcangeli
2020-12-04 19:23                       ` Peter Xu
2020-12-04 19:37                         ` Andrea Arcangeli
2020-12-04 20:21                           ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201204165037.GN108496@xz-x1 \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.