From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mina Almasry <almasrymina@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Paul E . McKenney" <paulmckrcu@fb.com>,
	Yu Zhao <yuzhao@google.com>, Jonathan Corbet <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Florian Schmidt <florian.schmidt@nutanix.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap
Date: Wed, 10 Nov 2021 16:57:27 +0800
Message-ID: <YYuJd9ZBQiY50dVs@xz-m1.local>
In-Reply-To: <793685d2-be3f-9a74-c9a3-65c486e0ef1f@redhat.com>

On Wed, Nov 10, 2021 at 09:30:50AM +0100, David Hildenbrand wrote:
> On 10.11.21 09:27, Peter Xu wrote:
> > On Wed, Nov 10, 2021 at 09:14:42AM +0100, David Hildenbrand wrote:
> >> On 10.11.21 08:03, Peter Xu wrote:
> >>> Hi, Mina,
> >>>
> >>> Sorry to comment late.
> >>>
> >>> On Sun, Nov 07, 2021 at 03:57:54PM -0800, Mina Almasry wrote:
> >>>> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> >>>> index fdc19fbc10839..8a0f0064ff336 100644
> >>>> --- a/Documentation/admin-guide/mm/pagemap.rst
> >>>> +++ b/Documentation/admin-guide/mm/pagemap.rst
> >>>> @@ -23,7 +23,8 @@ There are four components to pagemap:
> >>>>      * Bit  56    page exclusively mapped (since 4.2)
> >>>>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
> >>>>        :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
> >>>> -    * Bits 57-60 zero
> >>>> +    * Bit  58    page is a huge (PMD size) THP mapping
> >>>> +    * Bits 59-60 zero
> >>>>      * Bit  61    page is file-page or shared-anon (since 3.5)
> >>>>      * Bit  62    page swapped
> >>>>      * Bit  63    page present
> >>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >>>> index ad667dbc96f5c..6f1403f83b310 100644
> >>>> --- a/fs/proc/task_mmu.c
> >>>> +++ b/fs/proc/task_mmu.c
> >>>> @@ -1302,6 +1302,7 @@ struct pagemapread {
> >>>>  #define PM_SOFT_DIRTY		BIT_ULL(55)
> >>>>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
> >>>>  #define PM_UFFD_WP		BIT_ULL(57)
> >>>> +#define PM_HUGE_THP_MAPPING	BIT_ULL(58)
> >>>
> >>> The ending "_MAPPING" seems redundant to me, how about just call it "PM_THP" or
> >>> "PM_HUGE" (as THP also means HUGE already)?
> >>>
> >>> IMHO the core problem is about permission controls, and it seems to me we're
> >>> actually trying to workaround it by duplicating some information we have.. so
> >>> it's kind of a pity.  Totally not against this patch, but imho it'll be nicer
> >>> if it's the permission part that to be enhanced, rather than a new but slightly
> >>> duplicated interface.
> >>
> >> It's not a permission problem AFAIKS: even with permissions "changed",
> >> any attempt to use /proc/kpageflags is just racy. Let's not go down that
> >> path, it's really the wrong mechanism to export to random userspace.
> > 
> > I agree it's racy, but IMHO that's fine.  These are hints for userspace to make
> > decisions, they cannot be always right.  Even if we fetch atomically and seeing
> > that this pte is swapped out, it can be quickly accessed at the same time and
> > it'll be in-memory again.  Only if we can freeze the whole pgtable but we
> > can't, so they can only be used as hints.
> 
> Sorry, I don't think /proc/kpageflags (or exporting the PFNs to random
> users via /proc/self/pagemap) is the way to go.
> 
> "Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get
> PFNs. In 4.0 and 4.1 opens by unprivileged fail with -EPERM.  Starting
> from 4.2 the PFN field is zeroed if the user does not have
> CAP_SYS_ADMIN. Reason: information about PFNs helps in exploiting
> Rowhammer vulnerability."

IMHO you've mentioned two separate problems here.  That's also what I was
wondering about: could the app be granted CAP_SYS_ADMIN then?

I am not sure whether that would work well with /proc/kpage* though, as those
files are 0400 by default.  So perhaps we would also need to manually adjust
the file permissions, so that the app can access both the PFNs (with
CAP_SYS_ADMIN) and the flags.  I'm no expert on the permissions, though.
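
To make the PFN restriction concrete, here is a minimal userspace sketch of
reading and decoding one pagemap entry.  The helper names (pm_read, pm_decode,
struct pm_info) are made up for illustration, and bit 58 is only what the patch
under discussion proposes, so treat it as an assumption rather than merged ABI.
Without CAP_SYS_ADMIN the PFN field simply reads back as zero on 4.2 and later,
while the flag bits stay visible.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Bit layout as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_PFN_MASK		((UINT64_C(1) << 55) - 1)	/* bits 0-54 */
#define PM_SOFT_DIRTY		(UINT64_C(1) << 55)
#define PM_MMAP_EXCLUSIVE	(UINT64_C(1) << 56)
#define PM_UFFD_WP		(UINT64_C(1) << 57)
/* Assumption: proposed by the patch in this thread, not an established bit. */
#define PM_HUGE_THP_MAPPING	(UINT64_C(1) << 58)
#define PM_SWAP			(UINT64_C(1) << 62)
#define PM_PRESENT		(UINT64_C(1) << 63)

/* Hypothetical container for the fields this sketch cares about. */
struct pm_info {
	bool present;
	bool swapped;
	bool thp;	/* only meaningful if the proposed bit exists */
	uint64_t pfn;	/* 0 if not present or if the reader lacks CAP_SYS_ADMIN */
};

/* Decode a raw 64-bit pagemap entry. */
static struct pm_info pm_decode(uint64_t entry)
{
	struct pm_info info;

	info.present = (entry & PM_PRESENT) != 0;
	info.swapped = (entry & PM_SWAP) != 0;
	info.thp = (entry & PM_HUGE_THP_MAPPING) != 0;
	/* Bits 0-54 hold the PFN only when the page is present. */
	info.pfn = info.present ? (entry & PM_PFN_MASK) : 0;
	return info;
}

/*
 * Read the entry for vaddr from an open /proc/<pid>/pagemap fd.
 * Returns 0 on success, -1 on a failed or short read.
 */
static int pm_read(int fd, uintptr_t vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t off;

	assert(entry != NULL && page_size > 0);
	/* One 8-byte entry per virtual page, indexed by virtual page number. */
	off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(*entry);
	return pread(fd, entry, sizeof(*entry), off) == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would open /proc/self/pagemap read-only, call pm_read() for each
address of interest, and pass the result to pm_decode().  As discussed above,
whatever comes back is only a hint, since the page tables can change right
after the read.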

> 
> > 
> >>
> >> We do have an interface to access this information from userspace
> >> already: /proc/self/smaps IIRC. Mina commented that they are seeing
> >> performance issues with that approach.
> >>
> >> It would be valuable to add these details to the patch description,
> >> including a performance difference when using both interfaces we have
> >> available. As the patch description stands, there is no explanation
> >> "why" we want this change.
> > 
> > I didn't notice Mina mention about performance issues with kpageflags, if so
> > then I agree this solution helps. 
> The performance issue seems to be with /proc/self/smaps.

This also reminded me that we have an issue with smaps being too slow, while in
many cases we're only interested in a small portion of the whole memory.  That
made me wonder about a new smaps interface that takes a memory range as input.
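
To illustrate the cost, here is a rough sketch of what userspace has to do
today to get the THP usage of a single range out of smaps.  The function name
smaps_anon_huge_kb is hypothetical and the parsing is simplified (it assumes
lines fit in the buffer); this is a userspace filter, not the range-based
kernel interface suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Sum the "AnonHugePages:" values (in kB) of every VMA in an smaps stream
 * that overlaps [start, end).  The stream is scanned from its current
 * position to EOF, regardless of how small the requested range is.
 */
static long smaps_anon_huge_kb(FILE *f, unsigned long start, unsigned long end)
{
	char line[512];
	unsigned long vma_start, vma_end;
	long kb, total = 0;
	bool overlaps = false;

	assert(f != NULL);
	while (fgets(line, sizeof(line), f)) {
		/* A VMA header line starts with "<start>-<end> " in hex. */
		if (sscanf(line, "%lx-%lx", &vma_start, &vma_end) == 2)
			overlaps = vma_start < end && vma_end > start;
		else if (overlaps &&
			 sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
			total += kb;
	}
	return total;
}
```

Even when the caller only cares about one small range, the kernel still
generates the full per-VMA statistics for the whole address space and the
loop still has to read all of it, which is where the time goes.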

Thanks,

-- 
Peter Xu