linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>, Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>,
	Matthew Wilcox <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Minchan Kim <minchan@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	Dave Hansen <dave.hansen@intel.com>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@surriel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Richard Henderson <rth@twiddle.net>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Matt Turner <mattst88@gmail.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Helge Deller <deller@gmx.de>, Chris Zankel <chris@zankel.net>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Xu <peterx@redhat.com>,
	Rolf Eike Beer <eike-kernel@sf-tec.de>,
	linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH v1 2/5] mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault/prealloc memory
Date: Tue, 30 Mar 2021 18:21:07 +0200	[thread overview]
Message-ID: <CAG48ez20rLRNPZj6hLHQ_PLT8H60kTac-uXRiLByD70Q7+qsdQ@mail.gmail.com> (raw)
In-Reply-To: <2bab28c7-08c0-7ff0-c70e-9bf94da05ce1@redhat.com>

On Tue, Mar 30, 2021 at 5:01 PM David Hildenbrand <david@redhat.com> wrote:
> >> +long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
> >> +                           unsigned long end, bool write, int *locked)
> >> +{
> >> +       struct mm_struct *mm = vma->vm_mm;
> >> +       unsigned long nr_pages = (end - start) / PAGE_SIZE;
> >> +       int gup_flags;
> >> +
> >> +       VM_BUG_ON(!PAGE_ALIGNED(start));
> >> +       VM_BUG_ON(!PAGE_ALIGNED(end));
> >> +       VM_BUG_ON_VMA(start < vma->vm_start, vma);
> >> +       VM_BUG_ON_VMA(end > vma->vm_end, vma);
> >> +       mmap_assert_locked(mm);
> >> +
> >> +       /*
> >> +        * FOLL_HWPOISON: Return -EHWPOISON instead of -EFAULT when we hit
> >> +        *                a poisoned page.
> >> +        * FOLL_POPULATE: Always populate memory with VM_LOCKONFAULT.
> >> +        * !FOLL_FORCE: Require proper access permissions.
> >> +        */
> >> +       gup_flags = FOLL_TOUCH | FOLL_POPULATE | FOLL_MLOCK | FOLL_HWPOISON;
> >> +       if (write)
> >> +               gup_flags |= FOLL_WRITE;
> >> +
> >> +       /*
> >> +        * See check_vma_flags(): Will return -EFAULT on incompatible mappings
> >> +        * or with insufficient permissions.
> >> +        */
> >> +       return __get_user_pages(mm, start, nr_pages, gup_flags,
> >> +                               NULL, NULL, locked);
> >
> > You mentioned in the commit message that you don't want to actually
> > dirty all the file pages and force writeback; but doesn't
> > POPULATE_WRITE still do exactly that? In follow_page_pte(), if
> > FOLL_TOUCH and FOLL_WRITE are set, we mark the page as dirty:
>
> Well, I mention that POPULATE_READ explicitly doesn't do that. I
> primarily set it because populate_vma_page_range() also sets it.
>
> Is it safe to *not* set it? IOW, fault something writable into a page
> table (where the CPU could dirty it without additional page faults)
> without marking it accessed? For me, this made logically sense. Thus I
> also understood why populate_vma_page_range() set it.

FOLL_TOUCH doesn't have anything to do with installing the PTE - it
essentially means "the caller of get_user_pages wants to read/write
the contents of the returned page, so please do the same things you
would do if userspace was accessing the page". So in particular, if
you look up a page via get_user_pages() with FOLL_WRITE|FOLL_TOUCH,
that tells the MM subsystem "I will be writing into this page directly
from the kernel, bypassing the userspace page tables, so please mark
it as dirty now so that it will be properly written back later". Part
of that is that it marks the page as recently used, which has an
effect on LRU pageout behavior, I think - as far as I understand, that
is why populate_vma_page_range() uses FOLL_TOUCH.

If you look at __get_user_pages(), you can see that it is split up
into two major parts: faultin_page() for creating PTEs, and
follow_page_mask() for grabbing pages from PTEs. faultin_page()
ignores FOLL_TOUCH completely; only follow_page_mask() uses it.

In a way I guess maybe you do want the "mark as recently accessed"
part that FOLL_TOUCH would give you without FOLL_WRITE? But I think
you very much don't want the dirtying that FOLL_TOUCH|FOLL_WRITE leads
to. Maybe the ideal approach would be to add a new FOLL flag to say "I
only want to mark as recently used, I don't want to dirty". Or maybe
it's enough to just leave out the FOLL_TOUCH entirely, I don't know.

  reply	other threads:[~2021-03-30 16:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17 11:06 [PATCH v1 0/5] mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault/prealloc memory David Hildenbrand
2021-03-17 11:06 ` [PATCH v1 1/5] mm: make variable names for populate_vma_page_range() consistent David Hildenbrand
2021-03-17 11:06 ` [PATCH v1 2/5] mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault/prealloc memory David Hildenbrand
2021-03-30 13:37   ` Jann Horn
2021-03-30 15:01     ` David Hildenbrand
2021-03-30 16:21       ` Jann Horn [this message]
2021-03-30 16:30         ` David Hildenbrand
2021-03-30 16:31           ` David Hildenbrand
2021-04-07 10:31             ` David Hildenbrand
2021-04-15 10:26               ` David Hildenbrand
2021-03-17 11:06 ` [PATCH v1 3/5] MAINTAINERS: add tools/testing/selftests/vm/ to MEMORY MANAGEMENT David Hildenbrand
2021-03-17 11:06 ` [PATCH v1 4/5] selftests/vm: add protection_keys_32 / protection_keys_64 to gitignore David Hildenbrand
2021-03-17 11:06 ` [PATCH v1 5/5] selftests/vm: add test for MADV_POPULATE_(READ|WRITE) David Hildenbrand
2021-03-30  8:58 ` [PATCH v1 0/5] mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault/prealloc memory David Hildenbrand
2021-03-31  4:58   ` Andrew Morton
2021-03-31  6:46     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG48ez20rLRNPZj6hLHQ_PLT8H60kTac-uXRiLByD70Q7+qsdQ@mail.gmail.com \
    --to=jannh@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=chris@zankel.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=eike-kernel@sf-tec.de \
    --cc=hughd@google.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jcmvbkbc@gmail.com \
    --cc=jgg@ziepe.ca \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-xtensa@linux-xtensa.org \
    --cc=mattst88@gmail.com \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=mst@redhat.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rth@twiddle.net \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).