All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: 'Mike Kravetz' <mike.kravetz@oracle.com>,
	'Andrew Morton' <akpm@linux-foundation.org>,
	linux-mm@kvack.org,
	"'Dr. David Alan Gilbert'" <dgilbert@redhat.com>,
	'Shaohua Li' <shli@fb.com>,
	'Pavel Emelyanov' <xemul@parallels.com>,
	'Mike Rapoport' <rppt@linux.vnet.ibm.com>
Subject: Re: [PATCH 15/33] userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY
Date: Thu, 15 Dec 2016 20:02:42 +0100	[thread overview]
Message-ID: <20161215190242.GC4909@redhat.com> (raw)
In-Reply-To: <019d01d24554$38e7f220$aab7d660$@alibaba-inc.com>

On Wed, Nov 23, 2016 at 02:38:37PM +0800, Hillf Danton wrote:
> On Tuesday, November 22, 2016 9:17 AM Mike Kravetz wrote:
> > I am not sure if you are convinced ClearPagePrivate is an acceptable
> > solution to this issue.  If you do, here is the simple patch to add
> > it and an appropriate comment.
> > 
> Hi Mike and Andrea
> 
> Sorry for my jumping in.
> 
> In commit 07443a85ad
> ("mm, hugetlb: return a reserved page to a reserved pool if failed")
> newly allocated huge page gets cleared for a successful COW.
> 
> I'm wondering if we can handle our error path along that way?
> 
> Obvious I could miss the points you are concerning.

The hugepage allocation toggles the region covering the page in the
vma reservations, so when the vma is virtually unmapped, those regions
that got toggled, are considered not reserved and the global
reservation is not decreased.

Because the global reservation is decreased by the same page
allocation that sets the page private flag after toggling the virtual
regions, the page private flag shall be cleared when the page is
finally mapped in userland, as it's not reserved anymore. This way
when the page is freed, the global reservation will not be increased
(and when the vma is unmapped the reservation will not be decreased
either, because of the region toggling above).

hugetlb_mcopy_atomic_pte is already correctly doing:

	ClearPagePrivate(page);
	hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);

while mapping the hugepage in userland.

The issue is that if we can't reach hugetlb_mcopy_atomic_pte because
userland screws with the vmas while the UFFDIO_COPY releases the
mmap_sem, the point where we error out, has the vma out of sync
because we had to drop the mmap_sem in the first place. So we can't
toggle the vma virtual region covering the page back to its original
state (i.e. reserved). That's what restore_reserve_on_error would try
to achieve, but we can't run it as the vma we got in the error path is
stale.

All we know is that one more page will be considered not reserved when
the vma is unmapped, so the global reservation will be decreased of
one less page when the vma is unmapped. In turn when freeing such
hugepage in the error path, we've to prevent the global reserve to be
increased once again and to do so we've to clear the page private flag
before freeing the hugepage.

I already applied Mark's patch that clears the page private flag in
the error path. If anything is incorrect in the explanation above let
me know.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-12-15 19:02 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-02 19:33 [PATCH 00/33] userfaultfd tmpfs/hugetlbfs/non-cooperative Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 01/33] userfaultfd: document _IOR/_IOW Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 02/33] userfaultfd: correct comment about UFFD_FEATURE_PAGEFAULT_FLAG_WP Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 03/33] userfaultfd: convert BUG() to WARN_ON_ONCE() Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 04/33] userfaultfd: use vma_is_anonymous Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 05/33] userfaultfd: non-cooperative: Split the find_userfault() routine Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 06/33] userfaultfd: non-cooperative: Add ability to report non-PF events from uffd descriptor Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 07/33] userfaultfd: non-cooperative: report all available features to userland Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 08/33] userfaultfd: non-cooperative: Add fork() event Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 09/33] userfaultfd: non-cooperative: Add fork() event, build warning fix Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 10/33] userfaultfd: non-cooperative: dup_userfaultfd: use mm_count instead of mm_users Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 11/33] userfaultfd: non-cooperative: Add mremap() event Andrea Arcangeli
2016-11-03  7:41   ` Hillf Danton
2016-11-03 17:52     ` Mike Rapoport
2016-11-04 15:40     ` Mike Rapoport
2016-11-02 19:33 ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED request Andrea Arcangeli
2016-11-03  8:01   ` Hillf Danton
2016-11-03 17:24     ` Mike Rapoport
2016-11-04 16:40       ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED requestg Andrea Arcangeli
2016-11-04 15:42     ` [PATCH 12/33] userfaultfd: non-cooperative: Add madvise() event for MADV_DONTNEED request Mike Rapoport
2016-11-02 19:33 ` [PATCH 13/33] userfaultfd: hugetlbfs: add copy_huge_page_from_user for hugetlb userfaultfd support Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 14/33] userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for " Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 15/33] userfaultfd: hugetlbfs: add __mcopy_atomic_hugetlb for huge page UFFDIO_COPY Andrea Arcangeli
2016-11-03 10:15   ` Hillf Danton
2016-11-03 17:33     ` Mike Kravetz
2016-11-03 19:14       ` Mike Kravetz
2016-11-04  6:43         ` Hillf Danton
2016-11-04 19:36         ` Andrea Arcangeli
2016-11-04 20:34           ` Mike Kravetz
2016-11-08 21:06           ` Mike Kravetz
2016-11-16 18:28             ` Andrea Arcangeli
2016-11-16 18:53               ` Mike Kravetz
2016-11-17 15:40                 ` Andrea Arcangeli
2016-11-17 19:26                   ` Mike Kravetz
2016-11-18  0:05                     ` Andrea Arcangeli
2016-11-18  5:52                       ` Mike Kravetz
2016-11-22  1:16                         ` Mike Kravetz
2016-11-23  6:38                           ` Hillf Danton
2016-12-15 19:02                             ` Andrea Arcangeli [this message]
2016-12-16  3:54                               ` Hillf Danton
2016-11-17 19:41               ` Mike Kravetz
2016-11-04 16:35       ` Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 16/33] userfaultfd: hugetlbfs: add userfaultfd hugetlb hook Andrea Arcangeli
2016-11-04  7:02   ` Hillf Danton
2016-11-02 19:33 ` [PATCH 17/33] userfaultfd: hugetlbfs: allow registration of ranges containing huge pages Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 18/33] userfaultfd: hugetlbfs: add userfaultfd_hugetlb test Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 19/33] userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 20/33] userfaultfd: introduce vma_can_userfault Andrea Arcangeli
2016-11-04  7:39   ` Hillf Danton
2016-11-02 19:33 ` [PATCH 21/33] userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 22/33] userfaultfd: shmem: introduce vma_is_shmem Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 23/33] userfaultfd: shmem: add tlbflush.h header for microblaze Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 24/33] userfaultfd: shmem: use shmem_mcopy_atomic_pte for shared memory Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults Andrea Arcangeli
2016-11-04  8:59   ` Hillf Danton
2016-11-04 14:53     ` Mike Rapoport
2016-11-04 15:44     ` Mike Rapoport
2016-11-04 16:56       ` Andrea Arcangeli
2016-11-18  0:37       ` Andrea Arcangeli
2016-11-20 12:10         ` Mike Rapoport
2016-11-02 19:33 ` [PATCH 26/33] userfaultfd: shmem: allow registration of shared memory ranges Andrea Arcangeli
2016-11-02 19:33 ` [PATCH 27/33] userfaultfd: shmem: add userfaultfd_shmem test Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 28/33] userfaultfd: shmem: lock the page before adding it to pagecache Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 29/33] userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 30/33] userfaultfd: non-cooperative: selftest: introduce userfaultfd_open Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 31/33] userfaultfd: non-cooperative: selftest: add ufd parameter to copy_page Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 32/33] userfaultfd: non-cooperative: selftest: add test for FORK, MADVDONTNEED and REMAP events Andrea Arcangeli
2016-11-02 19:34 ` [PATCH 33/33] mm: mprotect: use pmd_trans_unstable instead of taking the pmd_lock Andrea Arcangeli
2016-11-02 20:07 ` [PATCH 00/33] userfaultfd tmpfs/hugetlbfs/non-cooperative Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161215190242.GC4909@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dgilbert@redhat.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=shli@fb.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.