All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>, Qian Cai <cai@lca.pw>,
	js1304@gmail.com, Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@lge.com, Vlastimil Babka <vbabka@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Roman Gushchin <guro@fb.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Michal Hocko <mhocko@suse.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback
Date: Thu, 8 Oct 2020 22:50:16 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.2010082216250.10228@eggly.anvils> (raw)
In-Reply-To: <cf715524-f30d-634f-2a05-e02c4e52e675@oracle.com>

On Thu, 8 Oct 2020, Mike Kravetz wrote:
> On 10/7/20 8:21 PM, Hugh Dickins wrote:
> > 
> > Mike, j'accuse... your 5.7 commit c0d0381ade79 ("hugetlbfs:
> > use i_mmap_rwsem for more pmd sharing synchronization"), in which
> > unmap_and_move_huge_page() now passes the TTU_RMAP_LOCKED flag to
> > try_to_unmap(), because it's already holding mapping->i_mmap_rwsem:
> > but that is not the right lock to secure an anon_vma lookup.
> 
> Thanks Hugh!  Your analysis is correct and the code in that commit is
> not correct.  I was so focused on the file mapping case, I overlooked
> (actually introduced) this issue for anon mappings.
> 
> Let me verify that this indeed is the root cause.  However, since
> move_pages12 migrated anon hugetlb pages it certainly does look to be
> the case.
> 
> > I intended to send a patch, passing TTU_RMAP_LOCKED only in the
> > !PageAnon case (and, see vma_adjust(), anon_vma lock conveniently
> > nests inside i_mmap_rwsem); but then wondered if i_mmap_rwsem was
> > needed in that case or not, so looked deeper into c0d0381ade79.
> > 
> > Hmm, not even you liked it!  But the worst of it looks simply
> > unnecessary to me, and I hope can be deleted - but better by you
> > than by me (in particular, you were trying to kill 1) and 2) birds
> > with one stone, and I've always given up on understanding hugetlb's
> > reservations: I suspect that side of it is irrelevant here,
> > but I wouldn't pretend to be sure).
> > 
> > How could you ever find a PageAnon page in a vma_shareable() area?
> > 
> > It is all rather confusing (vma_shareable() depending on VM_MAYSHARE,
> > whereas is_cow_mapping() on VM_SHARED and VM_MAYWRITE: they have to
> > be studied together with do_mmap()'s 
> > 			vm_flags |= VM_SHARED | VM_MAYSHARE;
> > 			if (!(file->f_mode & FMODE_WRITE))
> > 				vm_flags &= ~(VM_MAYWRITE | VM_SHARED);
> > 
> > (And let me add to the confusion by admitting that, prior to 3.15's
> > cda540ace6a1 "mm: get_user_pages(write,force) refuse to COW in
> > shared areas", maybe it was possible to find a PageAnon there.)
> > 
> > But my belief (best confirmed by you running your tests with a
> > suitably placed BUG_ON or WARN_ON) is that you'll never find a
> > PageAnon in a vma_shareable() area, so will never need try_to_unmap()
> > to unshare a pagetable in the PageAnon case, so won't need i_mmap_rwsem
> > for PageAnon there, and _get_hugetlb_page_mapping() (your function that
> > deduces an address_space from an anon_vma) can just be deleted.
> 
> Yes, it is confusing.  Let me look into this.  I would be really happy
> to delete that ugly function.
> 
> > (And in passing, may I ask what hugetlb_page_mapping_lock_write()'s
> > hpage->_mapcount inc and dec are for?  You comment it as a hack,
> > but don't explain what needs that hack, and I don't see it.)
> 
> We are trying to lock the mapping (mapping->i_mmap_rwsem).  We know
> mapping is valid, because we obtained it from page_mapping() and it
> will remain valid because we have the page locked.  Page needs to be
> unlocked to unmap.  However, we have to drop page lock in order to
> acquire i_mmap_rwsem.  Once we drop page lock, mapping could become
> invalid.  So, the code code artifically incs mapcount so that mapping
> will remain valid when upmapping page.

No, unless you can point me to some other hugetlbfs-does-it-differently
(and I didn't see it there in that commit), raising _mapcount does not
provide any such protection; but does add the possiblility of a
"BUG: Bad page cache" and leak from unaccount_page_cache_page().

Earlier in the day I was trying to work out what to recommend instead,
but had to turn aside to something else: I'll try again tomorrow.

It's a problem I've faced before in tmpfs, keeping a hold on the
mapping while page lock is dropped.  Quite awkward: igrab() looks as
if it's the right thing to use, but turns out to give no protection
against umount.  Last time around, I ended up with a stop_eviction
count in the shmem inode, which shmem_evict_inode() waits on if
necessary.  Something like that could be done for hugetlbfs too,
but I'd prefer to do it without adding extra, if there is a way.

> 
> As mentioned above, I hope all this can be removed.

If you continue to nest page lock inside i_mmap_rwsem for hugetlbfs,
then I think that part of hugetlb_page_mapping_lock_write() has to
remain.  I'd much prefer that hugetlbfs did not reverse the usual
nesting, but accept that you had reasons for doing it that way.

Hugh

  reply	other threads:[~2020-10-09  5:50 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23  6:13 [PATCH v3 0/8] clean-up the migration target allocation functions js1304
2020-06-23  6:13 ` [PATCH v3 1/8] mm/page_isolation: prefer the node of the source page js1304
2020-06-23  6:13 ` [PATCH v3 2/8] mm/migrate: move migration helper from .h to .c js1304
2020-06-23  6:13 ` [PATCH v3 3/8] mm/hugetlb: unify migration callbacks js1304
2020-06-24 21:18   ` Mike Kravetz
2020-06-25 11:26   ` Michal Hocko
2020-06-26  4:02     ` Joonsoo Kim
2020-06-26  4:02       ` Joonsoo Kim
2020-07-02 16:13       ` Vlastimil Babka
2020-07-03  0:55         ` Joonsoo Kim
2020-07-03  0:55           ` Joonsoo Kim
2020-06-23  6:13 ` [PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware js1304
2020-06-25 11:54   ` Michal Hocko
2020-06-26  4:49     ` Joonsoo Kim
2020-06-26  4:49       ` Joonsoo Kim
2020-06-26  7:23       ` Michal Hocko
2020-06-29  6:27         ` Joonsoo Kim
2020-06-29  6:27           ` Joonsoo Kim
2020-06-29  7:55           ` Michal Hocko
2020-06-30  6:30             ` Joonsoo Kim
2020-06-30  6:30               ` Joonsoo Kim
2020-06-30  6:42               ` Michal Hocko
2020-06-30  7:22                 ` Joonsoo Kim
2020-06-30  7:22                   ` Joonsoo Kim
2020-06-30 16:37                   ` Mike Kravetz
2020-06-23  6:13 ` [PATCH v3 5/8] mm/migrate: make a standard migration target allocation function js1304
2020-06-25 12:05   ` Michal Hocko
2020-06-26  5:02     ` Joonsoo Kim
2020-06-26  5:02       ` Joonsoo Kim
2020-06-26  7:33       ` Michal Hocko
2020-06-29  6:41         ` Joonsoo Kim
2020-06-29  6:41           ` Joonsoo Kim
2020-06-29  8:03           ` Michal Hocko
2020-06-30  7:19             ` Joonsoo Kim
2020-06-30  7:19               ` Joonsoo Kim
2020-07-03 15:25   ` Vlastimil Babka
2020-06-23  6:13 ` [PATCH v3 6/8] mm/gup: use a standard migration target allocation callback js1304
2020-06-25 12:08   ` Michal Hocko
2020-06-26  5:03     ` Joonsoo Kim
2020-06-26  5:03       ` Joonsoo Kim
2020-07-03 15:56   ` Vlastimil Babka
2020-07-06  8:34     ` Joonsoo Kim
2020-07-06  8:34       ` Joonsoo Kim
2020-06-23  6:13 ` [PATCH v3 7/8] mm/mempolicy: " js1304
2020-06-25 12:09   ` Michal Hocko
2020-07-03 15:59   ` Vlastimil Babka
2020-07-08  1:20   ` Qian Cai
2020-07-08  6:45     ` Michal Hocko
2020-10-08  3:21     ` Hugh Dickins
2020-10-08  3:21       ` Hugh Dickins
2020-10-08 17:29       ` Mike Kravetz
2020-10-09  5:50         ` Hugh Dickins [this message]
2020-10-09  5:50           ` Hugh Dickins
2020-10-09 17:42           ` Mike Kravetz
2020-10-09 22:23             ` Hugh Dickins
2020-10-09 22:23               ` Hugh Dickins
2020-10-10  0:25               ` Mike Kravetz
2020-06-23  6:13 ` [PATCH v3 8/8] mm/page_alloc: remove a wrapper for alloc_migration_target() js1304
2020-06-25 12:10   ` Michal Hocko
2020-07-03 16:18   ` Vlastimil Babka
2020-07-06  8:44     ` Joonsoo Kim
2020-07-06  8:44       ` Joonsoo Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2010082216250.10228@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=guro@fb.com \
    --cc=hch@infradead.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=js1304@gmail.com \
    --cc=kernel-team@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.