linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Song Liu <songliubraving@fb.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] khugepaged: retract_page_tables() remember to test exit
Date: Mon, 3 Aug 2020 11:59:35 +0300	[thread overview]
Message-ID: <20200803085935.aama54aie77pq47b@box> (raw)
In-Reply-To: <alpine.LSU.2.11.2008021705240.29126@eggly.anvils>

On Sun, Aug 02, 2020 at 05:35:23PM -0700, Hugh Dickins wrote:
> On Mon, 3 Aug 2020, Kirill A. Shutemov wrote:
> > On Sun, Aug 02, 2020 at 12:16:53PM -0700, Hugh Dickins wrote:
> > > Only once have I seen this scenario (and forgot even to notice what
> > > forced the eventual crash): a sequence of "BUG: Bad page map" alerts
> > > from vm_normal_page(), from zap_pte_range() servicing exit_mmap();
> > > pmd:00000000, pte values corresponding to data in physical page 0.
> > > 
> > > The pte mappings being zapped in this case were supposed to be from a
> > > huge page of ext4 text (but could as well have been shmem): my belief
> > > is that it was racing with collapse_file()'s retract_page_tables(),
> > > found *pmd pointing to a page table, locked it, but *pmd had become
> > > 0 by the time start_pte was decided.
> > > 
> > > In most cases, that possibility is excluded by holding mmap lock;
> > > but exit_mmap() proceeds without mmap lock.  Most of what's run by
> > > khugepaged checks khugepaged_test_exit() after acquiring mmap lock:
> > > khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate()
> > > do so, for example.  But retract_page_tables() did not: fix that
> > > (using an mm variable instead of vma->vm_mm repeatedly).
> > 
> > Hm. I'm not sure I follow. vma->vm_mm has to be valid as long as we hold
> > i_mmap lock, no? Unlinking a VMA requires it.
> 
> Ah, my wording is misleading, yes.  That comment
> "(using an mm variable instead of vma->vm_mm repeatedly)"
> was nothing more than a note, that the patch is bigger than it could be,
> because I decided to use an mm variable, instead of vma->vm_mm repeatedly.
> But it looks as if I'm saying there used to be a need for READ_ONCE() or
> something, and by using the mm variable I was fixing the problem.
> 
> No, sorry: delete that line now the point is made: the mm variable is
> just a patch detail, it's not important.
> 
> The fix (as the subject suggested) is for retract_page_tables() to check
> khugepaged_test_exit(), after acquiring mmap lock, before doing anything
> to the page table.  Getting the mmap lock serializes with __mmput(),
> which briefly takes and drops it in __khugepaged_exit(); then the
> khugepaged_test_exit() check on mm_users makes sure we don't touch the
> page table once exit_mmap() might reach it, since exit_mmap() will be
> proceeding without mmap lock, not expecting anyone to be racing with it.

Okay, makes sense.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

  reply	other threads:[~2020-08-03  8:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-02 19:12 [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Hugh Dickins
2020-08-02 19:15 ` [PATCH] khugepaged: collapse_pte_mapped_thp() protect the pmd lock Hugh Dickins
2020-08-02 21:23   ` Kirill A. Shutemov
2020-08-02 19:16 ` [PATCH] khugepaged: retract_page_tables() remember to test exit Hugh Dickins
2020-08-02 21:44   ` Kirill A. Shutemov
2020-08-03  0:35     ` Hugh Dickins
2020-08-03  8:59       ` Kirill A. Shutemov [this message]
2020-08-02 19:18 ` [PATCH] khugepaged: khugepaged_test_exit() check mmget_still_valid() Hugh Dickins
2020-08-14 22:13   ` [PATCH] khugepaged: adjust VM_BUG_ON_MM() in __khugepaged_enter() Hugh Dickins
2020-08-17 20:50     ` Yang Shi
2020-08-02 21:07 ` [PATCH] khugepaged: collapse_pte_mapped_thp() flush the right range Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200803085935.aama54aie77pq47b@box \
    --to=kirill@shutemov.name \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).