From: Mel Gorman <mgorman@techsingularity.net>
To: Rafael Aquini <aquini@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, mhocko@suse.com, vbabka@suse.cz,
	kirill.shutemov@linux.intel.com
Subject: Re: [PATCH] mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
Date: Sun, 16 Feb 2020 23:32:33 +0000
Message-ID: <20200216233232.GZ3420@suse.de>
In-Reply-To: <20200216191800.22423-1-aquini@redhat.com>

On Sun, Feb 16, 2020 at 02:18:00PM -0500, Rafael Aquini wrote:
> From: Mel Gorman <mgorman@techsingularity.net>
>   A user reported a bug against a distribution kernel while running
>   a proprietary workload described as "memory intensive that is not
>   swapping" that is expected to apply to mainline kernels. The workload
>   is read/write/modifying ranges of memory and checking the contents. They
>   reported that within a few hours a bad PMD would be reported followed
>   by a memory corruption where expected data was all zeros.  A partial report
>   of the bad PMD looked like
> 
>   [ 5195.338482] ../mm/pgtable-generic.c:33: bad pmd ffff8888157ba008(000002e0396009e2)
>   [ 5195.341184] ------------[ cut here ]------------
>   [ 5195.356880] kernel BUG at ../mm/pgtable-generic.c:35!
>   ....
>   [ 5195.410033] Call Trace:
>   [ 5195.410471]  [<ffffffff811bc75d>] change_protection_range+0x7dd/0x930
>   [ 5195.410716]  [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
>   [ 5195.410918]  [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
>   [ 5195.411200]  [<ffffffff81098322>] task_work_run+0x72/0x90
>   [ 5195.411246]  [<ffffffff81077139>] exit_to_usermode_loop+0x91/0xc2
>   [ 5195.411494]  [<ffffffff81003a51>] prepare_exit_to_usermode+0x31/0x40
>   [ 5195.411739]  [<ffffffff815e56af>] retint_user+0x8/0x10
> 
>   Decoding revealed that the PMD was a valid prot_numa PMD and the bad PMD
>   was a false detection. The bug does not trigger if automatic NUMA balancing
>   or transparent huge pages is disabled.
> 
>   The bug is due to a race in change_pmd_range between a pmd_trans_huge and
>   pmd_none_or_clear_bad check without any locks held. During the pmd_trans_huge
>   check, a parallel protection update under lock can have cleared the PMD
>   and filled it with a prot_numa entry between the transhuge check and the
>   pmd_none_or_clear_bad check.
> 
>   While this could be fixed with heavy locking, it's only necessary to
>   make a copy of the PMD on the stack during change_pmd_range and avoid
>   races. A new helper is created for this as the check is quite subtle and the
>   existing similar helper is not suitable. This passed 154 hours of testing
>   (usually triggers between 20 minutes and 24 hours) without detecting bad
>   PMDs or corruption. A basic test of an autonuma-intensive workload showed
>   no significant change in behaviour.
> 
> Although Mel withdrew the patch in the face of the LKML comment https://lkml.org/lkml/2017/4/10/922
> the aforementioned race window is still open, and we have reports of the Linpack test reporting bad
> residuals after the bad PMD warning is observed. In addition to that, bad rss-counter and
> non-zero pgtables assertions are triggered on mm teardown for the task hitting the bad PMD.
> 
>  host kernel: mm/pgtable-generic.c:40: bad pmd 00000000b3152f68(8000000d2d2008e7)
>  ....
>  host kernel: BUG: Bad rss-counter state mm:00000000b583043d idx:1 val:512
>  host kernel: BUG: non-zero pgtables_bytes on freeing mm: 4096
> 
> The issue is observed on a v4.18-based distribution kernel, but the race window is
> expected to be applicable to mainline kernels, as well.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Rafael Aquini <aquini@redhat.com>

It's curious that it took so long for this to be caught again.
Unfortunately I cannot find exactly what it's racing against but maybe
it's not worth chasing down and the patch is simply the safer option :(

-- 
Mel Gorman
SUSE Labs
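
For readers following the quoted description, the double-read race and the
"copy on the stack" fix can be sketched in userspace C. This is only an
illustration under stated assumptions, not the code from the patch: the flag
bits, the entry_t type, and the names classify_racy, classify_snapshot and
install_huge are all hypothetical, and the concurrent writer is simulated with
a callback invoked between the two reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; NOT the real x86 PMD layout. */
#define ENT_TABLE 0x1ull /* entry points to a lower-level page table */
#define ENT_HUGE  0x2ull /* entry maps a huge page directly */

typedef _Atomic uint64_t entry_t;

enum verdict { ENT_SKIP_NONE, ENT_SKIP_HUGE, ENT_WALK, ENT_REPORT_BAD };

static bool val_none(uint64_t v) { return v == 0; }
static bool val_huge(uint64_t v) { return (v & ENT_HUGE) != 0; }
/* Anything that is not a plain table entry looks "bad" to this check,
 * including a perfectly valid huge entry. */
static bool val_bad(uint64_t v)
{
	return (v & (ENT_TABLE | ENT_HUGE)) != ENT_TABLE;
}

/* Racy pattern: the huge check and the none/bad check each read the entry. */
static enum verdict classify_racy(entry_t *e, void (*concurrent_update)(entry_t *))
{
	if (val_huge(atomic_load(e)))   /* first read */
		return ENT_SKIP_HUGE;
	if (concurrent_update)
		concurrent_update(e);   /* simulated parallel writer */
	uint64_t v = atomic_load(e);    /* second read may see a new value */
	if (val_none(v))
		return ENT_SKIP_NONE;
	if (val_bad(v))
		return ENT_REPORT_BAD;  /* false detection on a valid huge entry */
	return ENT_WALK;
}

/* Snapshot pattern: read once, then run every check on the local copy. */
static enum verdict classify_snapshot(entry_t *e)
{
	uint64_t v = atomic_load(e);
	if (val_none(v))
		return ENT_SKIP_NONE;
	if (val_huge(v))
		return ENT_SKIP_HUGE;
	if (val_bad(v))
		return ENT_REPORT_BAD;
	return ENT_WALK;
}

/* Simulated writer: fills a cleared entry with a huge mapping. */
static void install_huge(entry_t *e)
{
	atomic_store(e, 0x200000ull | ENT_HUGE);
}
```

Starting from a cleared entry, classify_racy(&e, install_huge) misreports the
freshly installed huge entry as bad, while classify_snapshot classifies
whichever single value it read consistently, regardless of when the writer runs.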

Thread overview: 24+ messages
2020-02-16 19:18 [PATCH] mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa Rafael Aquini
2020-02-16 23:32 ` Mel Gorman [this message]
2020-03-07  2:40 ` Qian Cai
2020-03-07  3:05   ` Rafael Aquini
2020-03-08  3:20     ` Qian Cai
2020-03-08 23:14       ` Rafael Aquini
2020-03-09  3:27         ` Qian Cai
2020-03-09 15:05           ` Rafael Aquini
2020-03-11  0:04             ` Qian Cai
  -- strict thread matches above, loose matches on Subject: below --
2017-04-10  9:48 [PATCH] mm, numa: Fix " Mel Gorman
2017-04-10 10:03 ` Vlastimil Babka
2017-04-10 12:19   ` Mel Gorman
2017-04-10 12:38 ` Rik van Riel
2017-04-10 13:53 ` Michal Hocko
2017-04-10 17:38   ` Mel Gorman
2017-04-10 16:45 ` Zi Yan
2017-04-10 17:20   ` Mel Gorman
2017-04-10 17:49     ` Zi Yan
2017-04-10 18:07       ` Mel Gorman
2017-04-10 22:09         ` Andrew Morton
2017-04-10 22:28           ` Zi Yan
2017-04-11  6:35             ` Vlastimil Babka
2017-04-11 21:44               ` Andrew Morton
2017-04-11  8:29           ` Mel Gorman
