All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: akpm@linux-foundation.org, andrea@kernel.org,
	kirill@shutemov.name, oleg@redhat.com,
	wenwei.tww@alibaba-inc.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: Re: Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer
Date: Tue, 15 Aug 2017 14:26:21 +0200	[thread overview]
Message-ID: <20170815122621.GE29067@dhcp22.suse.cz> (raw)
In-Reply-To: <201708151006.v7FA6SxD079619@www262.sakura.ne.jp>

On Tue 15-08-17 19:06:28, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 15-08-17 07:51:02, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > [...]
> > > > Were you able to reproduce with other filesystems?
> > > 
> > > Yes, I can reproduce this problem using both xfs and ext4 on 4.11.11-200.fc25.x86_64
> > > on Oracle VM VirtualBox on Windows.
> > 
> > Just a quick question.
> > http://lkml.kernel.org/r/201708112053.FIG52141.tHJSOQFLOFMFOV@I-love.SAKURA.ne.jp
> > mentioned next-20170811 kernel and this one 4.11. Your original report
> > as a reply to this thread
> > http://lkml.kernel.org/r/201708072228.FAJ09347.tOOVOFFQJSHMFL@I-love.SAKURA.ne.jp
> > mentioned next-20170728. None of them seem to have this fix
> > http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org so let
> > me ask again. Have you seen an unexpected content written with that
> > patch applied?
> 
> No. All non-zero non-0xFF values are without that patch applied.
> I want to confirm that that patch actually fixes non-zero non-0xFF values
> (so that we can have better patch description for that patch).

OK, so I have clearly misunderstood you. I thought that you can still
see corrupted content with the patch _applied_. Now I see why I couldn't
reproduce this...

Now I also understand what you meant when asking for an explanation. I
can only speculate how we could end up with the non-zero page previously
but the closest match would be that the page got unmapped and reused by
a different path and a stalled tlb entry would leak the content. Such a
thing would happen if we freed the page _before_ we flushed the tlb
during unmap.

Considering that oom_reaper is relying on unmap_page_range which seems
to be doing the right thing wrt. flushing vs. freeing ordering (enforced
by the tlb_gather) I am wondering what else could go wrong but I vaguely
remember there were some races between THP and MADV_DONTNEED in the
past. Maybe we have hit an incarnation of something like that. Anyway
the oom_reaper doesn't try to be clever and it only calls to
unmap_page_range which should be safe from that context.

The primary bug here was that we allowed to refault an unmmaped memory
and that should be fixed by the patch AFAICS. If there are more issues
we should definitely track those down but those should be oom_reaper
independent because we really do not do anything special here.

-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: akpm@linux-foundation.org, andrea@kernel.org,
	kirill@shutemov.name, oleg@redhat.com,
	wenwei.tww@alibaba-inc.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: Re: Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer
Date: Tue, 15 Aug 2017 14:26:21 +0200	[thread overview]
Message-ID: <20170815122621.GE29067@dhcp22.suse.cz> (raw)
In-Reply-To: <201708151006.v7FA6SxD079619@www262.sakura.ne.jp>

On Tue 15-08-17 19:06:28, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 15-08-17 07:51:02, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > [...]
> > > > Were you able to reproduce with other filesystems?
> > > 
> > > Yes, I can reproduce this problem using both xfs and ext4 on 4.11.11-200.fc25.x86_64
> > > on Oracle VM VirtualBox on Windows.
> > 
> > Just a quick question.
> > http://lkml.kernel.org/r/201708112053.FIG52141.tHJSOQFLOFMFOV@I-love.SAKURA.ne.jp
> > mentioned next-20170811 kernel and this one 4.11. Your original report
> > as a reply to this thread
> > http://lkml.kernel.org/r/201708072228.FAJ09347.tOOVOFFQJSHMFL@I-love.SAKURA.ne.jp
> > mentioned next-20170728. None of them seem to have this fix
> > http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org so let
> > me ask again. Have you seen an unexpected content written with that
> > patch applied?
> 
> No. All non-zero non-0xFF values are without that patch applied.
> I want to confirm that that patch actually fixes non-zero non-0xFF values
> (so that we can have better patch description for that patch).

OK, so I have clearly misunderstood you. I thought that you can still
see corrupted content with the patch _applied_. Now I see why I couldn't
reproduce this...

Now I also understand what you meant when asking for an explanation. I
can only speculate how we could end up with the non-zero page previously
but the closest match would be that the page got unmapped and reused by
a different path and a stalled tlb entry would leak the content. Such a
thing would happen if we freed the page _before_ we flushed the tlb
during unmap.

Considering that oom_reaper is relying on unmap_page_range which seems
to be doing the right thing wrt. flushing vs. freeing ordering (enforced
by the tlb_gather) I am wondering what else could go wrong but I vaguely
remember there were some races between THP and MADV_DONTNEED in the
past. Maybe we have hit an incarnation of something like that. Anyway
the oom_reaper doesn't try to be clever and it only calls to
unmap_page_range which should be safe from that context.

The primary bug here was that we allowed to refault an unmmaped memory
and that should be fixed by the patch AFAICS. If there are more issues
we should definitely track those down but those should be oom_reaper
independent because we really do not do anything special here.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-08-15 12:26 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-07 11:38 [PATCH 0/2] mm, oom: fix oom_reaper fallouts Michal Hocko
2017-08-07 11:38 ` Michal Hocko
2017-08-07 11:38 ` [PATCH 1/2] mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS Michal Hocko
2017-08-07 11:38   ` Michal Hocko
2017-08-15  0:49   ` David Rientjes
2017-08-15  0:49     ` David Rientjes
2017-08-07 11:38 ` [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer Michal Hocko
2017-08-07 11:38   ` Michal Hocko
2017-08-08 17:48   ` Andrea Arcangeli
2017-08-08 17:48     ` Andrea Arcangeli
2017-08-08 23:35     ` Tetsuo Handa
2017-08-08 23:35       ` Tetsuo Handa
2017-08-09 18:36       ` Andrea Arcangeli
2017-08-09 18:36         ` Andrea Arcangeli
2017-08-10  8:21     ` Michal Hocko
2017-08-10  8:21       ` Michal Hocko
2017-08-10 13:33       ` Michal Hocko
2017-08-10 13:33         ` Michal Hocko
2017-08-11  2:28   ` Tetsuo Handa
2017-08-11  2:28     ` Tetsuo Handa
2017-08-11  7:09     ` Michal Hocko
2017-08-11  7:09       ` Michal Hocko
2017-08-11  7:54       ` Tetsuo Handa
2017-08-11  7:54         ` Tetsuo Handa
2017-08-11 10:22         ` Andrea Arcangeli
2017-08-11 10:22           ` Andrea Arcangeli
2017-08-11 10:42           ` Andrea Arcangeli
2017-08-11 10:42             ` Andrea Arcangeli
2017-08-11 11:53             ` Tetsuo Handa
2017-08-11 11:53               ` Tetsuo Handa
2017-08-11 12:08         ` Michal Hocko
2017-08-11 12:08           ` Michal Hocko
2017-08-11 15:46           ` Tetsuo Handa
2017-08-11 15:46             ` Tetsuo Handa
2017-08-14 13:59             ` Michal Hocko
2017-08-14 13:59               ` Michal Hocko
2017-08-14 22:51               ` Tetsuo Handa
2017-08-15  6:55                 ` Michal Hocko
2017-08-15  6:55                   ` Michal Hocko
2017-08-15  8:41                 ` Michal Hocko
2017-08-15  8:41                   ` Michal Hocko
2017-08-15 10:06                   ` Tetsuo Handa
2017-08-15 12:26                     ` Michal Hocko [this message]
2017-08-15 12:26                       ` Michal Hocko
2017-08-15 12:58                       ` Tetsuo Handa
2017-08-17 13:58                         ` Michal Hocko
2017-08-17 13:58                           ` Michal Hocko
2017-08-15  5:30               ` Tetsuo Handa
2017-08-07 13:28 ` [PATCH 0/2] mm, oom: fix oom_reaper fallouts Tetsuo Handa
2017-08-07 13:28   ` Tetsuo Handa
2017-08-07 14:04   ` Michal Hocko
2017-08-07 14:04     ` Michal Hocko
2017-08-07 15:23     ` Tetsuo Handa
2017-08-07 15:23       ` Tetsuo Handa
2017-08-15 12:29 ` Michal Hocko
2017-08-15 12:29   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170815122621.GE29067@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@kernel.org \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=wenwei.tww@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.