From: Suren Baghdasaryan <surenb@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: syzbot <syzbot+b591856e0f0139f83023@syzkaller.appspotmail.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in vma_replace_policy
Date: Thu, 14 Sep 2023 22:21:07 +0000
Message-ID: <CAJuCfpGRSJhBBZop_L-UubuveUWBca4YtyPBzM2KZGEx7iwhXg@mail.gmail.com>
In-Reply-To: <ZQN58hFWfgn+OfvG@casper.infradead.org>

On Thu, Sep 14, 2023 at 9:24 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Sep 14, 2023 at 08:53:59PM +0000, Suren Baghdasaryan wrote:
> > On Thu, Sep 14, 2023 at 8:00 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Thu, Sep 14, 2023 at 7:09 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> > > > > I think I found the problem and the explanation is much simpler. While
> > > > > walking the page range, queue_folios_pte_range() encounters an
> > > > > unmovable page and queue_folios_pte_range() returns 1. That causes a
> > > > > break from the loop inside walk_page_range() and no more VMAs get
> > > > > locked. After that the loop calling mbind_range() walks over all VMAs,
> > > > > even the ones which were skipped by queue_folios_pte_range() and that
> > > > > causes this BUG assertion.
> > > > >
> > > > > I'm thinking about the right way to handle this situation (what's
> > > > > the expected behavior here?)...
> > > > > I think the safest way would be to modify walk_page_range() and make
> > > > > it continue calling process_vma_walk_lock() for all VMAs in the range
> > > > > even when __walk_page_range() returns a positive err. Any objection or
> > > > > alternative suggestions?
> > > >
> > > > So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
> > > > specified.  That means we're going to return an error, no matter what,
> > > > and there's no point in calling mbind_range().  Right?
> > > >
> > > > +++ b/mm/mempolicy.c
> > > > @@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
> > > >         ret = queue_pages_range(mm, start, end, nmask,
> > > >                           flags | MPOL_MF_INVERT, &pagelist, true);
> > > >
> > > > +       if (ret == 1)
> > > > +               ret = -EIO;
> > > >         if (ret < 0) {
> > > >                 err = ret;
> > > >                 goto up_out;
> > > >
> > > > (I don't really understand this code, so it can't be this simple, can
> > > > it?  Why don't we just return -EIO from queue_folios_pte_range() if
> > > > this is the right answer?)
> > >
> > > Yeah, I'm trying to understand the expected behavior of this function
> > > to make sure we are not missing anything. I tried a simple fix that I
> > > suggested in my previous email and it works but I want to understand a
> > > bit more about this function's logic before posting the fix.
> >
> > So, current functionality is that after queue_pages_range() encounters
> > an unmovable page, terminates the loop and returns 1, mbind_range()
> > will still be called for the whole range
> > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1345),
> > all pages in the pagelist will be migrated
> > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1355)
> > and only after that the -EIO code will be returned
> > (https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1362).
> > So, if we follow Matthew's suggestion we will be altering the current
> > behavior which I assume is not what we want to do.
>
> Right, I'm intentionally changing the behaviour.  My thinking is
> that mbind(MPOL_MF_MOVE | MPOL_MF_STRICT) is going to fail.  Should
> such a failure actually move the movable pages before reporting that
> it failed?  I don't know.
>
> > The simple fix I was thinking about that would not alter this behavior
> > is something like this:
>
> I don't like it, but can we run it past syzbot to be sure it solves the
> issue and we're not chasing a ghost here?

Yes, I just finished running the reproducer with my fix applied on both
the upstream and linux-next builds listed in
https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
problem no longer reproduces.
I'm fine with your suggestion too; I just wanted to point out that it
would introduce a change in behavior. Let me know how you want to proceed.
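
To make the failure mode discussed above concrete, here is a minimal
userspace sketch of it. It is a toy model, not kernel code: struct
toy_vma, toy_walk(), toy_count_unlocked() and toy_demo() are hypothetical
names invented for illustration, and the lock_all flag only approximates
the idea of continuing to lock VMAs after a positive return. The real
walk_page_range() and mbind_range() do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: only the state this sketch needs. */
struct toy_vma {
	bool write_locked;   /* models the VMA having been locked by the walk */
	bool has_unmovable;  /* models the VMA containing an unmovable page */
};

/*
 * Models the walk: lock each VMA, then scan it. The first unmovable
 * page sets ret to 1. Without lock_all the walk stops there, so later
 * VMAs are never locked. With lock_all the remaining VMAs are still
 * locked and the positive value is returned at the end.
 */
static int toy_walk(struct toy_vma *vmas, size_t n, bool lock_all)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		vmas[i].write_locked = true;
		if (ret == 0 && vmas[i].has_unmovable)
			ret = 1;
		if (ret != 0 && !lock_all)
			break;
	}
	return ret;
}

/* Counts the VMAs a later loop over the whole range would find unlocked. */
static size_t toy_count_unlocked(const struct toy_vma *vmas, size_t n)
{
	size_t unlocked = 0;

	for (size_t i = 0; i < n; i++)
		if (!vmas[i].write_locked)
			unlocked++;
	return unlocked;
}

/* Runs both variants on three VMAs where the middle one is unmovable. */
static int toy_demo(void)
{
	struct toy_vma early[3] = { {false, false}, {false, true}, {false, false} };
	struct toy_vma all[3]   = { {false, false}, {false, true}, {false, false} };

	assert(toy_walk(early, 3, false) == 1);
	assert(toy_count_unlocked(early, 3) == 1); /* last VMA was skipped */

	assert(toy_walk(all, 3, true) == 1);
	assert(toy_count_unlocked(all, 3) == 0);   /* every VMA is locked */
	return 0;
}
```

Calling toy_demo() from a small main() exercises both paths. In the
early-exit variant the last VMA is left unlocked, which is the state a
later loop over the whole range would trip on. Returning an error before
that loop runs, as Matthew suggests, avoids the situation in a different
way and is not modeled here.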
