From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F204C433DF for ; Sat, 11 Jul 2020 18:13:28 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2016920748 for ; Sat, 11 Jul 2020 18:13:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b="VBBPUfpi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2016920748 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8506D6B0002; Sat, 11 Jul 2020 14:13:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8000D6B0005; Sat, 11 Jul 2020 14:13:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6C7986B0006; Sat, 11 Jul 2020 14:13:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0111.hostedemail.com [216.40.44.111]) by kanga.kvack.org (Postfix) with ESMTP id 54EF76B0002 for ; Sat, 11 Jul 2020 14:13:27 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id E9BD82C6D for ; Sat, 11 Jul 2020 18:13:26 +0000 (UTC) X-FDA: 77026592412.27.camp70_5701b8e26ed9 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 706823D67A for ; Sat, 11 Jul 2020 18:13:19 +0000 (UTC) X-HE-Tag: camp70_5701b8e26ed9 X-Filterd-Recvd-Size: 8750 Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) by imf28.hostedemail.com (Postfix) with ESMTP for ; Sat, 11 Jul 2020 18:13:18 +0000 (UTC) Received: by mail-lj1-f195.google.com with SMTP id q4so10152821lji.2 for ; Sat, 11 Jul 2020 11:13:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=nOYGlWMMEU7tyKGZLlNjUtNxoe6S2h/oz0+oUhK+rOk=; b=VBBPUfpiOu0muHbqeP4r3/fUpKQexAHw4DwIqUVNkb43xD+nP+XZ31NsLPUx/vRkRa zzaUKWdXrUMfP1FaX3/iNfjqwE1q1Ok1Kix+hzrcv/inVzkMy6FKF046RABJFAJTQy8g /oNUGPpsZ3sEzfL5OAQbTJjbjvnkfKAkigYrM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=nOYGlWMMEU7tyKGZLlNjUtNxoe6S2h/oz0+oUhK+rOk=; b=FYAen2jc+YCRQnXWnVfcA94Wn9gCyTGzaDxPl8E4f2+b7NnTwtGW1f4Pyc09/Umz/M 2DTqMk0kmFHBGAfHIr1mAyJW9JOt0PLxr7qcXSHNysvJk3DwNJ9uF67bV2/UsA+cNgtf xdoLcbgvpjeXGXRr5mu3sbnAiEwR7Sz2YKaYZRXWEzgOAKxPRLZp2OPVyDSzT+Q3eWZA DD5sRQldzdu3kng5GL/uF1zvO/+dq+dDn7lhkEZnz5YIG3Kid6K97ARAmCccl5IToFWI Vkdkko8Zs5GHKIV2sLOFGvJG1VvgkU9KkaxruYKbtqKi+AKVVEfd2xSXlhqP7poQfhUx 2gtg== X-Gm-Message-State: AOAM533G9kktCuAMRwCZr/zqRHUiLh7wQ3dHfsDT1TnjzbKteEoEpLEq 9KvXGvDWHRowNIYL1y54+V9oonvpMlA= X-Google-Smtp-Source: ABdhPJxdSsXuaNs2QsFr+HGc7kFJBkAki1vdY3Pt8rmkhEwi+BMoOUmXjFV994XsJ00+7CCURQv94A== X-Received: by 2002:a2e:91ca:: with SMTP id u10mr39633634ljg.101.1594491197009; Sat, 11 Jul 2020 11:13:17 -0700 (PDT) Received: from mail-lj1-f179.google.com (mail-lj1-f179.google.com. [209.85.208.179]) by smtp.gmail.com with ESMTPSA id x30sm3291791lfn.3.2020.07.11.11.13.15 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 11 Jul 2020 11:13:15 -0700 (PDT) Received: by mail-lj1-f179.google.com with SMTP id q4so10152757lji.2 for ; Sat, 11 Jul 2020 11:13:15 -0700 (PDT) X-Received: by 2002:a2e:9b42:: with SMTP id o2mr42771907ljj.102.1594491195002; Sat, 11 Jul 2020 11:13:15 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Linus Torvalds Date: Sat, 11 Jul 2020 11:12:58 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386 To: Naresh Kamboju Cc: Joel Fernandes , "Kirill A. Shutemov" , William Kucharski , linux- stable , open list , linux-mm , Arnd Bergmann , Andrew Morton , Roman Gushchin , Michal Hocko , lkft-triage@lists.linaro.org, Chris Down , Michel Lespinasse , Fan Yang , Brian Geffon , Anshuman Khandual , Will Deacon , Catalin Marinas , pugaowei@gmail.com, Jerome Glisse , Greg Kroah-Hartman , Mel Gorman , Hugh Dickins , Al Viro , Tejun Heo , Sasha Levin , Oleg Nesterov Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 706823D67A X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sat, Jul 11, 2020 at 10:27 AM Naresh Kamboju wrote: > > I have started bisecting this problem and found the first bad commit Thanks for the effort. Bisection is often a great tool to figure out what's wrong. Sadly, in this case: > commit 9f132f7e145506efc0744426cb338b18a54afc3b > Author: Joel Fernandes (Google) > Date: Thu Jan 3 15:28:41 2019 -0800 > > mm: select HAVE_MOVE_PMD on x86 for faster mremap Yeah, that's just the commit that enables the code, not the commit that introduces the fundamental problem. That said, this is a prime example of why I absolutely detest patch series that do this kind of thing, and are several patches that create new functionality, followed by one patch to enable it. If you can't get things working incrementally, maybe you shouldn't do them at all. Doing a big series of "hidden work" and then enabling it later is wrong. In this case, though, the real patch that did the code isn't that kind of "big series of hidden work" patch series, it's just the (single) previous commit 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions"). So your bisection is useful, it's just that it really points to that previous commit, and it's where this code was introduced. It's also worth noting that that commit doesn't really *break* anything, since it just falls back to the old behavior when it warns. So to "fix" your test-case, we could just remove the WARN_ON. But the WARN_ON() is still worrisome, because the thing it tests for really _should_ be true. Now, we actually have a known bug in this area that is talked about elsewhere: the way unmap does the pgtable_free() is /* Detach vmas from rbtree */ detach_vmas_to_be_unmapped(mm, vma, prev, end); if (downgrade) mmap_write_downgrade(mm); unmap_region(mm, vma, prev, start, end); (and unmap_region() is what does the pgtable_free() that should have cleared the PMD). And the problem with the "downgrade" is that another thread can change the beginning of the next vma when it's a grow-down region (or the end of the prev one if it's a grow-up). See commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") for the source of that But that requires an _actual_ "unmap()" system call (the others set "downgrade" to false - I'm not entirely sure why), and it requires another thread to be attached to that VM in order to actually do that racy concurrent stack size change. And neither of those two cases will be true for the execve() path. It's a new VM, with just one thread attached, so no threaded munmap() going on there. The fact that it seems to happen with https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/thp/thp01.c makes me think it's somehow related to THP mappings, but I don't see why those would matter. All the same pmd freeing should still have happened, afaik. And the printout I asked for a few days back for when it triggered clearly showed a normal non-huge pmd ("val: 7d530067" is just "accessed, dirty, rw, user and present", which is a perfectly normal page directory entry for 4kB pages, and we could move the whole thing and move 2MB (or 4MB) of aligned virtual memory in one go). Some race with THP splitting and pgtable_free()? I can't see how anything would race in execve(), or how anything would have touched that address below the stack in the first place.. Kirill, Oleg, and reaction from this? Neither of you were on the original email, I think, it's this one: https://lore.kernel.org/lkml/CA+G9fYt+6OeibZMD0fP=O3nqFbcN3O4xcLkjq0mpQbZJ2uxB9w@mail.gmail.com/ and I really think it is harmless in that when the warning triggers, we just go back to the page-by-page code, but while I think the WARN_ON() should probably be downgraded to a WARN_ON_ONCE(), I do think it's showing _something_. I just can't see how this would trigger for execve(). That's just about the _simplest_ case for us: single-threaded, mappings set up purely by load_elf_binary() etc. I'm clearly missing something. Linus