From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 39EB7C433E1 for ; Sun, 12 Jul 2020 23:04:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1043C20674 for ; Sun, 12 Jul 2020 23:04:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1594595090; bh=zItzV7tGTJkWNA/oxxQuc1H487Z0ZQDfb+cXJt6kZ4U=; h=References:In-Reply-To:From:Date:Subject:To:Cc:List-ID:From; b=fJpaXU8V8Y4HqVNG5z372YMccnO/idBH4bww3Zz+udLu6mR5mL9WbWycYRdaNceNB SlX+sG6KZPTBkjCdjCF7Zt0KBtuqMhqRB+QPI0f0SH0BQWhzPl1AXaeuCGoLAp1zhU iYxHLFgKu8aCmmyr1gXLipiD3ChbnRe+zOHI9Vik= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728370AbgGLXEs (ORCPT ); Sun, 12 Jul 2020 19:04:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727906AbgGLXEr (ORCPT ); Sun, 12 Jul 2020 19:04:47 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CF62C061794 for ; Sun, 12 Jul 2020 16:04:47 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id dp18so13280439ejc.8 for ; Sun, 12 Jul 2020 16:04:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=yo0Mnk4z2ZKKeclS9nj/V3zBsAaD+8xk/39wn8faHH8=; b=JgbUYhmWzr+3//UqU1Ua64whjD+TQo14a7RmhB8c633RX/QOiKcCQbsyY/hFuqiEZ5 BvLqBjYYHepAIfQNocZKo+mB3oyrODeGszIw3KMFLfUKLJ4N1S+uGMefleAq+uA7OUg5 DLKYfe4ZuClUyfP06Vewwx33e/+3F2kX7EtpE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=yo0Mnk4z2ZKKeclS9nj/V3zBsAaD+8xk/39wn8faHH8=; b=ee1ZJYydwWugLL9BB3kJA6Y0ANfu2UkH4pAsoIo7a+mgqt4a1QYs9D3fjXtYk+IG6i AlNOAA+f4bGOYk3A2nr1OpQAkROgy6tquZsRgWSrIp0MHfTHBm+B3MUZ4vp9GKFOPPWq JXwW7wyFOm37hkb1OlwAoTFJyYBd+n1kq1T5mfSfsdGLf0jE2JfxdQ6F7ZhD2YDOEGBw DKvtA+DKtLTmSZ+jlTk8C5EMTkB5HHmpcd17kW1NNhHVR8sax9DLRdJtgbNIJ+9HzZe7 g8MNPPNqN4yZ7iy1plaMLXvmz5ycOKq/me0ptGEK/9LulAAYBOaL/Rv2YzDgpnEU/LrV cm/g== X-Gm-Message-State: AOAM530+ummqK3lL/lPJlT4RwNeGFcBeV8jLCUUvrBHw+Zqs+hoBMLu2 a/L/xwwWXnPw2ivORfF4GAGolFHQZag= X-Google-Smtp-Source: ABdhPJzlD4QZKP6BRfbZRKyqXNE71wQ3tfIG7QXWY2uTG+Gk636OAfcNmnOvHrBuf5gqzbAJrsLuUw== X-Received: by 2002:a17:906:6b0c:: with SMTP id q12mr70965968ejr.525.1594595085507; Sun, 12 Jul 2020 16:04:45 -0700 (PDT) Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com. [209.85.208.54]) by smtp.gmail.com with ESMTPSA id u8sm8337336ejm.65.2020.07.12.16.04.45 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 12 Jul 2020 16:04:45 -0700 (PDT) Received: by mail-ed1-f54.google.com with SMTP id dg28so10358562edb.3 for ; Sun, 12 Jul 2020 16:04:45 -0700 (PDT) X-Received: by 2002:a2e:9c92:: with SMTP id x18mr40507440lji.70.1594594702310; Sun, 12 Jul 2020 15:58:22 -0700 (PDT) MIME-Version: 1.0 References: <20200712215041.GA3644504@google.com> In-Reply-To: <20200712215041.GA3644504@google.com> From: Linus Torvalds Date: Sun, 12 Jul 2020 15:58:06 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386 To: Joel Fernandes Cc: Naresh Kamboju , linux- stable , open list , linux-mm , Arnd Bergmann , Andrew Morton , Roman Gushchin , Michal Hocko , lkft-triage@lists.linaro.org, Chris Down , Michel Lespinasse , Fan Yang , Brian Geffon , Anshuman Khandual , Will Deacon , Catalin Marinas , pugaowei@gmail.com, Jerome Glisse , Greg Kroah-Hartman , Mel Gorman , Hugh Dickins , Al Viro , Tejun Heo , Sasha Levin Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jul 12, 2020 at 2:50 PM Joel Fernandes wrote: > > I reproduced Naresh's issue on a 32-bit x86 machine and the below patch fixes it. > The issue is solely within execve() itself and the way it allocates/copies the > temporary stack. > > It is actually indeed an overlapping case because the length of the > stack is big enough to cause overlap. The VMA grows quite a bit because of > all the page faults that happen due to the copy of the args/env. Then during > the move of overlapped region, it finds that a PMD is already allocated. Oh, ok, I think I see. So the issue is that while move_normal_pmd() _itself_ will be clearing the old pmd entries when it copies them, the 4kB copies that happened _before_ this time will not have cleared the old pmd that they were in. So we've had a deeper stack, and we've already copied some of it one page at a time, and we're done with those 4kB entries, but now we hit a 2MB-aligned case and want to move that down. But when it does that, it hits the (by now hopefully empty) pmd that used to contain the 4kB entries we copied earlier. So we've cleared the page table, but we've simply never called pgtable_clear() here, because the page table got cleared in move_ptes() when it did that pte = ptep_get_and_clear(mm, old_addr, old_pte); on the previous old pte entries. That makes sense to me. Does that match what you see? Because when I said it wasn't overlapped, I was really talking about that one single pmd move itself not being overlapped. > The below patch fixes it and is not warning anymore in 30 minutes of testing > so far. So I'm not hugely happy with the patch, I have to admit. Why? Because: > + /* Ensure the temporary stack is shifted by atleast its size */ > + if (stack_shift < (vma->vm_end - vma->vm_start)) > + stack_shift = (vma->vm_end - vma->vm_start); This basically overrides the random stack shift done by arch_align_stack(). Yeah, yeah, arch_align_stack() is kind of misnamed. It _used_ to do what the name implies it does, which on x86 is to just give the minimum ABI-specified 16-byte stack alignment. But these days, what it really does is say "align the stack pointer, but also shift it down by a random amount within 8kB so that the start of the stack isn't some repeatable thing that an attacker can trivially control with the right argv/envp size.." I don't think it works right anyway because we then PAGE_ALIGN() it, but whatever. But it looks like what you're doing means that now the size of the stack will override that shift, and that means that now the randomization is entirely undone. No? Plus you do it after we've also checked that we haven't grown the stack down to below mmap_min_addr(). But maybe I misread that patch. But I do feel like you figured out why the bug happened, now we're just discussing whether the patch is the right thing to do. Maybe saying "doing the pmd copies for the initial stack isn't important, so let's just note this as a special case and get rid of the WARN_ON()" might be an alternative solution. The reason I worried was that I felt like we didn't understand why the WARN_ON() happened. Now that I do, I think we could just say "ok, don't warn, we know that this can happen, and it's harmless". Anybody else have any opinions? Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D8B7C433E0 for ; Sun, 12 Jul 2020 22:58:29 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E87CB20674 for ; Sun, 12 Jul 2020 22:58:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=linux-foundation.org header.i=@linux-foundation.org header.b="JgbUYhmW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E87CB20674 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 52BCF8D0003; Sun, 12 Jul 2020 18:58:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B6838D0002; Sun, 12 Jul 2020 18:58:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3A5CF8D0003; Sun, 12 Jul 2020 18:58:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0233.hostedemail.com [216.40.44.233]) by kanga.kvack.org (Postfix) with ESMTP id 223628D0002 for ; Sun, 12 Jul 2020 18:58:28 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 80D28181AEF09 for ; Sun, 12 Jul 2020 22:58:27 +0000 (UTC) X-FDA: 77030939454.05.card42_3e0338d26ee3 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 4886F1801A0A2 for ; Sun, 12 Jul 2020 22:58:27 +0000 (UTC) X-HE-Tag: card42_3e0338d26ee3 X-Filterd-Recvd-Size: 7636 Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Sun, 12 Jul 2020 22:58:26 +0000 (UTC) Received: by mail-lj1-f195.google.com with SMTP id d17so13634676ljl.3 for ; Sun, 12 Jul 2020 15:58:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=yo0Mnk4z2ZKKeclS9nj/V3zBsAaD+8xk/39wn8faHH8=; b=JgbUYhmWzr+3//UqU1Ua64whjD+TQo14a7RmhB8c633RX/QOiKcCQbsyY/hFuqiEZ5 BvLqBjYYHepAIfQNocZKo+mB3oyrODeGszIw3KMFLfUKLJ4N1S+uGMefleAq+uA7OUg5 DLKYfe4ZuClUyfP06Vewwx33e/+3F2kX7EtpE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=yo0Mnk4z2ZKKeclS9nj/V3zBsAaD+8xk/39wn8faHH8=; b=q24GUWQYHDyvCQRLi6v8qbiZF11qaA3tNwpAl+ag0UnS0kCUFTXjKa8ptnqlj2lMFe 7rQ97cCj39Vgq6K+ms0OalgIJkD5yk17aY60YOpkVaWp8HkCczbWcDs1Bfqq+g6njjAo adsbUo6wqdd66A7cNKf7GrOGRh9fnE8v6jBKBsQbrfqIg4C327oL2Rx0cLg1uQnAFphu sthQby1kLMbx7AAQcnCB1/WIWmJO9ibMHhOF5hh8u4baK4tjoERTgh8cfSNwX49dY96F AIehNez5zcFOYXraCoPmRpNfuAmkY18U/mKj/zY34WoVYf8PQmqCJyMeRF9KuKlQzocT rdQA== X-Gm-Message-State: AOAM530LXm2wxLhdBNlgvtYyQgyrbn+rjbBAA7WrOtPNWaB6mvSkGH6T 366yGwVRm55WTJlcFy7Y1Wmp1D+PMR0= X-Google-Smtp-Source: ABdhPJx0o6M42l+k698tW/+OnAfax+59B1CCT8HDtsRp/Sq4Cw/Tn3iyXEwWTNZsigZKIc/58ufOfg== X-Received: by 2002:a2e:984a:: with SMTP id e10mr43670931ljj.248.1594594704682; Sun, 12 Jul 2020 15:58:24 -0700 (PDT) Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com. [209.85.208.175]) by smtp.gmail.com with ESMTPSA id 24sm4238519lfy.59.2020.07.12.15.58.22 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 12 Jul 2020 15:58:23 -0700 (PDT) Received: by mail-lj1-f175.google.com with SMTP id d17so13634531ljl.3 for ; Sun, 12 Jul 2020 15:58:22 -0700 (PDT) X-Received: by 2002:a2e:9c92:: with SMTP id x18mr40507440lji.70.1594594702310; Sun, 12 Jul 2020 15:58:22 -0700 (PDT) MIME-Version: 1.0 References: <20200712215041.GA3644504@google.com> In-Reply-To: <20200712215041.GA3644504@google.com> From: Linus Torvalds Date: Sun, 12 Jul 2020 15:58:06 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386 To: Joel Fernandes Cc: Naresh Kamboju , linux- stable , open list , linux-mm , Arnd Bergmann , Andrew Morton , Roman Gushchin , Michal Hocko , lkft-triage@lists.linaro.org, Chris Down , Michel Lespinasse , Fan Yang , Brian Geffon , Anshuman Khandual , Will Deacon , Catalin Marinas , pugaowei@gmail.com, Jerome Glisse , Greg Kroah-Hartman , Mel Gorman , Hugh Dickins , Al Viro , Tejun Heo , Sasha Levin Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4886F1801A0A2 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, Jul 12, 2020 at 2:50 PM Joel Fernandes wrote: > > I reproduced Naresh's issue on a 32-bit x86 machine and the below patch fixes it. > The issue is solely within execve() itself and the way it allocates/copies the > temporary stack. > > It is actually indeed an overlapping case because the length of the > stack is big enough to cause overlap. The VMA grows quite a bit because of > all the page faults that happen due to the copy of the args/env. Then during > the move of overlapped region, it finds that a PMD is already allocated. Oh, ok, I think I see. So the issue is that while move_normal_pmd() _itself_ will be clearing the old pmd entries when it copies them, the 4kB copies that happened _before_ this time will not have cleared the old pmd that they were in. So we've had a deeper stack, and we've already copied some of it one page at a time, and we're done with those 4kB entries, but now we hit a 2MB-aligned case and want to move that down. But when it does that, it hits the (by now hopefully empty) pmd that used to contain the 4kB entries we copied earlier. So we've cleared the page table, but we've simply never called pgtable_clear() here, because the page table got cleared in move_ptes() when it did that pte = ptep_get_and_clear(mm, old_addr, old_pte); on the previous old pte entries. That makes sense to me. Does that match what you see? Because when I said it wasn't overlapped, I was really talking about that one single pmd move itself not being overlapped. > The below patch fixes it and is not warning anymore in 30 minutes of testing > so far. So I'm not hugely happy with the patch, I have to admit. Why? Because: > + /* Ensure the temporary stack is shifted by atleast its size */ > + if (stack_shift < (vma->vm_end - vma->vm_start)) > + stack_shift = (vma->vm_end - vma->vm_start); This basically overrides the random stack shift done by arch_align_stack(). Yeah, yeah, arch_align_stack() is kind of misnamed. It _used_ to do what the name implies it does, which on x86 is to just give the minimum ABI-specified 16-byte stack alignment. But these days, what it really does is say "align the stack pointer, but also shift it down by a random amount within 8kB so that the start of the stack isn't some repeatable thing that an attacker can trivially control with the right argv/envp size.." I don't think it works right anyway because we then PAGE_ALIGN() it, but whatever. But it looks like what you're doing means that now the size of the stack will override that shift, and that means that now the randomization is entirely undone. No? Plus you do it after we've also checked that we haven't grown the stack down to below mmap_min_addr(). But maybe I misread that patch. But I do feel like you figured out why the bug happened, now we're just discussing whether the patch is the right thing to do. Maybe saying "doing the pmd copies for the initial stack isn't important, so let's just note this as a special case and get rid of the WARN_ON()" might be an alternative solution. The reason I worried was that I felt like we didn't understand why the WARN_ON() happened. Now that I do, I think we could just say "ok, don't warn, we know that this can happen, and it's harmless". Anybody else have any opinions? Linus