From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7046EC43381 for ; Wed, 3 Mar 2021 19:07:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2FB3764EEE for ; Wed, 3 Mar 2021 19:07:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1578012AbhCCSQQ (ORCPT ); Wed, 3 Mar 2021 13:16:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232619AbhCCP50 (ORCPT ); Wed, 3 Mar 2021 10:57:26 -0500 Received: from mail-io1-xd33.google.com (mail-io1-xd33.google.com [IPv6:2607:f8b0:4864:20::d33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E9766C061756; Wed, 3 Mar 2021 07:56:08 -0800 (PST) Received: by mail-io1-xd33.google.com with SMTP id f20so26229394ioo.10; Wed, 03 Mar 2021 07:56:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DhBVhWESGWEO8S/PFgm/8fSWVJaTXZ1s1zCrVDAHRoI=; b=Eq/s7Dg7Y+Qwoev/X99QbmvDCxjI5uR+6e/ZKFIpIOGE1pSxexFtahRiTo29AVYYFf bBHIJrIe0f+4RAzv1ieCmuBFF7qGWhAVOZQXTlkOwBOoWe2/0Dr+HmzPpOngaf7trX/S joj+ovVKKQ54Q8eME0uTzPV6eI0yTlHoctt4Wo4Y4Gswr73ocE5UZRSapEcDnzcvglBx AaYx6nZc4OY6K6sjI3MAb1VUK6hLS+Hx9diuSFUW+JUbR9aIJi2sc/Vt8ZLxY3OYTCBG Unnx8pM/lDHFV2KOhhROBc1PlUE6V8F++cnWHPdHvpoDYiTABMP75t2mZIcV0mebRZrq E5EA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DhBVhWESGWEO8S/PFgm/8fSWVJaTXZ1s1zCrVDAHRoI=; b=q0Vz0ttoizsQCRKnQmjjQLmUThKyS5szGTQoahDlyCHDNYl12fad/9usVDpRBL6vHg 0Cneb3/nGv8VWBT3qO/nIImG/0ms2tKgUR9Z69315rEVF5nojy59Vm5zW0u4qxf4CDpP H8d3uUMowlCAD82NZg4Rn34ypJN1bUG5fMnM/aUug2Fw3igK+ff7WEyzfF8zGzgXne/O /EIkYp/KRC0Qcyr94AK7YnxDqJh0TeAssGEd5NOl1v5KoSGOwEJTZQMhYseAWT1WJcNM cCVErc3wECxc+wp+2bZ1Zf4MrgGJU5kV93vIx2hYhspuhlmPRLcwkgzyNBYVqNLkpP4d FTDg== X-Gm-Message-State: AOAM533ClYRm5IjxCduKPXrSwlVTbeZCipGwXnr2PH9RQVVWnlXUu/7i xYhCxKj/b8NgQaPNRNwF8Rh/zD0VRn2bxi1/x1A= X-Google-Smtp-Source: ABdhPJwrurvlTKj5oFwlNFzBPLm+J3XtEfY4SQJ40oWWnastmW0/AoQ7GlEi1QHrfuu0Y3Lgw1IPw747u1M/nJTzrlQ= X-Received: by 2002:a5e:cb4d:: with SMTP id h13mr17136094iok.68.1614786967796; Wed, 03 Mar 2021 07:56:07 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ilya Lipnitskiy Date: Wed, 3 Mar 2021 07:55:56 -0800 Message-ID: Subject: Re: exec error: BUG: Bad rss-counter To: "Eric W. Biederman" Cc: Linux-MM , Linux Kernel Mailing List , linux-fsdevel , Kees Cook , Christoph Hellwig , Linus Torvalds Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 3, 2021 at 7:50 AM Eric W. Biederman wrote: > > Ilya Lipnitskiy writes: > > > On Tue, Mar 2, 2021 at 11:37 AM Eric W. Biederman wrote: > >> > >> Ilya Lipnitskiy writes: > >> > >> > On Mon, Mar 1, 2021 at 12:43 PM Eric W. Biederman wrote: > >> >> > >> >> Ilya Lipnitskiy writes: > >> >> > >> >> > Eric, All, > >> >> > > >> >> > The following error appears when running Linux 5.10.18 on an embedded > >> >> > MIPS mt7621 target: > >> >> > [ 0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1 > >> >> > > >> >> > Being a very generic error, I started digging and added a stack dump > >> >> > before the BUG: > >> >> > Call Trace: > >> >> > [<80008094>] show_stack+0x30/0x100 > >> >> > [<8033b238>] dump_stack+0xac/0xe8 > >> >> > [<800285e8>] __mmdrop+0x98/0x1d0 > >> >> > [<801a6de8>] free_bprm+0x44/0x118 > >> >> > [<801a86a8>] kernel_execve+0x160/0x1d8 > >> >> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > >> >> > [<80003198>] ret_from_kernel_thread+0x14/0x1c > >> >> > > >> >> > So that's how I got to looking at fs/exec.c and noticed quite a few > >> >> > changes last year. Turns out this message only occurs once very early > >> >> > at boot during the very first call to kernel_execve. current->mm is > >> >> > NULL at this stage, so acct_arg_size() is effectively a no-op. > >> >> > >> >> If you believe this is a new error you could bisect the kernel > >> >> to see which change introduced the behavior you are seeing. > >> >> > >> >> > More digging, and I traced the RSS counter increment to: > >> >> > [<8015adb4>] add_mm_counter_fast+0xb4/0xc0 > >> >> > [<80160d58>] handle_mm_fault+0x6e4/0xea0 > >> >> > [<80158aa4>] __get_user_pages.part.78+0x190/0x37c > >> >> > [<8015992c>] __get_user_pages_remote+0x128/0x360 > >> >> > [<801a6d9c>] get_arg_page+0x34/0xa0 > >> >> > [<801a7394>] copy_string_kernel+0x194/0x2a4 > >> >> > [<801a880c>] kernel_execve+0x11c/0x298 > >> >> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > >> >> > [<80003198>] ret_from_kernel_thread+0x14/0x1c > >> >> > > >> >> > In fact, I also checked vma_pages(bprm->vma) and lo and behold it is set to 1. > >> >> > > >> >> > How is fs/exec.c supposed to handle implied RSS increments that happen > >> >> > due to page faults when discarding the bprm structure? In this case, > >> >> > the bug-generating kernel_execve call never succeeded, it returned -2, > >> >> > but I didn't trace exactly what failed. > >> >> > >> >> Unless I am mistaken any left over pages should be purged by exit_mmap > >> >> which is called by mmput before mmput calls mmdrop. > >> > Good to know. Some more digging and I can say that we hit this error > >> > when trying to unmap PFN 0 (is_zero_pfn(pfn) returns TRUE, > >> > vm_normal_page returns NULL, zap_pte_range does not decrement > >> > MM_ANONPAGES RSS counter). Is my understanding correct that PFN 0 is > >> > usable, but special? Or am I totally off the mark here? > >> > >> It would be good to know if that is the page that get_user_pages_remote > >> returned to copy_string_kernel. The zero page that is always zero, > >> should never be returned when a writable mapping is desired. > > > > Indeed, pfn 0 is returned from get_arg_page: (page is 0x809cf000, > > page_to_pfn(page) is 0) and it is the same page that is being freed and not > > refcounted in mmput/zap_pte_range. Confirmed with good old printk. Also, > > ZERO_PAGE(0)==0x809fc000 -> PFN 5120. > > > > I think I have found the problem though, after much digging and thanks to all > > the information provided. init_zero_pfn() gets called too late (after > > the call to > > is_zero_pfn(0) from mmput returns true), until then zero_pfn == 0, and after, > > zero_pfn == 5120. Boom. > > > > So PFN 0 is special, but only for a little bit, enough for something > > on my system > > to call kernel_execve :) > > > > Question: is my system not supposed to be calling kernel_execve this > > early or does > > init_zero_pfn() need to happen earlier? init_zero_pfn is currently a > > core_initcall. > > Looking quickly it seems that init_zero_pfn() is in mm/memory.c and is > common for both mips and x86. Further it appears init_zero_pfn() has > been that was since 2009 a13ea5b75964 ("mm: reinstate ZERO_PAGE"). > > Given the testing that x86 gets and that nothing like this has been > reported it looks like whatever driver is triggering the kernel_execve > is doing something wrong. > > Because honestly. If the zero page isn't working there is not a chance > that anything in userspace is working so it is clearly much too early. > > I suspect there is some driver that is initialized very early that is > doing something that looks innocuous (like triggering a hotplug event) > and that happens to cause a call_usermode_helper which then calls > kernel_execve. I will investigate the offenders more closely. However, I do not notice this behavior on the same system based on the 5.4 kernel. Is it possible that last year's exec changes have exposed this issue? Not blaming exec at all, just making sure I understand the problem better. Ilya From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47F25C433E6 for ; Wed, 3 Mar 2021 15:56:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C8DB364EE6 for ; Wed, 3 Mar 2021 15:56:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8DB364EE6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5D89D8D017B; Wed, 3 Mar 2021 10:56:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 589648D0157; Wed, 3 Mar 2021 10:56:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3DABA8D017B; Wed, 3 Mar 2021 10:56:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0002.hostedemail.com [216.40.44.2]) by kanga.kvack.org (Postfix) with ESMTP id 195ED8D0157 for ; Wed, 3 Mar 2021 10:56:15 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id C5E4E181ECAE6 for ; Wed, 3 Mar 2021 15:56:14 +0000 (UTC) X-FDA: 77879014668.03.83C4CF9 Received: from mail-io1-f50.google.com (mail-io1-f50.google.com [209.85.166.50]) by imf28.hostedemail.com (Postfix) with ESMTP id 2779420053C1 for ; Wed, 3 Mar 2021 15:56:08 +0000 (UTC) Received: by mail-io1-f50.google.com with SMTP id n132so14390378iod.0 for ; Wed, 03 Mar 2021 07:56:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DhBVhWESGWEO8S/PFgm/8fSWVJaTXZ1s1zCrVDAHRoI=; b=Eq/s7Dg7Y+Qwoev/X99QbmvDCxjI5uR+6e/ZKFIpIOGE1pSxexFtahRiTo29AVYYFf bBHIJrIe0f+4RAzv1ieCmuBFF7qGWhAVOZQXTlkOwBOoWe2/0Dr+HmzPpOngaf7trX/S joj+ovVKKQ54Q8eME0uTzPV6eI0yTlHoctt4Wo4Y4Gswr73ocE5UZRSapEcDnzcvglBx AaYx6nZc4OY6K6sjI3MAb1VUK6hLS+Hx9diuSFUW+JUbR9aIJi2sc/Vt8ZLxY3OYTCBG Unnx8pM/lDHFV2KOhhROBc1PlUE6V8F++cnWHPdHvpoDYiTABMP75t2mZIcV0mebRZrq E5EA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DhBVhWESGWEO8S/PFgm/8fSWVJaTXZ1s1zCrVDAHRoI=; b=Qhmiwkrz4ixqpFp7z7fWUZmQi4/75sC7gk7rjEI/MHo2xWRhq/Z3KafAN+i9WZ/k4P l8sL7nu25DIQEH0ViUw35vJy3mGDWL20yTJEQD0CSwmcWIbV0Rdg5DoNUIh0NuuyMMlF SLa4CmqXFBXqp70Kj/UodXL/XIBZb3e0Hz/HzqojatBfVg/Cf6WDwqRPv1HSxn9scj6H iXChurznCTqtnFmV8EEz4PYVqs5QmBzhE1JP1/FapIaWYoct06AwD2qyhKBD+3wBqgpY nTk9CluGmFJQIxSRNviNj1Uzq5t80dQiPGZTKuIT1xa70GZGp/o2OLxValDWQSr59xrx of5g== X-Gm-Message-State: AOAM5313yz5vyDWve++eaJ5AhywUbPaku84KZoNuAfbzgCLljzvH4Ok8 8ijyilAOo1XanOXAo19Z07xLsME3JCn66taZbnI= X-Google-Smtp-Source: ABdhPJwrurvlTKj5oFwlNFzBPLm+J3XtEfY4SQJ40oWWnastmW0/AoQ7GlEi1QHrfuu0Y3Lgw1IPw747u1M/nJTzrlQ= X-Received: by 2002:a5e:cb4d:: with SMTP id h13mr17136094iok.68.1614786967796; Wed, 03 Mar 2021 07:56:07 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ilya Lipnitskiy Date: Wed, 3 Mar 2021 07:55:56 -0800 Message-ID: Subject: Re: exec error: BUG: Bad rss-counter To: "Eric W. Biederman" Cc: Linux-MM , Linux Kernel Mailing List , linux-fsdevel , Kees Cook , Christoph Hellwig , Linus Torvalds Content-Type: text/plain; charset="UTF-8" X-Stat-Signature: zpnntkw3joomuw9cytu3dewsr5ufc8f3 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 2779420053C1 Received-SPF: none (gmail.com>: No applicable sender policy available) receiver=imf28; identity=mailfrom; envelope-from=""; helo=mail-io1-f50.google.com; client-ip=209.85.166.50 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1614786968-145589 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 3, 2021 at 7:50 AM Eric W. Biederman wrote: > > Ilya Lipnitskiy writes: > > > On Tue, Mar 2, 2021 at 11:37 AM Eric W. Biederman wrote: > >> > >> Ilya Lipnitskiy writes: > >> > >> > On Mon, Mar 1, 2021 at 12:43 PM Eric W. Biederman wrote: > >> >> > >> >> Ilya Lipnitskiy writes: > >> >> > >> >> > Eric, All, > >> >> > > >> >> > The following error appears when running Linux 5.10.18 on an embedded > >> >> > MIPS mt7621 target: > >> >> > [ 0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1 > >> >> > > >> >> > Being a very generic error, I started digging and added a stack dump > >> >> > before the BUG: > >> >> > Call Trace: > >> >> > [<80008094>] show_stack+0x30/0x100 > >> >> > [<8033b238>] dump_stack+0xac/0xe8 > >> >> > [<800285e8>] __mmdrop+0x98/0x1d0 > >> >> > [<801a6de8>] free_bprm+0x44/0x118 > >> >> > [<801a86a8>] kernel_execve+0x160/0x1d8 > >> >> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > >> >> > [<80003198>] ret_from_kernel_thread+0x14/0x1c > >> >> > > >> >> > So that's how I got to looking at fs/exec.c and noticed quite a few > >> >> > changes last year. Turns out this message only occurs once very early > >> >> > at boot during the very first call to kernel_execve. current->mm is > >> >> > NULL at this stage, so acct_arg_size() is effectively a no-op. > >> >> > >> >> If you believe this is a new error you could bisect the kernel > >> >> to see which change introduced the behavior you are seeing. > >> >> > >> >> > More digging, and I traced the RSS counter increment to: > >> >> > [<8015adb4>] add_mm_counter_fast+0xb4/0xc0 > >> >> > [<80160d58>] handle_mm_fault+0x6e4/0xea0 > >> >> > [<80158aa4>] __get_user_pages.part.78+0x190/0x37c > >> >> > [<8015992c>] __get_user_pages_remote+0x128/0x360 > >> >> > [<801a6d9c>] get_arg_page+0x34/0xa0 > >> >> > [<801a7394>] copy_string_kernel+0x194/0x2a4 > >> >> > [<801a880c>] kernel_execve+0x11c/0x298 > >> >> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194 > >> >> > [<80003198>] ret_from_kernel_thread+0x14/0x1c > >> >> > > >> >> > In fact, I also checked vma_pages(bprm->vma) and lo and behold it is set to 1. > >> >> > > >> >> > How is fs/exec.c supposed to handle implied RSS increments that happen > >> >> > due to page faults when discarding the bprm structure? In this case, > >> >> > the bug-generating kernel_execve call never succeeded, it returned -2, > >> >> > but I didn't trace exactly what failed. > >> >> > >> >> Unless I am mistaken any left over pages should be purged by exit_mmap > >> >> which is called by mmput before mmput calls mmdrop. > >> > Good to know. Some more digging and I can say that we hit this error > >> > when trying to unmap PFN 0 (is_zero_pfn(pfn) returns TRUE, > >> > vm_normal_page returns NULL, zap_pte_range does not decrement > >> > MM_ANONPAGES RSS counter). Is my understanding correct that PFN 0 is > >> > usable, but special? Or am I totally off the mark here? > >> > >> It would be good to know if that is the page that get_user_pages_remote > >> returned to copy_string_kernel. The zero page that is always zero, > >> should never be returned when a writable mapping is desired. > > > > Indeed, pfn 0 is returned from get_arg_page: (page is 0x809cf000, > > page_to_pfn(page) is 0) and it is the same page that is being freed and not > > refcounted in mmput/zap_pte_range. Confirmed with good old printk. Also, > > ZERO_PAGE(0)==0x809fc000 -> PFN 5120. > > > > I think I have found the problem though, after much digging and thanks to all > > the information provided. init_zero_pfn() gets called too late (after > > the call to > > is_zero_pfn(0) from mmput returns true), until then zero_pfn == 0, and after, > > zero_pfn == 5120. Boom. > > > > So PFN 0 is special, but only for a little bit, enough for something > > on my system > > to call kernel_execve :) > > > > Question: is my system not supposed to be calling kernel_execve this > > early or does > > init_zero_pfn() need to happen earlier? init_zero_pfn is currently a > > core_initcall. > > Looking quickly it seems that init_zero_pfn() is in mm/memory.c and is > common for both mips and x86. Further it appears init_zero_pfn() has > been that was since 2009 a13ea5b75964 ("mm: reinstate ZERO_PAGE"). > > Given the testing that x86 gets and that nothing like this has been > reported it looks like whatever driver is triggering the kernel_execve > is doing something wrong. > > Because honestly. If the zero page isn't working there is not a chance > that anything in userspace is working so it is clearly much too early. > > I suspect there is some driver that is initialized very early that is > doing something that looks innocuous (like triggering a hotplug event) > and that happens to cause a call_usermode_helper which then calls > kernel_execve. I will investigate the offenders more closely. However, I do not notice this behavior on the same system based on the 5.4 kernel. Is it possible that last year's exec changes have exposed this issue? Not blaming exec at all, just making sure I understand the problem better. Ilya