From: Linus Torvalds
Date: Wed, 23 Sep 2020 14:50:36 -0700
Subject: Re: BUG: Bad page state in process dirtyc0w_child
To: Gerald Schaefer
Cc: Peter Xu, Heiko Carstens, Qian Cai, Alexander Gordeev, Vasily Gorbik, Christian Borntraeger, linux-s390, Linux-MM, Linux Kernel Mailing List

On Wed, Sep 23, 2020 at 2:33 PM Gerald Schaefer wrote:
>
> Thanks, very nice walk-through, need some time to digest this. The TLB
> aspect is interesting, and we do have our own __tlb_remove_page_size(),
> which directly calls free_page_and_swap_cache() instead of the generic
> batched approach.

So I don't think it's the free_page_and_swap_cache() itself that is the
problem.

As mentioned, the actual pages themselves should be handled by the
reference counting being atomic.

The interrupt disable is really about just the page *tables* being
free'd - not the final page level.

So the issue is that at least on x86-64, we have the serialization
that we will only free the page tables after a cross-CPU IPI has
flushed the TLB.

I think s390 just RCU-free's the page tables instead, which should fix it.

So I think this is special, and s390 is very different from x86, but I
don't think it's the problem.

In fact, I think you pinpointed the real issue:

> Meanwhile, out of curiosity, while I still fail to comprehend commit
> 09854ba94c6a ("mm: do_wp_page() simplification") in its entirety, there
> is one detail that I find most confusing: the unlock_page() has moved
> behind the wp_page_reuse(), while it was the other way round before.

You know what? That was just a mistake, and I think you may actually
have hit the real cause of the problem.

It means that we keep the page locked until after we do the
pte_unmap_unlock(), so now we have no guarantees that we hold the page
reference.

And then we unlock it - while somebody else might be freeing it.

So somebody is freeing a locked page just as we're unlocking it, and
that matches the problem you see exactly: the debug thing will hit
because the last free happened while locked, and then by the time the
printout happens it has become unlocked so it doesn't show any more.

Duh.

Would you mind testing just moving the unlock_page() back to before
the wp_page_reuse()? Does that make your debug check go away?

Linus
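
The ordering problem described in the message can be modeled in userspace. What follows is only a rough, single-threaded sketch and not kernel code: `struct toy_page`, `toy_put_page()`, `toy_drop_pte_lock()` and the two `reuse_*` functions are hypothetical stand-ins for `struct page`, `put_page()`, the `pte_unmap_unlock()` inside `wp_page_reuse()`, and the tail of `do_wp_page()`. It assumes the worst case, in which another CPU zaps the mapping and drops the last page reference the instant the page table lock is released. In the real kernel this is a timing-dependent race, not something that happens on every fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a page; not the kernel's struct page. */
struct toy_page {
    int  refcount;            /* references held (here: only the mapping's) */
    bool locked;              /* stands in for PG_locked */
    bool freed;               /* set once the last reference is dropped */
    bool freed_while_locked;  /* what a "bad page state" check would catch */
};

/* Drop one reference; record whether the page was locked when it was freed. */
void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        p->freed = true;
        p->freed_while_locked = p->locked;
    }
}

/*
 * Stand-in for dropping the page table lock. Assumption of this model:
 * another CPU immediately zaps the PTE and drops the mapping's reference,
 * which is the only reference that was keeping the page alive.
 */
void toy_drop_pte_lock(struct toy_page *p)
{
    toy_put_page(p);
}

/* Ordering questioned in the thread: unlock the page after the PTE lock is gone. */
bool reuse_unlock_after(struct toy_page *p)
{
    p->locked = true;        /* trylock_page() succeeded */
    toy_drop_pte_lock(p);    /* page may be freed here, still locked */
    p->locked = false;       /* unlock_page() on a page we no longer own */
    return p->freed_while_locked;
}

/* Earlier ordering: unlock the page first, then drop the PTE lock. */
bool reuse_unlock_before(struct toy_page *p)
{
    p->locked = true;        /* trylock_page() succeeded */
    p->locked = false;       /* unlock_page() while the mapping still pins the page */
    toy_drop_pte_lock(p);    /* page may be freed here, already unlocked */
    return p->freed_while_locked;
}
```

Starting each function with a page whose `refcount` is 1, the first ordering reports a free that happened while the page was locked, yet leaves `locked` clear afterwards, which is consistent with the lock bit no longer being visible by the time the report is printed. The second ordering frees the page only after it has been unlocked.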