From: Linus Torvalds
Date: Wed, 7 Jul 2021 11:40:45 -0700
Subject: Re: [GIT PULL] percpu fixes for v5.14-rc1
To: Dennis Zhou
Cc: Tejun Heo, Christoph Lameter, Linux-MM, Linux Kernel Mailing List
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 7, 2021 at 6:00 AM Dennis Zhou wrote:
>
> This is just a single change to fix percpu depopulation. The code relied
> on depopulation code written specifically for the free path and relied
> on vmalloc to do the tlb flush lazily. As we're modifying the backing
> pages during the lifetime of a chunk, we need to also flush the tlb
> accordingly.

I pulled this, but I ended up unpulling after looking at the fix.

The fix may be perfectly correct, but I'm looking at that
pcpu_reclaim_populated() function, and I want somebody to explain to
me why it's ok to drop and re-take the 'pcpu_lock' and just continue.

Because whatever it was protecting is now not protected any more.

It *looks* like it's intended to protect the pcpu_chunk_lists[]
content, and some other functions that do this look ok.

So for example, pcpu_balance_free() at least removes the 'chunk' from
the pcpu_chunk_lists[] before it drops the lock and then works on the
chunk contents.

But pcpu_reclaim_populated() seems to *leave* chunk on the
pcpu_chunk_lists[], drop the lock, and then continue to use 'chunk'.

That odd "release lock and continue to use the data it's supposed to
protect" seems to be pre-existing, but

 (a) this is the code that caused problems to begin with

and

 (b) it seems to now happen even more.

So maybe this code is right. But it looks very odd to me, and I'd like
to get more explanations of _why_ it would be ok before I pull this
fix, since there seems to be a deeper underlying problem in the code
that this tries to fix.

               Linus
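
The distinction drawn in the message, between taking an object off a shared list before dropping the lock and leaving it there while continuing to use it, can be sketched in userspace C. This is only an illustration under stated assumptions: `struct chunk`, `list_lock`, `list_head` and the `chunk_*` helpers are hypothetical stand-ins invented here, not the kernel's `struct pcpu_chunk`, `pcpu_lock` or `pcpu_chunk_lists[]`, and a pthread mutex stands in for the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a chunk; NOT the kernel's struct pcpu_chunk. */
struct chunk {
	struct chunk *prev, *next;	/* list linkage, protected by list_lock */
	int nr_populated;		/* illustrative payload only */
};

/* Hypothetical stand-ins for pcpu_lock and one pcpu_chunk_lists[] slot. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct chunk list_head = { &list_head, &list_head, 0 };

/* Insert c at the front of the list. Caller must hold list_lock. */
static void chunk_add_locked(struct chunk *c)
{
	c->next = list_head.next;
	c->prev = &list_head;
	list_head.next->prev = c;
	list_head.next = c;
}

/* Unlink c from the list. Caller must hold list_lock. */
static void chunk_del_locked(struct chunk *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->prev = c->next = c;
}

/* Returns 1 if c is reachable through the shared list, 0 otherwise. */
int chunk_on_list(const struct chunk *c)
{
	int found = 0;

	pthread_mutex_lock(&list_lock);
	for (const struct chunk *it = list_head.next; it != &list_head; it = it->next)
		if (it == c)
			found = 1;
	pthread_mutex_unlock(&list_lock);
	return found;
}

/*
 * Detach the first chunk while holding the lock. Once it is off the
 * list, other threads walking the list can no longer find it there,
 * so the caller may work on it after the lock has been dropped.
 */
struct chunk *chunk_detach_first(void)
{
	struct chunk *c = NULL;

	pthread_mutex_lock(&list_lock);
	if (list_head.next != &list_head) {
		c = list_head.next;
		chunk_del_locked(c);
	}
	pthread_mutex_unlock(&list_lock);
	return c;
}

/* Put a previously detached chunk back, again under the lock. */
void chunk_reattach(struct chunk *c)
{
	assert(c != NULL);
	pthread_mutex_lock(&list_lock);
	chunk_add_locked(c);
	pthread_mutex_unlock(&list_lock);
}
```

A caller would use `chunk_detach_first()`, modify the returned chunk without the lock, then call `chunk_reattach()`. The pattern questioned above is the opposite one: unlocking while the chunk is still linked and then continuing to touch it, at which point any other thread that takes the lock can reach the same chunk through the list. Whether some other rule makes that safe in mm/percpu.c is exactly the explanation being asked for; the sketch takes no position on it.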