From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20201209163950.8494-1-will@kernel.org>
 <20201209163950.8494-2-will@kernel.org>
 <20201209184049.GA8778@willie-the-truck>
 <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com>
In-Reply-To: <20201210150828.4b7pg5lx666r7l2u@black.fi.intel.com>
From: Linus Torvalds
Date: Thu, 10 Dec 2020 09:23:53 -0800
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
To: "Kirill A. Shutemov"
Cc: Will Deacon, Linux Kernel Mailing List, Linux-MM, Linux ARM,
 Catalin Marinas, Jan Kara, Minchan Kim, Andrew Morton, Vinayak Menon,
 Android Kernel Team
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 10, 2020 at 7:08 AM Kirill A. Shutemov wrote:
>
> See lightly tested patch below. Is it something you had in mind?

This is closer, in that at least it removes the ostensibly blocking
allocation (that can't happen) from the prefault path.

But the main issue remains:

> > At that point, I think the current very special and odd
> > do_fault_around() pre-allocation could be made into just a _regular_
> > "allocate the pmd if it doesn't exist". And then the pte locking could
> > be moved into filemap_map_pages(), and suddenly the semantics and
> > rules around all that would be a whole lot more obvious.
>
> No. It would stop faultaround code from mapping huge pages. We had to
> defer pte page table mapping until we know we don't have huge pages in
> page cache.

Can we please move that part to the callers too - possibly with a
separate helper function?

Because the real issue remains: as long as the map_set_pte() function
takes the pte lock, the caller cannot rely on it.

And the filemap_map_pages() code really would like to rely on it.
Because if the lock is taken there *above* the loop - or even in the
loop iteration at the top - the code can now do things that rely on "I
know I hold the page table lock".

In particular, we can get rid of that very very expensive page locking.

Which is the reason I know about the horrid current "pre-allocate in
one place, lock in another, and know we are atomic in a third place"
issue. Because I had to walk down these paths and realize that "this
loop is run under the page table lock, EXCEPT for the first iteration,
where it's taken the first time we do that non-allocating
alloc_set_pte()".

See?

Linus
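
The locking point in the message above can be illustrated with a small
user-space sketch in C. This is not kernel code and not the patch under
discussion: struct toy_table, toy_lock(), set_entry_lazy_lock(),
map_range_callee_locks() and map_range_caller_locks() are hypothetical
names invented for illustration, a pthread mutex stands in for the page
table lock, and the `locked` flag is only meaningful because the sketch
is single-threaded. Each loop counts how many iterations begin without
the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ENTRIES 8

/* Hypothetical stand-in for a page table plus its lock. */
struct toy_table {
    pthread_mutex_t lock;
    bool locked;                /* do we hold the lock? (single-threaded toy) */
    int entries[TOY_ENTRIES];
};

void toy_init(struct toy_table *t)
{
    memset(t, 0, sizeof(*t));
    pthread_mutex_init(&t->lock, NULL);
}

static void toy_lock(struct toy_table *t)
{
    pthread_mutex_lock(&t->lock);
    t->locked = true;
}

static void toy_unlock(struct toy_table *t)
{
    t->locked = false;
    pthread_mutex_unlock(&t->lock);
}

/* Scheme A: the callee takes the lock the first time it is called. */
static void set_entry_lazy_lock(struct toy_table *t, size_t i, int val)
{
    if (!t->locked)
        toy_lock(t);
    t->entries[i] = val;
}

/* Returns the number of iterations that started without the lock. */
int map_range_callee_locks(struct toy_table *t, size_t n)
{
    int unlocked_iterations = 0;

    if (n > TOY_ENTRIES)
        n = TOY_ENTRIES;
    for (size_t i = 0; i < n; i++) {
        if (!t->locked)
            unlocked_iterations++;  /* per-item work here is unprotected */
        set_entry_lazy_lock(t, i, 1);
    }
    if (t->locked)
        toy_unlock(t);
    return unlocked_iterations;
}

/* Scheme B: the caller takes the lock above the loop. */
static void set_entry_locked(struct toy_table *t, size_t i, int val)
{
    assert(t->locked);              /* the callee may now rely on the lock */
    t->entries[i] = val;
}

int map_range_caller_locks(struct toy_table *t, size_t n)
{
    int unlocked_iterations = 0;

    if (n > TOY_ENTRIES)
        n = TOY_ENTRIES;
    toy_lock(t);
    for (size_t i = 0; i < n; i++) {
        if (!t->locked)
            unlocked_iterations++;  /* never reached in this scheme */
        set_entry_locked(t, i, 1);
    }
    toy_unlock(t);
    return unlocked_iterations;
}
```

After toy_init(), calling map_range_callee_locks() on a non-empty range
reports one iteration that began unlocked (the first), while
map_range_caller_locks() reports none. In the second scheme every
iteration can assume the lock is held, which is the property the
message asks for.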