From: Barry Song <21cnbao@gmail.com>
Date: Sat, 18 Sep 2021 17:06:58 +1200
Subject: Re: [PATCH RESEND v2 3/4] mm: sparsemem: use page table lock to protect kernel pmd operations
To: Muchun Song
Cc: mike.kravetz@oracle.com, Andrew Morton, osalvador@suse.de, mhocko@suse.com, Barry Song, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, Jonathan Corbet, Matthew Wilcox, duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, LKML, Linux-MM
In-Reply-To: <20210917034815.80264-4-songmuchun@bytedance.com>
References: <20210917034815.80264-1-songmuchun@bytedance.com> <20210917034815.80264-4-songmuchun@bytedance.com>

On Sat, Sep 18, 2021 at 12:09 AM Muchun Song wrote:
>
> The init_mm.page_table_lock is used to protect kernel page tables, we
> can use it to serialize splitting vmemmap PMD mappings instead of mmap
> write lock, which can increase the concurrency of vmemmap_remap_free().
>

I am curious what actual benefit we get from this patch in real user
scenarios. As far as I can see, there are only two ways to trigger this
path:
1. we set bootargs to reserve hugetlb pages statically;
2. we "echo" some figures to sysfs or procfs.
In other words, who is going to care about this concurrency? Can we
have some details on this to put in the commit log?
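For concreteness, the two flows I have in mind are roughly the
following (only a sketch; the page counts are arbitrary examples):

    # 1. static reservation on the kernel command line
    hugepagesz=2M hugepages=1024

    # 2. runtime reservation through procfs/sysfs
    echo 1024 > /proc/sys/vm/nr_hugepages
    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Both look like infrequent, admin-driven operations rather than hot
paths, so it is not obvious to me where the extra concurrency of
vmemmap_remap_free() would actually be visible.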
> Signed-off-by: Muchun Song
> ---
>  mm/ptdump.c         | 16 ++++++++++++----
>  mm/sparse-vmemmap.c | 49 ++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index da751448d0e4..eea3d28d173c 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -40,8 +40,10 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
>  	if (st->effective_prot)
>  		st->effective_prot(st, 0, pgd_val(val));
>
> -	if (pgd_leaf(val))
> +	if (pgd_leaf(val)) {
>  		st->note_page(st, addr, 0, pgd_val(val));
> +		walk->action = ACTION_CONTINUE;
> +	}
>
>  	return 0;
>  }
> @@ -61,8 +63,10 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
>  	if (st->effective_prot)
>  		st->effective_prot(st, 1, p4d_val(val));
>
> -	if (p4d_leaf(val))
> +	if (p4d_leaf(val)) {
>  		st->note_page(st, addr, 1, p4d_val(val));
> +		walk->action = ACTION_CONTINUE;
> +	}
>
>  	return 0;
>  }
> @@ -82,8 +86,10 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
>  	if (st->effective_prot)
>  		st->effective_prot(st, 2, pud_val(val));
>
> -	if (pud_leaf(val))
> +	if (pud_leaf(val)) {
>  		st->note_page(st, addr, 2, pud_val(val));
> +		walk->action = ACTION_CONTINUE;
> +	}
>
>  	return 0;
>  }
> @@ -101,8 +107,10 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
>
>  	if (st->effective_prot)
>  		st->effective_prot(st, 3, pmd_val(val));
> -	if (pmd_leaf(val))
> +	if (pmd_leaf(val)) {
>  		st->note_page(st, addr, 3, pmd_val(val));
> +		walk->action = ACTION_CONTINUE;
> +	}
>
>  	return 0;
>  }
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 62e3d20648ce..e636943ccfc4 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -64,8 +64,8 @@ struct vmemmap_remap_walk {
>   */
>  #define NR_RESET_STRUCT_PAGE 3
>
> -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> -				  struct vmemmap_remap_walk *walk)
> +static int __split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> +				    struct vmemmap_remap_walk *walk)
>  {
>  	pmd_t __pmd;
>  	int i;
> @@ -87,15 +87,37 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
>  		set_pte_at(&init_mm, addr, pte, entry);
>  	}
>
> -	/* Make pte visible before pmd. See comment in __pte_alloc(). */
> -	smp_wmb();
> -	pmd_populate_kernel(&init_mm, pmd, pgtable);
> +	spin_lock(&init_mm.page_table_lock);
> +	if (likely(pmd_leaf(*pmd))) {
> +		/* Make pte visible before pmd. See comment in __pte_alloc(). */
> +		smp_wmb();
> +		pmd_populate_kernel(&init_mm, pmd, pgtable);
> +		flush_tlb_kernel_range(start, start + PMD_SIZE);
> +		spin_unlock(&init_mm.page_table_lock);
>
> -	flush_tlb_kernel_range(start, start + PMD_SIZE);
> +		return 0;
> +	}
> +	spin_unlock(&init_mm.page_table_lock);
> +	pte_free_kernel(&init_mm, pgtable);
>
>  	return 0;
>  }
>
> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
> +				  struct vmemmap_remap_walk *walk)
> +{
> +	int ret;
> +
> +	spin_lock(&init_mm.page_table_lock);
> +	ret = pmd_leaf(*pmd);
> +	spin_unlock(&init_mm.page_table_lock);
> +
> +	if (ret)
> +		ret = __split_vmemmap_huge_pmd(pmd, start, walk);
> +
> +	return ret;
> +}
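Just to check I am reading the new locking correctly: this is the
classic check -> allocate -> lock -> re-check pattern, i.e. roughly
the following (pseudo-code of mine, not the actual diff):

	/* opportunistic check: is there still a leaf PMD to split? */
	spin_lock(&init_mm.page_table_lock);
	leaf = pmd_leaf(*pmd);
	spin_unlock(&init_mm.page_table_lock);
	if (!leaf)
		return 0;

	/* build the replacement pte page outside the lock */
	pgtable = pte_alloc_one_kernel(&init_mm);
	/* ... fill pgtable with ptes covering the old PMD range ... */

	spin_lock(&init_mm.page_table_lock);
	if (pmd_leaf(*pmd)) {
		/* still a leaf: we won the race, install our pte page */
		pmd_populate_kernel(&init_mm, pmd, pgtable);
		flush_tlb_kernel_range(start, start + PMD_SIZE);
		spin_unlock(&init_mm.page_table_lock);
	} else {
		/* lost the race to another splitter: drop the spare */
		spin_unlock(&init_mm.page_table_lock);
		pte_free_kernel(&init_mm, pgtable);
	}

so concurrent splitters serialized only by init_mm.page_table_lock
are safe. Is that the intention?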
> +
>  static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
>  			      unsigned long end,
>  			      struct vmemmap_remap_walk *walk)
> @@ -132,13 +154,12 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>
>  	pmd = pmd_offset(pud, addr);
>  	do {
> -		if (pmd_leaf(*pmd)) {
> -			int ret;
> +		int ret;
> +
> +		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> +		if (ret)
> +			return ret;
>
> -			ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
> -			if (ret)
> -				return ret;
> -		}
>  		next = pmd_addr_end(addr, end);
>  		vmemmap_pte_range(pmd, addr, next, walk);
>  	} while (pmd++, addr = next, addr != end);
> @@ -321,10 +342,8 @@ int vmemmap_remap_free(unsigned long start, unsigned long end,
>  	 */
>  	BUG_ON(start - reuse != PAGE_SIZE);
>
> -	mmap_write_lock(&init_mm);
> +	mmap_read_lock(&init_mm);
>  	ret = vmemmap_remap_range(reuse, end, &walk);
> -	mmap_write_downgrade(&init_mm);
> -
>  	if (ret && walk.nr_walked) {
>  		end = reuse + walk.nr_walked * PAGE_SIZE;
>  		/*
> --
> 2.11.0
>

Thanks
barry