From: Muchun Song
Date: Fri, 20 Nov 2020 10:52:00 +0800
Subject: Re: [External] Re: [PATCH v4 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers
To: Mike Kravetz
Cc: Oscar Salvador, Jonathan Corbet, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
 Matthew Wilcox, Michal Hocko, Xiongchun duan, linux-doc@vger.kernel.org,
 LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <44efc25e-525b-9e51-60e4-da20deb25ded@oracle.com>
References: <20201113105952.11638-1-songmuchun@bytedance.com>
 <20201113105952.11638-6-songmuchun@bytedance.com>
 <20201117150604.GA15679@linux>
 <44efc25e-525b-9e51-60e4-da20deb25ded@oracle.com>

On Fri, Nov 20, 2020 at 7:22 AM Mike Kravetz wrote:
>
> On 11/18/20 10:17 PM, Muchun Song wrote:
> > On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador wrote:
> >>
> >> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> >>> +#define page_huge_pte(page)	((page)->pmd_huge_pte)
> >>
> >> Seems you do not need this one anymore.
> >>
> >>> +void vmemmap_pgtable_free(struct page *page)
> >>> +{
> >>> +	struct page *pte_page, *t_page;
> >>> +
> >>> +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
> >>> +		list_del(&pte_page->lru);
> >>> +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
> >>> +	}
> >>> +}
> >>> +
> >>> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> >>> +{
> >>> +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> >>> +
> >>> +	/* Store preallocated pages on huge page lru list */
> >>> +	INIT_LIST_HEAD(&page->lru);
> >>> +
> >>> +	while (nr--) {
> >>> +		pte_t *pte_p;
> >>> +
> >>> +		pte_p = pte_alloc_one_kernel(&init_mm);
> >>> +		if (!pte_p)
> >>> +			goto out;
> >>> +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> >>> +	}
> >>
> >> Definitely this looks better and easier to handle.
> >> Btw, did you explore Matthew's hint about using one of the pages you
> >> are going to free to store the ptes, instead of allocating a new page?
> >> I am not sure whether it is feasible at all though.
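For context, here is a rough caller-side sketch of how this preallocation
pair is meant to fit together. Everything below is an illustrative
assumption, not the actual hugetlb allocation path: the function name is
made up, and it assumes vmemmap_pgtable_prealloc() returns nonzero on
failure (the quoted hunk is truncated before its return paths).

/*
 * Hypothetical caller: the PTE pages are preallocated up front and
 * parked on page->lru, so the later vmemmap remap step cannot fail
 * on allocation.  vmemmap_pgtable_free(page) drains the same list.
 */
static struct page *alloc_fresh_huge_page_sketch(struct hstate *h,
						 gfp_t gfp_mask, int nid)
{
	struct page *page = alloc_pages_node(nid, gfp_mask,
					     huge_page_order(h));

	if (!page)
		return NULL;

	if (vmemmap_pgtable_prealloc(h, page)) {
		/* Not enough PTE pages; give the huge page back. */
		__free_pages(page, huge_page_order(h));
		return NULL;
	}
	return page;
}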
> >
> > Hi Oscar and Matthew,
> >
> > I have looked into this, and I think it may not be feasible. If we
> > reuse a vmemmap page frame as the page table when we first split the
> > PMD, we have to write 512 PTE entries into that vmemmap page frame
> > at that stage. Anyone who reads the tail struct pages of the HugeTLB
> > page at that moment can see arbitrary values (I am not sure this
> > happens in practice; perhaps the memory compaction module can do it).
> > So, to be on the safe side, I think allocating a new page is a good
> > choice.
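To make the ordering argument concrete, here is a minimal sketch of the
PMD split; split_vmemmap_pmd_sketch and its exact steps are an
illustration of the reasoning above, not the patch's code:

/*
 * Illustrative only: mirror the existing PMD-mapped range with 512
 * (PTRS_PER_PTE) PTE entries written into a freshly allocated page,
 * and only then switch the PMD over, so that concurrent readers of
 * the vmemmap always see a valid mapping.  'addr' is assumed to be
 * the PMD-aligned start of the vmemmap range.
 */
static void split_vmemmap_pmd_sketch(pmd_t *pmd, pte_t *pgtable,
				     unsigned long addr)
{
	unsigned long i;
	struct page *head = pmd_page(*pmd);

	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
		set_pte_at(&init_mm, addr, pgtable + i,
			   mk_pte(head + i, PAGE_KERNEL));

	/* Publish the fully populated PTE page in one step. */
	smp_wmb();
	pmd_populate_kernel(&init_mm, pmd, pgtable);
}

If pgtable were instead one of the vmemmap frames still mapped by *pmd,
the loop would scribble PTE entries over live struct pages before the
new mapping is published, which is exactly the inconsistency window
discussed below.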
>
> Thanks for looking into this.
>
> If I understand correctly, the issue is that you need the pte page to
> set up the new mappings. In your current code, this is done before
> removing the pages of struct pages. This keeps everything 'consistent'
> as things are remapped.
>
> If you want to use one of the 'pages of struct pages' for the new pte
> page, then there will be a period of time when things are inconsistent.
> Before setting up the mapping, some code could potentially access those
> pages of struct pages.

Yeah, you are right.

> I tend to agree that allocating a new page is the safest thing to do
> here. Or, perhaps someone can think of a way to make this safe.
> --
> Mike Kravetz

--
Yours,
Muchun