To: Zi Yan, linux-mm@kvack.org
Cc: Matthew Wilcox, Vlastimil Babka, "Kirill A . Shutemov", Mike Kravetz, Michal Hocko, John Hubbard, linux-kernel@vger.kernel.org
References: <20210805190253.2795604-1-zi.yan@sent.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Message-ID: <28b57903-fae6-47ac-7e1b-a1dd41421349@redhat.com>
Date: Mon, 9 Aug 2021 09:41:35 +0200
In-Reply-To: <20210805190253.2795604-1-zi.yan@sent.com>

On 05.08.21 21:02, Zi Yan wrote:
> From: Zi Yan
>
> Hi all,
>
> This patchset adds support for kernel boot time adjustable MAX_ORDER, so that
> user can change the largest size of pages obtained from buddy allocator. It also
> removes the restriction on MAX_ORDER based on SECTION_SIZE_BITS, so that
> buddy allocator can merge PFNs across memory sections when SPARSEMEM_VMEMMAP is
> set. It is on top of v5.14-rc4-mmotm-2021-08-02-18-51.
>
> Motivation
> ===
>
> This enables kernel to allocate 1GB pages and is necessary for my ongoing work
> on adding support for 1GB PUD THP[1]. This is also the conclusion I came up with
> after some discussion with David Hildenbrand on what methods should be used for
> allocating gigantic pages[2], since other approaches like using CMA allocator or
> alloc_contig_pages() are regarded as suboptimal.
>
> This also prevents increasing SECTION_SIZE_BITS when increasing MAX_ORDER, since
> increasing SECTION_SIZE_BITS is not desirable as memory hotadd/hotremove chunk
> size will be increased as well, causing memory management difficulty for VMs.
>
> In addition, making MAX_ORDER a kernel boot time parameter can enable users to
> adjust buddy allocator without recompiling the kernel for their own needs, so
> that one can still have a small MAX_ORDER if he/she does not need to allocate
> gigantic pages like 1GB PUD THPs.
>
> Background
> ===
>
> At the moment, kernel imposes MAX_ORDER - 1 + PAGE_SHIFT < SECTION_SIZE_BITS
> restriction. This prevents buddy allocator merging pages across memory sections,
> as PFNs might not be contiguous and code like page++ would fail. But this would
> not be an issue when SPARSEMEM_VMEMMAP is set, since all struct pages are
> virtually contiguous. In addition, as long as buddy allocator checks the PFN
> validity during buddy page merging (done in Patch 3), pages allocated from
> buddy allocator can be manipulated by code like page++.
>
>
> Description
> ===
>
> I tested the patchset on both x86_64 and ARM64 at 4KB, 16KB, and 64KB base
> pages. The systems boot and ltp mm test suite finished without issue. Also
> memory hotplug worked on x86_64 when I tested. It definitely needs more tests
> and reviews for other architectures.
>
> In terms of the concerns on performance degradation if MAX_ORDER is increased,
> I did some initial performance tests comparing MAX_ORDER=11 and MAX_ORDER=20 on
> x86_64 machines and saw no performance difference[3].
>
> Patch 1 excludes MAX_ORDER check from 32bit vdso compilation. The check uses
> irrelevant 32bit SECTION_SIZE_BITS during 64bit kernel compilation. The
> exclusion does not break the check in 32bit kernel, since the check will still
> be performed during other kernel component compilation.
>
> Patch 2 gives FORCE_MAX_ZONEORDER a better name.
>
> Patch 3 restores the pfn_valid_within() check when buddy allocator can merge
> pages across memory sections. The check was removed when ARM64 got rid of holes
> in zones, but holes can appear in zones again after this patchset.
>
> Patch 4-11 convert the use of MAX_ORDER to SECTION_SIZE_BITS or its derivative
> constants, since these code places use MAX_ORDER as boundary check for
> physically contiguous pages, where SECTION_SIZE_BITS should be used. After this
> patchset, MAX_ORDER can go beyond SECTION_SIZE_BITS and the code can break.
> I separate changes to different patches for easy review and can merge them into
> a single one if that works better.
>
> Patch 12 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
> when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.
>
> Patch 13 converts statically allocated arrays with MAX_ORDER length to dynamic
> ones if possible and prepares for making MAX_ORDER a boot time parameter.
>
> Patch 14 adds a new MIN_MAX_ORDER constant to replace soon-to-be-dynamic
> MAX_ORDER for places where converting static array to dynamic one is causing
> hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.
>
> Patch 15 finally changes MAX_ORDER to be a kernel boot time parameter.
>
>
> Any suggestion and/or comment is welcome. Thanks.
>
>
> TODO
> ===
>
> 1. Redo the performance comparison tests using this patchset to understand the
> performance implication of changing MAX_ORDER.

2. Make alloc_contig_range() cleanly deal with pageblock_order instead
of MAX_ORDER - 1, so that the minimal CMA area size/alignment is not
forced to be, e.g., 1 GiB instead of 4 MiB, and so that virtio-mem keeps
working as expected.

For virtio-mem, the short-term approach would mean disallowing
initialization when an incompatible setup (MAX_ORDER_NR_PAGES >
SECTION_NR_PAGES) is detected and bailing out loudly, telling the admin
to fix that on the command line.

I have optimizing alloc_contig_range() on my todo list, to get rid of
the MAX_ORDER - 1 dependency in virtio-mem, but I have no idea when I'll
have time to work on that.

-- 
Thanks,

David / dhildenb
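
To make the 4 MiB vs. 1 GiB numbers and the virtio-mem check above a bit
more concrete, here is a rough userspace sketch (plain arithmetic, not
kernel code). It assumes x86_64 with 4 KiB base pages, i.e.
PAGE_SHIFT = 12, SECTION_SIZE_BITS = 27 and pageblock_order = 9, and it
assumes the minimal CMA alignment follows
max(MAX_ORDER - 1, pageblock_order). The function names are made up for
this example.

```python
# Userspace arithmetic only; this is not kernel code.
# Assumed values (x86_64, 4 KiB base pages), not taken from the patchset:
PAGE_SHIFT = 12          # 4 KiB pages
SECTION_SIZE_BITS = 27   # 128 MiB memory sections
PAGEBLOCK_ORDER = 9      # 2 MiB pageblocks

MIB = 1 << 20


def max_order_nr_pages(max_order):
    """Pages in the largest buddy block: 1 << (MAX_ORDER - 1)."""
    return 1 << (max_order - 1)


def section_nr_pages():
    """Pages per memory section: 1 << (SECTION_SIZE_BITS - PAGE_SHIFT)."""
    return 1 << (SECTION_SIZE_BITS - PAGE_SHIFT)


def cma_min_alignment_bytes(max_order):
    """Alignment in bytes, assuming max(MAX_ORDER - 1, pageblock_order)."""
    order = max(max_order - 1, PAGEBLOCK_ORDER)
    return (1 << order) << PAGE_SHIFT


def spans_sections(max_order):
    """True when MAX_ORDER_NR_PAGES > SECTION_NR_PAGES (hypothetical check)."""
    return max_order_nr_pages(max_order) > section_nr_pages()


if __name__ == "__main__":
    # MAX_ORDER = 11 is the usual x86_64 value; 19 is enough for 1 GiB blocks.
    for max_order in (11, 19):
        align_mib = cma_min_alignment_bytes(max_order) // MIB
        print(max_order, align_mib, "MiB", spans_sections(max_order))
```

Under these assumptions, MAX_ORDER = 11 gives a 4 MiB alignment and the
largest buddy block stays within one section, while MAX_ORDER = 19 gives
1 GiB and makes MAX_ORDER_NR_PAGES exceed SECTION_NR_PAGES, which is the
setup virtio-mem would have to refuse.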