To: Michal Hocko, Zi Yan
Cc: Oscar Salvador, Michael Ellerman, Benjamin Herrenschmidt, Thomas Gleixner,
 x86@kernel.org, Andy Lutomirski, "Rafael J. Wysocki", Andrew Morton,
 Mike Rapoport, Anshuman Khandual, Dan Williams, Wei Yang,
 linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
References: <20210506152623.178731-1-zi.yan@sent.com>
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [RFC PATCH 0/7] Memory hotplug/hotremove at subsection size
Message-ID: <792d73e2-5d63-74a5-5554-20351d5532ff@redhat.com>
Date: Fri, 7 May 2021 16:00:36 +0200

On 07.05.21 13:55, Michal Hocko wrote:
> [I haven't read through the respective patches due to lack of time, but let
> me comment on the general idea and the underlying justification.]
>
> On Thu 06-05-21 17:31:09, David Hildenbrand wrote:
>> On 06.05.21 17:26, Zi Yan wrote:
>>> From: Zi Yan
>>>
>>> Hi all,
>>>
>>> This patchset tries to remove the restriction on memory hotplug/hotremove
>>> granularity, which is always greater than or equal to the memory section
>>> size [1]. With the patchset, the kernel is able to online/offline memory
>>> at a size independent of the memory section size, as small as 2MB (the
>>> subsection size).
>>
>> ... which doesn't make any sense, as we can only online/offline whole
>> memory block devices.
>
> Agreed. The subsection thingy is just a hack to work around pmem
> alignment problems.
> For the real memory hotplug it is quite hard to argue for reasonable
> hotplug scenarios for very small physical memory ranges wrt. the
> existing sparsemem memory model.
>
>>> The motivation is to increase MAX_ORDER of the buddy allocator and
>>> pageblock size without increasing memory hotplug/hotremove granularity
>>> at the same time,
>>
>> Gah, no. Please no. No.
>
> Agreed. Those are completely independent concepts. MAX_ORDER can be
> really arbitrary, irrespective of the section size, with the vmemmap
> sparse model. The existing restriction is due to the old sparse model
> not being able to do page pointer arithmetic across memory sections.
> Is there any reason to stick with that memory model for an advanced
> feature you are working on?
>

I gave it some more thought yesterday. I guess the first thing we should
look into is increasing MAX_ORDER and leaving pageblock_order and
section size as is -- finding out what we have to tweak to get that up
and running. Once we have that in place, we can actually look into
better fragmentation avoidance etc. One step at a time.

That change itself might already require some thought. Requiring that a
bigger MAX_ORDER depends on SPARSE_VMEMMAP is something reasonable to do.

As stated somewhere here already, we'll have to look into making
alloc_contig_range() (and its main users, CMA and virtio-mem) independent
of MAX_ORDER and mainly rely on pageblock_order. The current handling in
alloc_contig_range() is far from optimal, as we have to isolate a whole
MAX_ORDER - 1 page -- and on ZONE_NORMAL we'll fail easily if any part
contains something unmovable, although we don't even want to allocate
that part. I actually have that on my list (to be able to fully support
pageblock_order instead of MAX_ORDER - 1 chunks in virtio-mem), however
I didn't have time to look into it.

Further, page onlining / offlining code and early init code most
probably also need care if MAX_ORDER - 1 crosses sections. Memory holes
we might suddenly have in MAX_ORDER - 1 pages might become a problem and
will have to be handled. Not sure which other code has to be tweaked
(compaction? page isolation?).

Figuring out what needs care itself might take quite some effort.

One thing I was thinking about as well: the bigger our MAX_ORDER, the
slower it could be to allocate smaller pages. If we have 1G pages,
splitting them down to 4k then takes 8 additional steps if I'm not
wrong. Of course, that's the worst case. Would be interesting to
evaluate (a quick back-of-the-envelope sketch of that arithmetic
follows below).

-- 
Thanks,

David / dhildenb
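To make that worst-case estimate concrete, here is a tiny userspace
back-of-the-envelope sketch -- not kernel code, and the helper name is
made up for illustration. It assumes a 4 KiB base page and the usual
x86-64 MAX_ORDER of 11, so today's largest buddy is order 10 (4 MiB),
while a 1 GiB buddy would be order 18; the buddy allocator performs one
split per order level on the way down to an order-0 page.

/*
 * Back-of-the-envelope sketch only, not kernel code. Assumes a 4 KiB
 * base page and MAX_ORDER = 11 (largest buddy = order 10 = 4 MiB);
 * a 1 GiB buddy would be order 18.
 */
#include <stdio.h>

/*
 * Worst-case number of buddy splits needed to hand out an order-0
 * (4 KiB) page when the only free page is of order 'top':
 * one split per order level.
 */
static unsigned int splits_to_order0(unsigned int top)
{
	return top;
}

int main(void)
{
	unsigned int cur_top = 10;	/* 2^10 * 4 KiB = 4 MiB */
	unsigned int gig_top = 18;	/* 2^18 * 4 KiB = 1 GiB */

	printf("worst-case splits from a 4 MiB buddy: %u\n",
	       splits_to_order0(cur_top));
	printf("worst-case splits from a 1 GiB buddy: %u\n",
	       splits_to_order0(gig_top));
	printf("additional steps: %u\n",
	       splits_to_order0(gig_top) - splits_to_order0(cur_top));
	return 0;
}

Built with plain gcc, it prints 10, 18 and 8 additional steps, matching
the estimate above. It only counts split levels; the real per-split cost
in the buddy allocator (freelist and page-flag manipulation) would be
what an actual evaluation has to measure.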