Date: Wed, 2 Jun 2021 15:01:04 +0000
From: Dennis Zhou
To: Bharata B Rao
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    aneesh.kumar@linux.ibm.com, tj@kernel.org, cl@linux.com,
    akpm@linux-foundation.org, amakhalov@vmware.com, guro@fb.com,
    vbabka@suse.cz, srikar@linux.vnet.ibm.com, psampat@linux.ibm.com,
    ego@linux.vnet.ibm.com
Subject: Re: [RFC PATCH v0 0/3] CPU hotplug awareness in percpu allocator
In-Reply-To: <20210601065147.53735-1-bharata@linux.ibm.com>
Hello,

On Tue, Jun 01, 2021 at 12:21:44PM +0530, Bharata B Rao wrote:
> Hi,
>
> This is an attempt to make the percpu allocator CPU hotplug aware.
> Currently the percpu allocator allocates memory for all the possible
> CPUs. This can lead to wastage of memory when the possible number of
> CPUs is significantly higher than the number of online CPUs. This can
> be avoided if the percpu allocator were to allocate only for the
> online CPUs and extend the allocation for other CPUs as and when they
> become online.
>
> This early RFC work shows some good memory savings for a powerpc
> KVM guest that is booted with 16 online and 1024 possible CPUs.
> Here is a comparison of the Percpu memory consumption from
> /proc/meminfo before and after creating 1000 memcgs.
>
>                     W/o patch     W/ patch
> Before              1441792 kB    22528 kB
> After 1000 memcgs   4390912 kB    68608 kB
>
> Note that the Percpu reporting in meminfo has been changed in
> the patchset to reflect the allocation for online CPUs only.
>
> More details about the approach are present in the patch
> descriptions.
>
> Bharata B Rao (3):
>   percpu: CPU hotplug support for alloc_percpu()
>   percpu: Limit percpu allocator to online cpus
>   percpu: Avoid using percpu ptrs of non-existing cpus
>
>  fs/namespace.c             |   4 +-
>  include/linux/cpuhotplug.h |   2 +
>  include/linux/percpu.h     |  15 +++
>  kernel/cgroup/rstat.c      |  20 +++-
>  kernel/sched/cpuacct.c     |  10 +-
>  kernel/sched/psi.c         |  14 ++-
>  lib/percpu-refcount.c      |   4 +-
>  lib/percpu_counter.c       |   2 +-
>  mm/percpu-internal.h       |   9 ++
>  mm/percpu-vm.c             | 211 +++++++++++++++++++++++++++++++++-
>  mm/percpu.c                | 229 +++++++++++++++++++++++++++++++++++--
>  net/ipv4/fib_semantics.c   |   2 +-
>  net/ipv6/route.c           |   6 +-
>  13 files changed, 490 insertions(+), 38 deletions(-)
>
> --
> 2.31.1
>

I have thought about this for a day now and, to be honest, my thoughts
haven't really changed since the last discussion in [1]. I struggle
here for a few reasons:

1. We're intertwining cpu and memory for hotplug.
   - What does it mean if we don't have enough memory?
   - How hard do we try to reclaim memory?
   - Partially allocated cpus? Do we free it all and try again?
2. We're now blocking the whole system on the percpu mutex, which can
   cause terrible side effects. If there is a large amount of percpu
   memory already in use, this means we've accumulated a substantial
   number of callbacks.
3. While I did mention that a callback approach would work, I'm not
   thrilled by the additional complexity of it, as it can be error
   prone.

Beyond the above, I still don't believe this is the most well-motivated
problem. I struggle to see a world where it makes sense to let someone
scale from 16 cpus to 1024, as in my mind you would also need to scale
memory to some degree too (not necessarily linearly, but a 1024-core
machine with, say, 16 gigs of ram would be pretty funny). Would it be
that bad to use cold migration points and eat a little bit of overhead
for what I understand to be a relatively uncommon use case?
[1] https://lore.kernel.org/linux-mm/8E7F3D98-CB68-4418-8E0E-7287E8273DA9@vmware.com/

Thanks,
Dennis