From: Johannes Weiner
To: Andrew Morton, Tejun Heo
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 5/8] cgroup: rstat: punt root-level optimization to individual controllers
Date: Fri, 5 Feb 2021 13:28:03 -0500
Message-Id: <20210205182806.17220-6-hannes@cmpxchg.org>
In-Reply-To: <20210205182806.17220-1-hannes@cmpxchg.org>
References: <20210205182806.17220-1-hannes@cmpxchg.org>

Current users of the rstat code can source root-level statistics from
the native counters of their respective subsystem, allowing them to
forgo aggregation at the root level. This optimization is currently
implemented inside the generic rstat code, which doesn't track the root
cgroup and doesn't invoke the subsystem flush callbacks on it.

However, the memory controller cannot do this optimization, because
cgroup1 breaks out memory statistics specifically for the local level
(the cgroup itself, excluding its descendants), including at the root
level. In preparation for the memory controller switching to rstat,
move the optimization from rstat core to the controllers.

Afterwards, rstat will always track the root cgroup for changes and
invoke the subsystem callbacks on it; and it's up to the subsystem to
special-case and skip aggregation of the root cgroup if it can source
this information through other, cheaper means.

The extra cost of tracking the root cgroup is negligible: on stat
changes, we actually remove a branch that checks for the root.
The queueing for a flush touches only per-cpu data, and only the
first stat change since a flush requires a (per-cpu) lock.

Signed-off-by: Johannes Weiner
---
 block/blk-cgroup.c    | 14 +++++++---
 kernel/cgroup/rstat.c | 60 +++++++++++++++++++++++++------------------
 2 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 02ce2058c14b..76725e1cad7f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -766,6 +766,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 	struct blkcg *blkcg = css_to_blkcg(css);
 	struct blkcg_gq *blkg;
 
+	/* Root-level stats are sourced from system-wide IO stats */
+	if (!cgroup_parent(css->cgroup))
+		return;
+
 	rcu_read_lock();
 
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
@@ -789,6 +793,7 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		u64_stats_update_end(&blkg->iostat.sync);
 
 		/* propagate global delta to parent */
+		/* XXX: could skip this if parent is root */
 		if (parent) {
 			u64_stats_update_begin(&parent->iostat.sync);
 			blkg_iostat_set(&delta, &blkg->iostat.cur);
@@ -803,10 +808,11 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 }
 
 /*
- * The rstat algorithms intentionally don't handle the root cgroup to avoid
- * incurring overhead when no cgroups are defined. For that reason,
- * cgroup_rstat_flush in blkcg_print_stat does not actually fill out the
- * iostat in the root cgroup's blkcg_gq.
+ * We source root cgroup stats from the system-wide stats to avoid
+ * tracking the same information twice and incurring overhead when no
+ * cgroups are defined. For that reason, cgroup_rstat_flush in
+ * blkcg_print_stat does not actually fill out the iostat in the root
+ * cgroup's blkcg_gq.
  *
  * However, we would like to re-use the printing code between the root and
  * non-root cgroups to the extent possible. For that reason, we simulate
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index faa767a870ba..6f50c199bf2a 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -25,13 +25,8 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
 void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 {
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
-	struct cgroup *parent;
 	unsigned long flags;
 
-	/* nothing to do for root */
-	if (!cgroup_parent(cgrp))
-		return;
-
 	/*
 	 * Speculative already-on-list test. This may race leading to
 	 * temporary inaccuracies, which is fine.
@@ -46,10 +41,10 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 	raw_spin_lock_irqsave(cpu_lock, flags);
 
 	/* put @cgrp and all ancestors on the corresponding updated lists */
-	for (parent = cgroup_parent(cgrp); parent;
-	     cgrp = parent, parent = cgroup_parent(cgrp)) {
+	while (true) {
 		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
-		struct cgroup_rstat_cpu *prstatc = cgroup_rstat_cpu(parent, cpu);
+		struct cgroup *parent = cgroup_parent(cgrp);
+		struct cgroup_rstat_cpu *prstatc;
 
 		/*
 		 * Both additions and removals are bottom-up. If a cgroup
@@ -58,8 +53,16 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 		if (rstatc->updated_next)
 			break;
 
+		if (!parent) {
+			rstatc->updated_next = cgrp;
+			break;
+		}
+
+		prstatc = cgroup_rstat_cpu(parent, cpu);
 		rstatc->updated_next = prstatc->updated_children;
 		prstatc->updated_children = cgrp;
+
+		cgrp = parent;
 	}
 
 	raw_spin_unlock_irqrestore(cpu_lock, flags);
@@ -113,23 +116,26 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
 	 */
 	if (rstatc->updated_next) {
 		struct cgroup *parent = cgroup_parent(pos);
-		struct cgroup_rstat_cpu *prstatc = cgroup_rstat_cpu(parent, cpu);
-		struct cgroup_rstat_cpu *nrstatc;
-		struct cgroup **nextp;
-
-		nextp = &prstatc->updated_children;
-		while (true) {
-			nrstatc = cgroup_rstat_cpu(*nextp, cpu);
-			if (*nextp == pos)
-				break;
-
-			WARN_ON_ONCE(*nextp == parent);
-			nextp = &nrstatc->updated_next;
+
+		if (parent) {
+			struct cgroup_rstat_cpu *prstatc;
+			struct cgroup **nextp;
+
+			prstatc = cgroup_rstat_cpu(parent, cpu);
+			nextp = &prstatc->updated_children;
+			while (true) {
+				struct cgroup_rstat_cpu *nrstatc;
+
+				nrstatc = cgroup_rstat_cpu(*nextp, cpu);
+				if (*nextp == pos)
+					break;
+				WARN_ON_ONCE(*nextp == parent);
+				nextp = &nrstatc->updated_next;
+			}
+			*nextp = rstatc->updated_next;
 		}
 
-		*nextp = rstatc->updated_next;
 		rstatc->updated_next = NULL;
-
 		return pos;
 	}
 
@@ -309,11 +315,15 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup *parent = cgroup_parent(cgrp);
 	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+	struct cgroup *parent = cgroup_parent(cgrp);
 	struct cgroup_base_stat cur, delta;
 	unsigned seq;
 
+	/* Root-level stats are sourced from system-wide CPU stats */
+	if (!parent)
+		return;
+
 	/* fetch the current per-cpu values */
 	do {
 		seq = __u64_stats_fetch_begin(&rstatc->bsync);
@@ -326,8 +336,8 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
 	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
 
-	/* propagate global delta to parent */
-	if (parent) {
+	/* propagate global delta to parent (unless that's root) */
+	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-- 
2.30.0