From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Rogers
Date: Fri, 14 Feb 2020 11:32:11 -0800
Subject: Re: [PATCH v6 0/6] Optimize cgroup context switch
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Namhyung Kim, Andrew Morton,
    Randy Dunlap, Masahiro Yamada, Shuah Khan, Krzysztof Kozlowski,
    Kees Cook, "Paul E. McKenney", Masami Hiramatsu, Marco Elver,
    Kent Overstreet, Andy Shevchenko, Ard Biesheuvel, Kan Liang, LKML
In-Reply-To: <20200214075133.181299-1-irogers@google.com>
References: <20191206231539.227585-1-irogers@google.com> <20200214075133.181299-1-irogers@google.com>

On a thread related to these patches, Peter had previously asked what the
performance numbers looked like. I've tested on Westmere and Cascade Lake
platforms.

The benchmark is a set of processes in different cgroups reading from and
writing to a file descriptor, where the read causes a context switch. To
force the context switches, all of the processes are pinned to a particular
CPU, and the benchmark checks that the expected number of context switches
matches the number actually performed. The benchmark scales up the number
of perf events and cgroups, and it also looks at the effect of monitoring
just one cgroup within an increasing set of cgroups. A rough sketch of the
forced context-switch setup is included below.

Before the patches, on Westmere, if we do system-wide profiling of 10
events and then increase the number of cgroups to 208 and monitor just one,
the context switch times go from 4.6us to 15.3us. If we monitor every
cgroup then the context switch times are 172.5us. With the patches, the
time for monitoring 1 cgroup goes from 4.6us to 14.9us, but when monitoring
all cgroups the context switch times are 14.1us.

The small speed-up when monitoring 1 cgroup out of a set comes from the
fact that in most context switches the O(n) search for an event in a cgroup
is now O(log(n)). When all cgroups are monitored, the number of events in
the kernel is the product of the number of events and cgroups, giving a
larger value for 'n' and a more dramatic speed-up: 172.5us becomes 14.9us.

In summary, before the patches context switch times are affected by the
number of cgroups being monitored; after the patches there is still a
context switch cost to monitoring events, but it is similar whether 1 or
all cgroups are being monitored. This fits with the intuition of what the
patches are trying to do by avoiding searches of events that are for
cgroups the current task isn't within. The results are consistent but less
dramatic for smaller numbers of events and cgroups. We've not identified a
slowdown from the patches, but there is a degree of noise in the timing
data. Broadly, with turbo disabled on the test machines, the patches make
context switch performance the same or faster. For a more representative
number of events and cgroups, say 6 and 32, we see context switch time
improve from 29.4us to 13.2us when all cgroups are monitored.
Thanks,
Ian

On Thu, Feb 13, 2020 at 11:51 PM Ian Rogers wrote:
>
> Avoid iterating over all per-CPU events during cgroup changing context
> switches by organizing events by cgroup.
>
> To make an efficient set of iterators, introduce a min max heap
> utility with test.
>
> The v6 patch set reduces the series by 4 patches; it updates the cgroup
> id and fixes part of the min_heap rename from v5.
>
> The v5 patch set renames min_max_heap to min_heap as suggested by
> Peter Zijlstra; it also addresses comments around preferring
> __always_inline over inline.
>
> The v4 patch set addresses review comments on the v3 patch set by
> Peter Zijlstra.
>
> These patches include a caching algorithm to improve the search for
> the first event in a group by Kan Liang, as well as rebasing the
> "optimize event_filter_match during sched_in" patch from
> https://lkml.org/lkml/2019/8/7/771.
>
> The v2 patch set was modified by Peter Zijlstra in his perf/cgroup
> branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git
>
> These patches follow Peter's reorganization and his fixes to the
> perf_cpu_context min_heap storage code.
>
> Ian Rogers (5):
>   lib: introduce generic min-heap
>   perf: Use min_heap in visit_groups_merge
>   perf: Add per perf_cpu_context min_heap storage
>   perf/cgroup: Grow per perf_cpu_context heap storage
>   perf/cgroup: Order events in RB tree by cgroup id
>
> Peter Zijlstra (1):
>   perf/cgroup: Reorder perf_cgroup_connect()
>
>  include/linux/min_heap.h   | 135 ++++++++++++++++++++
>  include/linux/perf_event.h |   7 ++
>  kernel/events/core.c       | 251 +++++++++++++++++++++++++++++++------
>  lib/Kconfig.debug          |  10 ++
>  lib/Makefile               |   1 +
>  lib/test_min_heap.c        | 194 ++++++++++++++++++++++++++++
>  6 files changed, 563 insertions(+), 35 deletions(-)
>  create mode 100644 include/linux/min_heap.h
>  create mode 100644 lib/test_min_heap.c
>
> --
> 2.25.0.265.gbab2e86ba0-goog
>
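As an aside, the way the min-heap gets used in visit_groups_merge() is
essentially a k-way merge: each event group contributes a cursor and the
smallest head is visited next, which is where the O(log(n)) cost per step
comes from. The userspace toy below is only an illustration of that
pattern, not code from the series; the cursor struct, the array contents
and the sizes are made up for the example.

/*
 * Toy illustration (not the kernel code): several already-sorted lists
 * are visited in one globally ordered pass by keeping one cursor per
 * list in a min-heap keyed on the current element.
 */
#include <stdio.h>

struct cursor {
	const int *pos;		/* next element in this list */
	const int *end;		/* one past the last element  */
};

static void swap(struct cursor *a, struct cursor *b)
{
	struct cursor t = *a; *a = *b; *b = t;
}

/* Sift the root down so the smallest current element is at heap[0]. */
static void sift_down(struct cursor *heap, int n, int i)
{
	for (;;) {
		int l = 2 * i + 1, r = l + 1, smallest = i;

		if (l < n && *heap[l].pos < *heap[smallest].pos)
			smallest = l;
		if (r < n && *heap[r].pos < *heap[smallest].pos)
			smallest = r;
		if (smallest == i)
			return;
		swap(&heap[i], &heap[smallest]);
		i = smallest;
	}
}

int main(void)
{
	int a[] = { 1, 4, 9 }, b[] = { 2, 3, 8 }, c[] = { 5, 6, 7 };
	struct cursor heap[] = {
		{ a, a + 3 }, { b, b + 3 }, { c, c + 3 },
	};
	int n = 3;

	/* Heapify once, then each visit costs O(log n). */
	for (int i = n / 2 - 1; i >= 0; i--)
		sift_down(heap, n, i);

	while (n) {
		printf("%d ", *heap[0].pos);	/* visit smallest head */
		if (++heap[0].pos == heap[0].end)
			heap[0] = heap[--n];	/* list exhausted: drop it */
		sift_down(heap, n, 0);
	}
	printf("\n");				/* prints 1 2 3 ... 9 */
	return 0;
}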