From: Shakeel Butt
Date: Mon, 13 Sep 2021 12:40:06 -0700
Subject: Re: [memcg] 45208c9105: aim7.jobs-per-min -14.0% regression
To: Feng Tang, Tejun Heo
Cc: Hillf Danton, LKML, Xing Zhengjun, Linux MM
References: <20210902215504.dSSfDKJZu%akpm@linux-foundation.org>
 <20210905124439.GA15026@xsang-OptiPlex-9020>
 <20210907033000.GA88160@shbuild999.sh.intel.com>
 <20210912111756.4158-1-hdanton@sina.com>
 <20210912132914.GA56674@shbuild999.sh.intel.com>

+Tejun Heo

[ threads start at https://lore.kernel.org/all/20210905124439.GA15026@xsang-OptiPlex-9020/T/#ma938a101f415ad784ac08612c7ef31f260a2b678]

On Mon, Sep 13, 2021 at 9:41 AM Shakeel Butt wrote:
>
> On Sun, Sep 12, 2021 at 6:29 AM Feng Tang wrote:
> >
> > On Sun, Sep 12, 2021 at 07:17:56PM +0800, Hillf Danton wrote:
> > [...]
> > > > +//     if (!(__this_cpu_inc_return(stats_flush_threshold) % MEMCG_CHARGE_BATCH))
> > > > +       if (!(__this_cpu_inc_return(stats_flush_threshold) % 128))
> > > >                 queue_work(system_unbound_wq, &stats_flush_work);
> > > >  }
> > >
> > > Hi Feng,
> > >
> > > Would you please check if it helps fix the regression to avoid queuing a
> > > queued work by adding and checking an atomic counter.
> >
> > Hi Hillf,
> >
> > I just tested your patch, and it didn't recover the regression, but
> > just reduced it from -14% to around -13%, similar to the patch
> > increasing the batch charge number.
> >
>
> Thanks Hillf for taking a look and Feng for running the test.
>
> This shows that parallel calls to queue_work() is not the issue (there
> is already a test and set at the start of queue_work()) but the actual
> work done by queue_work() is costly for this code path.
>
> I wrote a simple anon page fault nohuge c program, profiled it and it
> seems like queue_work() is significant enough.
>
>    - 51.00% do_anonymous_page
>       + 16.68% alloc_pages_vma
>         11.61% _raw_spin_lock
>       + 10.26% mem_cgroup_charge
>       - 5.25% lru_cache_add_inactive_or_unevictable
>          - 4.48% __pagevec_lru_add
>             - 3.71% __pagevec_lru_add_fn
>                - 1.74% __mod_lruvec_state
>                   - 1.60% __mod_memcg_lruvec_state
>                      - 1.35% queue_work_on
>                         - __queue_work
>                            - 0.93% wake_up_process
>                               - try_to_wake_up
>                                  - 0.82% ttwu_queue
>                                       0.61% ttwu_do_activate
>       - 2.97% page_add_new_anon_rmap
>          - 2.68% __mod_lruvec_page_state
>             - 2.48% __mod_memcg_lruvec_state
>                - 1.67% queue_work_on
>                   - 1.53% __queue_work
>                      - 1.25% wake_up_process
>                         - try_to_wake_up
>                            - 0.94% ttwu_queue
>                               + 0.70% ttwu_do_activate
>                  0.61% cgroup_rstat_updated
>         2.10% rcu_read_unlock_strict
>         1.40% cgroup_throttle_swaprate
>
> However when I switch the batch size to 128, it goes away.
>

I did one more experiment with the same workload but with system_wq
instead of system_unbound_wq, and there is a clear difference in the
profile:

With system_unbound_wq:

-    4.63%     0.33%  mmap  [kernel.kallsyms]  [k] queue_work_on
     4.29% queue_work_on
      - __queue_work
         - 3.45% wake_up_process
            - try_to_wake_up
               - 2.46% ttwu_queue
                  - 1.66% ttwu_do_activate
                     - 1.14% activate_task
                        - 0.97% enqueue_task_fair
                             enqueue_entity

With system_wq:

-    1.36%     0.06%  mmap  [kernel.kallsyms]  [k] queue_work_on
     1.30% queue_work_on
      - __queue_work
         - 1.03% wake_up_process
            - try_to_wake_up
               - 0.97% ttwu_queue
                    0.66% ttwu_do_activate

Tejun, is this expected, i.e. does queuing work on system_wq have a
different performance impact than queuing it on system_unbound_wq?
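
For anyone following along, here is a rough userspace model of the two
mechanisms discussed in this thread: queuing the flush only on every
N-th stat update, and the test-and-set that lets only one caller
actually queue a pending work item. It is only a sketch, not kernel
code. All sketch_* names are hypothetical, a plain counter stands in for
the per-CPU stats_flush_threshold, and a C11 atomic_flag stands in for
the work item's pending bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU stats_flush_threshold counter. */
static unsigned long sketch_threshold;

/* Hypothetical stand-in for the work item's "pending" bit. */
static atomic_flag sketch_pending = ATOMIC_FLAG_INIT;

/*
 * Count how many of `updates` stat updates would reach the queueing
 * call when only every `batch`-th update tries to queue the flush.
 * This mirrors the "counter % batch" check in the quoted patch.
 */
unsigned long sketch_queue_attempts(unsigned long updates, unsigned int batch)
{
    unsigned long attempts = 0;

    assert(batch > 0);
    sketch_threshold = 0;
    for (unsigned long i = 0; i < updates; i++) {
        /* Pre-increment, then test: the same shape as inc_return % batch. */
        if (!(++sketch_threshold % batch))
            attempts++;
    }
    return attempts;
}

/*
 * Try to mark the work as pending. Returns true only for the caller
 * that flipped the flag; everyone else sees it already set and backs off.
 */
bool sketch_try_queue(void)
{
    return !atomic_flag_test_and_set(&sketch_pending);
}

/* Clear the pending flag, as if the queued work had started running. */
void sketch_work_started(void)
{
    atomic_flag_clear(&sketch_pending);
}
```

With 4096 updates, sketch_queue_attempts(4096, 128) returns 32, while a
batch of 32 (a value assumed here purely for comparison) returns 128, so
a larger batch reaches the queueing path proportionally less often. The
model says nothing about the cost of each queueing call, which is what
the profiles above are about.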