Date: Fri, 9 Apr 2021 12:35:39 -0400
From: Masayoshi Mizuma
To: Shakeel Butt
Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Vladimir Davydov, Cgroups, Linux MM
Subject: Re: memcg: performance degradation since v5.9
Message-ID: <20210409163539.5374pde3u6gkbg4a@gabell>
References: <20210408193948.vfktg3azh2wrt56t@gabell>

On Thu, Apr 08, 2021 at 02:08:13PM -0700, Shakeel Butt wrote:
> On Thu, Apr 8, 2021 at 1:54 PM Roman Gushchin wrote:
> >
> > On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > > Hello,
> > >
> > > I detected a performance degradation issue in a benchmark of PostgreSQL [1],
> > > and the issue seems to be related to object-level memory cgroup accounting [2].
> > > I would appreciate it if you could give me some ideas to solve it.
> > >
> > > The benchmark measures transactions per second (tps), and the tps for v5.9
> > > and later kernels is about 10%-20% lower than for v5.8.
> > >
> > > The benchmark does sendto() and recvfrom() system calls repeatedly,
> > > and the duration of those system calls gets longer than on v5.8.
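(For reference, the call pattern the benchmark exercises can be reproduced from user space with a minimal Python sketch. This is just an illustration of a tight sendto()/recvfrom() loop over a local UDP socket, not the PostgreSQL benchmark itself; iteration count and payload size are arbitrary.)

```python
import socket
import time

def time_udp_roundtrips(iterations=10000):
    """Time a tight sendto()/recvfrom() loop over a loopback UDP socket."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))          # let the kernel pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    addr = rx.getsockname()
    payload = b"x" * 64

    start = time.perf_counter()
    for _ in range(iterations):
        tx.sendto(payload, addr)       # each iteration is one syscall pair
        rx.recvfrom(128)
    elapsed = time.perf_counter() - start

    tx.close()
    rx.close()
    return elapsed

print(f"{time_udp_roundtrips():.3f} s")
```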
> > > The result of perf trace of the benchmark is as follows:
> > >
> > > - v5.8
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699574        0  2595.220     0.001     0.004     0.462    0.03%
> > >   recvfrom         1391089   694427  2163.458     0.001     0.002     0.442    0.04%
> > >
> > > - v5.9
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699187        0  3316.948     0.002     0.005     0.044    0.02%
> > >   recvfrom         1397042   698828  2464.995     0.001     0.002     0.025    0.04%
> > >
> > > - v5.12-rc6
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699445        0  3015.642     0.002     0.004     0.027    0.02%
> > >   recvfrom         1395929   697909  2338.783     0.001     0.002     0.024    0.03%
>
> Can you please explain how to read these numbers? Or at least put a %
> regression.

Let me summarize them here. The total duration (the 'total' column above) of
each system call is as follows, with v5.8 taken as 100%:

- sendto:
  - v5.8       100%
  - v5.9       128%
  - v5.12-rc6  116%

- recvfrom:
  - v5.8       100%
  - v5.9       114%
  - v5.12-rc6  108%

> > > I bisected the kernel patches, and I found that the patch series which
> > > adds object-level memory cgroup support causes the degradation.
> > >
> > > I confirmed the delay with a kernel module which just runs
> > > kmem_cache_alloc()/kmem_cache_free() as follows. The duration is about
> > > 2-3 times longer than on v5.8.
> > >
> > >     dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> > >     for (i = 0; i < 100000000; i++)
> > >     {
> > >             p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> > >             kmem_cache_free(dummy_cache, p);
> > >     }
> > >
> > > It seems that the object accounting work in slab_pre_alloc_hook() and
> > > slab_post_alloc_hook() is the overhead.
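(The percentages in the summary above can be cross-checked directly from the
'total' columns of the perf trace tables; this is just the arithmetic, spelled
out:)

```python
# Relative slowdown of each syscall, computed from the 'total' (msec) column
# of the perf trace tables, with v5.8 as the 100% baseline.
totals = {
    "sendto":   {"v5.8": 2595.220, "v5.9": 3316.948, "v5.12-rc6": 3015.642},
    "recvfrom": {"v5.8": 2163.458, "v5.9": 2464.995, "v5.12-rc6": 2338.783},
}

for syscall, t in totals.items():
    for version in ("v5.9", "v5.12-rc6"):
        pct = round(100 * t[version] / t["v5.8"])
        print(f"{syscall:8s} {version}: {pct}% of v5.8")
# sendto:   v5.9 -> 128%, v5.12-rc6 -> 116%
# recvfrom: v5.9 -> 114%, v5.12-rc6 -> 108%
```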
> > > The cgroup.nokmem kernel parameter doesn't work for my case because it
> > > disables all of the kmem accounting.
>
> The patch is somewhat doing that, i.e. disabling memcg accounting for slab.
>
> > > The degradation is gone when I apply a patch (at the bottom of this
> > > email) that adds a kernel parameter which falls back to the page-level
> > > accounting; however, I'm not sure it's a good approach...
> >
> > Hello Masayoshi!
> >
> > Thank you for the report!
> >
> > It's not a secret that per-object accounting is more expensive than
> > per-page allocation. I had micro-benchmark results similar to yours:
> > accounted allocations are about 2x slower. But in general it tends not to
> > affect real workloads, because the cost of allocations is still low and
> > tends to be only a small fraction of the whole cpu load. And because it
> > brings significant benefits: 40%+ slab memory savings, less fragmentation,
> > a more stable workingset, etc., real workloads tend to perform on par or
> > better.
> >
> > So my first question is whether you see the regression in any real
> > workload, or only in the benchmark?
> >
> > Second, I'll try to take a look into the benchmark to figure out why it's
> > affected so badly, but I'm not sure we can easily fix it. If you have any
> > ideas about what kind of objects the benchmark is allocating in big
> > numbers, please let me know.
>
> One idea would be to increase MEMCG_CHARGE_BATCH.

Thank you for the idea! It's hard-coded as 32 now, so I'm wondering whether it
may be a good idea to make MEMCG_CHARGE_BATCH tunable via a kernel parameter
or something.

Thanks!
Masa
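(Why a larger batch helps can be sketched in user space: each CPU draws
charges from a local pre-charged stock and only updates the shared atomic
counter when the stock runs out, so the number of contended updates scales
inversely with the batch size. The model below is a deliberate simplification
of the kernel's per-CPU stock logic, not the actual implementation:)

```python
def shared_counter_updates(num_charges, batch):
    """Count refills of the shared counter for `num_charges` single-page
    charges, with a per-CPU stock refilled `batch` pages at a time."""
    stock = 0
    updates = 0
    for _ in range(num_charges):
        if stock == 0:
            stock = batch   # refill: one update of the shared atomic counter
            updates += 1
        stock -= 1          # charge one page from the cheap local stock
    return updates

print(shared_counter_updates(100000, 32))   # current MEMCG_CHARGE_BATCH
print(shared_counter_updates(100000, 128))  # a hypothetical larger batch
```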