From: Shakeel Butt <shakeelb@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	 Cgroups <cgroups@vger.kernel.org>, Linux MM <linux-mm@kvack.org>
Subject: Re: memcg: performance degradation since v5.9
Date: Thu, 8 Apr 2021 14:08:13 -0700	[thread overview]
Message-ID: <CALvZod58NBQLvk2m7Mb=_0oGCApcNeisxVuE1b+qh1OKDSy0Ag@mail.gmail.com> (raw)
In-Reply-To: <YG9tW1h9VSJcir+Y@carbon.dhcp.thefacebook.com>

On Thu, Apr 8, 2021 at 1:54 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > Hello,
> >
> > I detected a performance degradation issue with a PostgreSQL benchmark [1],
> > and the issue seems to be related to object-level memory cgroup accounting [2].
> > I would appreciate it if you could give me some ideas to solve it.
> >
> > The benchmark reports transactions per second (tps), and the tps for v5.9
> > and later kernels is about 10%-20% lower than on v5.8.
> >
> > The benchmark issues sendto() and recvfrom() system calls repeatedly,
> > and the duration of those system calls is longer than on v5.8.
> > The perf trace results for the benchmark are as follows:
> >
> >   - v5.8
> >
> >    syscall            calls  errors  total       min       avg       max       stddev
> >                                      (msec)    (msec)    (msec)    (msec)        (%)
> >    --------------- --------  ------ -------- --------- --------- ---------     ------
> >    sendto            699574      0  2595.220     0.001     0.004     0.462      0.03%
> >    recvfrom         1391089 694427  2163.458     0.001     0.002     0.442      0.04%
> >
> >   - v5.9
> >
> >    syscall            calls  errors  total       min       avg       max       stddev
> >                                      (msec)    (msec)    (msec)    (msec)        (%)
> >    --------------- --------  ------ -------- --------- --------- ---------     ------
> >    sendto            699187      0  3316.948     0.002     0.005     0.044      0.02%
> >    recvfrom         1397042 698828  2464.995     0.001     0.002     0.025      0.04%
> >
> >   - v5.12-rc6
> >
> >    syscall            calls  errors  total       min       avg       max       stddev
> >                                      (msec)    (msec)    (msec)    (msec)        (%)
> >    --------------- --------  ------ -------- --------- --------- ---------     ------
> >    sendto            699445      0  3015.642     0.002     0.004     0.027      0.02%
> >    recvfrom         1395929 697909  2338.783     0.001     0.002     0.024      0.03%
> >

Can you please explain how to read these numbers? Or at least put a %
regression.
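(For what it's worth, a rough back-of-envelope from the totals above: sendto
goes from 2595.220 msec / 699574 calls ~= 3.7 usec per call on v5.8 to
3316.948 msec / 699187 calls ~= 4.7 usec per call on v5.9, roughly a 28%
increase, and recvfrom goes from ~1.6 usec to ~1.8 usec per call, roughly 13%.)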

> > I bisected the kernel and found that the patch series which adds
> > object-level memory cgroup support causes the degradation.
> >
> > I confirmed the delay with a kernel module which just runs
> > kmem_cache_alloc/kmem_cache_free as follows. The duration is about
> > 2-3 times as long as on v5.8.
> >
> >    dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> >    for (i = 0; i < 100000000; i++)
> >    {
> >            p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> >            kmem_cache_free(dummy_cache, p);
> >    }
> >
> > It seems that the object accounting work in slab_pre_alloc_hook() and
> > slab_post_alloc_hook() is the source of the overhead.
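
For reference, a self-contained version of such a test module might look like
the sketch below (untested; the struct layout, loop count, file name and
ktime-based timing are just illustrative assumptions):

/* slab_acct_bench.c - untested sketch of the micro-benchmark quoted above */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/ktime.h>
#include <linux/sched.h>

struct dummy {
	unsigned long pad[8];	/* arbitrary 64-byte object */
};

static int __init slab_acct_bench_init(void)
{
	struct kmem_cache *dummy_cache;
	ktime_t start;
	void *p;
	long i;

	dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
	if (!dummy_cache)
		return -ENOMEM;

	start = ktime_get();
	for (i = 0; i < 100000000L; i++) {
		p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
		if (!p)
			break;
		kmem_cache_free(dummy_cache, p);
		if (!(i % (1 << 20)))
			cond_resched();	/* avoid soft-lockup warnings */
	}
	pr_info("slab_acct_bench: %lld ns for %ld alloc/free pairs\n",
		ktime_to_ns(ktime_sub(ktime_get(), start)), i);

	kmem_cache_destroy(dummy_cache);
	return -EAGAIN;	/* fail the load so it can simply be insmod'ed again */
}
module_init(slab_acct_bench_init);

MODULE_LICENSE("GPL");
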
> >
> > The cgroup.nokmem kernel parameter doesn't work for my case because it
> > disables all kmem accounting.

The patch is somewhat doing that, i.e. disabling memcg accounting for slab.

> >
> > The degradation goes away when I apply a patch (at the bottom of this email)
> > that adds a kernel parameter to fall back to page-level accounting;
> > however, I'm not sure it's a good approach...
>
> Hello Masayoshi!
>
> Thank you for the report!
>
> It's not a secret that per-object accounting is more expensive than per-page
> accounting. I had micro-benchmark results similar to yours: accounted
> allocations are about 2x slower. But in general it tends not to affect real
> workloads, because the cost of allocations is still low and tends to be only
> a small fraction of the whole cpu load. And because it brings significant
> benefits: 40%+ slab memory savings, less fragmentation, a more stable working
> set, etc., real workloads tend to perform on par or better.
>
> So my first question is whether you see the regression in any real workload
> or only in the benchmark?
>
> Second, I'll try to take a look at the benchmark to figure out why it's
> affected so badly, but I'm not sure we can easily fix it. If you have any
> ideas about what kinds of objects the benchmark is allocating in large
> numbers, please let me know.
>

One idea would be to increase MEMCG_CHARGE_BATCH.
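
If I remember correctly, MEMCG_CHARGE_BATCH is defined as 32U in
include/linux/memcontrol.h, and the per-cpu stock only touches the shared page
counter once the cached budget is used up. The toy user-space sketch below is
not the kernel code (names and values are made up) but shows why a larger
batch shrinks the per-allocation cost:

/* batch_demo.c - toy illustration of batched charging, not kernel code.
 * Build with: cc -O2 batch_demo.c -o batch_demo
 */
#include <stdatomic.h>
#include <stdio.h>

#define BATCH 32			/* stand-in for MEMCG_CHARGE_BATCH */

static atomic_long shared_counter;	/* the contended "page counter" */

struct stock {
	long cached;			/* per-thread pre-charged budget */
};

/* Charge one object; touch the shared counter only when the stock is empty. */
static void charge_one(struct stock *s)
{
	if (s->cached == 0) {
		atomic_fetch_add(&shared_counter, BATCH); /* 1 atomic op per BATCH charges */
		s->cached = BATCH;
	}
	s->cached--;
}

int main(void)
{
	struct stock s = { 0 };
	long i;

	for (i = 0; i < 1000000; i++)
		charge_one(&s);

	/* With BATCH == 32 the shared counter saw ~31250 updates, not 1000000. */
	printf("charged %ld\n", (long)atomic_load(&shared_counter));
	return 0;
}

Doubling BATCH in the sketch halves the number of atomic updates on the shared
counter, which is roughly what bumping MEMCG_CHARGE_BATCH would do to the
charge fast path, at the cost of a bit more per-cpu overcharge slack.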


Thread overview: 14+ messages
2021-04-08 19:39 memcg: performance degradation since v5.9 Masayoshi Mizuma
2021-04-08 20:53 ` Roman Gushchin
2021-04-08 21:08   ` Shakeel Butt [this message]
2021-04-09 16:35     ` Masayoshi Mizuma
2021-04-09 16:50       ` Shakeel Butt
2021-04-12 15:22         ` Masayoshi Mizuma
2021-04-09 16:05   ` Masayoshi Mizuma
