From: Michal Hocko <mhocko@suse.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Shakeel Butt" <shakeelb@google.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Muchun Song" <songmuchun@bytedance.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Soheil Hassas Yeganeh" <soheil@google.com>,
	"Feng Tang" <feng.tang@intel.com>,
	"Oliver Sang" <oliver.sang@intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	lkp@lists.01.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] memcg: increase MEMCG_CHARGE_BATCH to 64
Date: Tue, 23 Aug 2022 06:49:39 +0200
Message-ID: <YwRcY6oSnlYOD9n5@dhcp22.suse.cz>
In-Reply-To: <YwQ54pvNwy0/5u3C@P9FQF9L96D>

On Mon 22-08-22 19:22:26, Roman Gushchin wrote:
> On Mon, Aug 22, 2022 at 09:34:59PM +0200, Michal Hocko wrote:
> > On Mon 22-08-22 11:37:30, Roman Gushchin wrote:
> > [...]
> > > I wonder only if we want to make it configurable (Idk a sysctl or maybe
> > > a config option) and close the topic.
> > 
> > I do not think this is a good idea. We have other examples where we have
> > outsourced internal tuning to the userspace and it has mostly proven
> > impractical and, long term, more problematic than useful (e.g.
> > lowmem_reserve_ratio, percpu_pagelist_high_fraction, swappiness, just to
> > name some that come to mind). I have more often seen these used
> > incorrectly than usefully.
> 
> I agree, no strong opinion here. But I wonder if somebody will
> complain about Shakeel's change because of the reduced accuracy.
> I know some users are using memory cgroups to track the size of various
> workloads (including relatively small ones) and a 32->64 pages per cpu
> change can be noticeable for them. But we can wait for an actual bug report :)

Yes, that would be my approach. I have seen reports like that already,
but those were mostly caused by heavy caching on the SLUB side on older
kernels. So there surely are workloads with small limits configured
(e.g. 20MB). On the other hand, those users were willing to adapt their
limits as the values were kinda arbitrary anyway.
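
Just to get a rough feel for the scale involved, a back-of-the-envelope
sketch (userspace illustration only, not kernel code; the cpu count and
page size below are assumptions picked for the example, not measurements):

/*
 * Worst-case amount of pre-charged memory that can sit in the per-cpu
 * stocks for a single memcg, i.e. how far the reported usage can drift
 * from the actual usage. CPU count and page size are assumed values.
 */
#include <stdio.h>

int main(void)
{
	const long nr_cpus = 64;	/* assumed machine size */
	const long page_size = 4096;	/* assumed 4KiB pages */
	const long batch_old = 32;	/* current MEMCG_CHARGE_BATCH */
	const long batch_new = 64;	/* value proposed by this patch */

	printf("old worst-case drift: %ld MiB\n",
	       nr_cpus * batch_old * page_size >> 20);	/* 8 MiB */
	printf("new worst-case drift: %ld MiB\n",
	       nr_cpus * batch_new * page_size >> 20);	/* 16 MiB */
	return 0;
}

IOW on a larger machine the worst-case drift alone is already comparable
to such a 20MB limit, so I agree those setups would notice.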
 
> > In this case, I guess we should consider either moving to per-memcg
> > charge batching and seeing whether the pcp overhead x memcg_count is
> > worth it, or some automagic tuning of the batch size depending on how
> > effectively the batch is used. Certainly a lot of room for
> > experimenting.
> 
> I'm not a big believer in automagic tuning here because it's a fundamental
> trade-off of accuracy vs performance, and various users might make a different
> choice depending on their needs, not on the cpu count or something else.

Yes, this is not an easy thing to get right. I was mostly thinking of some
auto-scaling based on the limit size, or growing the stock when cache hits
are common and shrinking it when stocks get flushed often because multiple
memcgs compete over the same pcp stock. But to me it seems like a per-memcg
approach might lead to better results without too many heuristics
(albeit being more memory hungry).
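
To illustrate what I mean by growing/shrinking the stock, something along
these lines (completely untested sketch; the struct, fields and thresholds
are made up for illustration and do not match the existing memcg_stock code):

/*
 * Purely illustrative sketch of adaptive per-cpu stock sizing; names
 * and thresholds are invented, not taken from memcg_stock_pcp.
 */
struct adaptive_stock {
	unsigned int batch;	/* current target stock size in pages */
	unsigned int hits;	/* charges served from the cached stock */
	unsigned int flushes;	/* times the stock was drained/replaced */
};

static void adapt_batch(struct adaptive_stock *stock)
{
	/* the stock is being used effectively: let it grow (bounded) */
	if (stock->hits > 4 * stock->flushes && stock->batch < 64)
		stock->batch <<= 1;
	/* memcgs keep evicting each other from the pcp stock: shrink */
	else if (stock->flushes > stock->hits && stock->batch > 4)
		stock->batch >>= 1;
	stock->hits = 0;
	stock->flushes = 0;
}

But as said, chasing the right thresholds here is exactly the kind of
heuristics game that a per-memcg approach could hopefully avoid.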

> Per-memcg batching sounds interesting though. For example, we can likely
> batch updates on leaf cgroups and have a single atomic update instead of
> multiple ones most of the time. Or do you mean something different?

No, that was exactly my thinking as well.
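
Roughly something like this (handwaving sketch only; the struct, helper
name and threshold are made up, limit enforcement, uncharge and
concurrency are ignored, and only page_counter_charge() is the real
interface):

/*
 * Leaf-level batching: accumulate charges locally and propagate them
 * up the hierarchy in one go once a threshold is crossed.
 */
#define LEAF_BATCH_PAGES	64	/* arbitrary threshold for the example */

struct leaf_charge_cache {
	long pending;			/* charged but not yet propagated */
	struct page_counter *counter;	/* the leaf memcg's own counter */
};

static void leaf_charge(struct leaf_charge_cache *cache, long nr_pages)
{
	cache->pending += nr_pages;
	if (cache->pending < LEAF_BATCH_PAGES)
		return;
	/* one atomic walk up the ancestors instead of one per charge */
	page_counter_charge(cache->counter, cache->pending);
	cache->pending = 0;
}

With something like this, most charges on a busy leaf would only touch
the local cache and the hierarchical atomic updates would happen once
per LEAF_BATCH_PAGES.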

-- 
Michal Hocko
SUSE Labs

Thread overview: 86+ messages
2022-08-22  0:17 [PATCH 0/3] memcg: optimizatize charge codepath Shakeel Butt
2022-08-22  0:17 ` [PATCH 1/3] mm: page_counter: remove unneeded atomic ops for low/min Shakeel Butt
2022-08-22  0:20   ` Soheil Hassas Yeganeh
2022-08-22  2:39   ` Feng Tang
2022-08-22  9:55   ` Michal Hocko
2022-08-22 10:18     ` Michal Hocko
2022-08-22 14:55       ` Shakeel Butt
2022-08-22 15:20         ` Michal Hocko
2022-08-22 16:06           ` Shakeel Butt
2022-08-23  9:42           ` Michal Hocko
2022-08-22 18:23   ` Roman Gushchin
2022-08-22  0:17 ` [PATCH 2/3] mm: page_counter: rearrange struct page_counter fields Shakeel Butt
2022-08-22  0:24   ` Soheil Hassas Yeganeh
2022-08-22  4:55     ` Shakeel Butt
2022-08-22 13:06       ` Soheil Hassas Yeganeh
2022-08-22  2:10   ` Feng Tang
2022-08-22  4:59     ` Shakeel Butt
2022-08-22 10:23   ` Michal Hocko
2022-08-22 15:06     ` Shakeel Butt
2022-08-22 15:15       ` Michal Hocko
2022-08-22 16:04         ` Shakeel Butt
2022-08-22 18:27           ` Roman Gushchin
2022-08-22  0:17 ` [PATCH 3/3] memcg: increase MEMCG_CHARGE_BATCH to 64 Shakeel Butt
2022-08-22  0:24   ` Soheil Hassas Yeganeh
2022-08-22  2:30   ` Feng Tang
2022-08-22 10:47   ` Michal Hocko
2022-08-22 15:09     ` Shakeel Butt
2022-08-22 15:22       ` Michal Hocko
2022-08-22 16:07         ` Shakeel Butt
2022-08-22 18:37   ` Roman Gushchin
2022-08-22 19:34     ` Michal Hocko
2022-08-23  2:22       ` Roman Gushchin
2022-08-23  4:49         ` Michal Hocko [this message]