All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Leonardo Brás" <leobras@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>,
	Roman Gushchin <roman.gushchin@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
Date: Fri, 27 Jan 2023 02:40:28 -0300	[thread overview]
Message-ID: <9ec001ba093e21a5ac2cafa1c61810b035daf13d.camel@redhat.com> (raw)
In-Reply-To: <Y9LEQfX5dkEyBOkT@tpad>

On Thu, 2023-01-26 at 15:19 -0300, Marcelo Tosatti wrote:
> On Wed, Jan 25, 2023 at 03:14:48PM -0800, Roman Gushchin wrote:
> > On Wed, Jan 25, 2023 at 03:22:00PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Jan 25, 2023 at 08:06:46AM -0300, Leonardo Brás wrote:
> > > > On Wed, 2023-01-25 at 09:33 +0100, Michal Hocko wrote:
> > > > > On Wed 25-01-23 04:34:57, Leonardo Bras wrote:
> > > > > > Disclaimer:
> > > > > > a - The cover letter got bigger than expected, so I had to split it in
> > > > > >     sections to better organize myself. I am not very confortable with it.
> > > > > > b - Performance numbers below did not include patch 5/5 (Remove flags
> > > > > >     from memcg_stock_pcp), which could further improve performance for
> > > > > >     drain_all_stock(), but I could only notice the optimization at the
> > > > > >     last minute.
> > > > > > 
> > > > > > 
> > > > > > 0 - Motivation:
> > > > > > On current codebase, when drain_all_stock() is ran, it will schedule a
> > > > > > drain_local_stock() for each cpu that has a percpu stock associated with a
> > > > > > descendant of a given root_memcg.
> > 
> > Do you know what caused those drain_all_stock() calls? I wonder if we should look
> > into why we have many of them and whether we really need them?
> > 
> > It's either some user's actions (e.g. reducing memory.max), either some memcg
> > is entering pre-oom conditions. In the latter case a lot of drain calls can be
> > scheduled without a good reason (assuming the cgroup contain multiple tasks running
> > on multiple cpus). Essentially each cpu will try to grab the remains of the memory quota
> > and move it locally. I wonder in such circumstances if we need to disable the pcp-caching
> > on per-cgroup basis.
> > 
> > Generally speaking, draining of pcpu stocks is useful only if an idle cpu is holding some
> > charges/memcg references (it might be not completely idle, but running some very special
> > workload which is not doing any kernel allocations or a process belonging to the root memcg).
> > In all other cases pcpu stock will be either drained naturally by an allocation from another
> > memcg or an allocation from the same memcg will "restore" it, making draining useless.
> > 
> > We also can into drain_all_pages() opportunistically, without waiting for the result.
> > On a busy system it's most likely useless, we might oom before scheduled works will be executed.
> > 
> > I admit I planned to do some work around and even started, but then never had enough time to
> > finish it.
> > 
> > Overall I'm somewhat resistant to an idea of making generic allocation & free paths slower
> > for an improvement of stock draining. It's not a strong objection, but IMO we should avoid
> > doing this without a really strong reason.
> 
> The expectation would be that cache locking should not cause slowdown of
> the allocation and free paths:
> 
> https://manualsbrain.com/en/manuals/1246877/?page=313
> 
> For the P6 and more recent processor families, if the area of memory being locked 
> during a LOCK operation is cached in the processor that is performing the LOCK oper-
> ation as write-back memory and is completely contained in a cache line, the 
> processor may not assert the LOCK# signal on the bus. Instead, it will modify the 
> memory location internally and allow it’s cache coherency mechanism to insure that 
> the operation is carried out atomically. This operation is called “cache locking.” The 
> cache coherency mechanism automatically prevents two or more processors that ...
> 
> 

Just to keep the info easily available: the protected structure (struct
memcg_stock_pcp) fits in 48 Bytes, which is less than the usual 64B cacheline. 

struct memcg_stock_pcp {
	spinlock_t                 stock_lock;           /*     0     4 */
	unsigned int               nr_pages;             /*     4     4 */
	struct mem_cgroup *        cached;               /*     8     8 */
	struct obj_cgroup *        cached_objcg;         /*    16     8 */
	struct pglist_data *       cached_pgdat;         /*    24     8 */
	unsigned int               nr_bytes;             /*    32     4 */
	int                        nr_slab_reclaimable_b; /*    36     4 */
	int                        nr_slab_unreclaimable_b; /*    40     4 */

	/* size: 48, cachelines: 1, members: 8 */
	/* padding: 4 */
	/* last cacheline: 48 bytes */
};

(It got smaller after patches 3/5, 4/5 and 5/5, which remove holes, work_struct
and flags respectively.)

On top of that, patch 1/5 makes sure the percpu allocation is aligned to
cacheline size.


WARNING: multiple messages have this Message-ID (diff)
From: "Leonardo Brás" <leobras-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Marcelo Tosatti
	<mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Roman Gushchin
	<roman.gushchin-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>
Cc: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Muchun Song <muchun.song-fxUVXftIFDnyG1zEObXtfA@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
Date: Fri, 27 Jan 2023 02:40:28 -0300	[thread overview]
Message-ID: <9ec001ba093e21a5ac2cafa1c61810b035daf13d.camel@redhat.com> (raw)
In-Reply-To: <Y9LEQfX5dkEyBOkT@tpad>

On Thu, 2023-01-26 at 15:19 -0300, Marcelo Tosatti wrote:
> On Wed, Jan 25, 2023 at 03:14:48PM -0800, Roman Gushchin wrote:
> > On Wed, Jan 25, 2023 at 03:22:00PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Jan 25, 2023 at 08:06:46AM -0300, Leonardo Brás wrote:
> > > > On Wed, 2023-01-25 at 09:33 +0100, Michal Hocko wrote:
> > > > > On Wed 25-01-23 04:34:57, Leonardo Bras wrote:
> > > > > > Disclaimer:
> > > > > > a - The cover letter got bigger than expected, so I had to split it in
> > > > > >     sections to better organize myself. I am not very confortable with it.
> > > > > > b - Performance numbers below did not include patch 5/5 (Remove flags
> > > > > >     from memcg_stock_pcp), which could further improve performance for
> > > > > >     drain_all_stock(), but I could only notice the optimization at the
> > > > > >     last minute.
> > > > > > 
> > > > > > 
> > > > > > 0 - Motivation:
> > > > > > On current codebase, when drain_all_stock() is ran, it will schedule a
> > > > > > drain_local_stock() for each cpu that has a percpu stock associated with a
> > > > > > descendant of a given root_memcg.
> > 
> > Do you know what caused those drain_all_stock() calls? I wonder if we should look
> > into why we have many of them and whether we really need them?
> > 
> > It's either some user's actions (e.g. reducing memory.max), either some memcg
> > is entering pre-oom conditions. In the latter case a lot of drain calls can be
> > scheduled without a good reason (assuming the cgroup contain multiple tasks running
> > on multiple cpus). Essentially each cpu will try to grab the remains of the memory quota
> > and move it locally. I wonder in such circumstances if we need to disable the pcp-caching
> > on per-cgroup basis.
> > 
> > Generally speaking, draining of pcpu stocks is useful only if an idle cpu is holding some
> > charges/memcg references (it might be not completely idle, but running some very special
> > workload which is not doing any kernel allocations or a process belonging to the root memcg).
> > In all other cases pcpu stock will be either drained naturally by an allocation from another
> > memcg or an allocation from the same memcg will "restore" it, making draining useless.
> > 
> > We also can into drain_all_pages() opportunistically, without waiting for the result.
> > On a busy system it's most likely useless, we might oom before scheduled works will be executed.
> > 
> > I admit I planned to do some work around and even started, but then never had enough time to
> > finish it.
> > 
> > Overall I'm somewhat resistant to an idea of making generic allocation & free paths slower
> > for an improvement of stock draining. It's not a strong objection, but IMO we should avoid
> > doing this without a really strong reason.
> 
> The expectation would be that cache locking should not cause slowdown of
> the allocation and free paths:
> 
> https://manualsbrain.com/en/manuals/1246877/?page=313
> 
> For the P6 and more recent processor families, if the area of memory being locked 
> during a LOCK operation is cached in the processor that is performing the LOCK oper-
> ation as write-back memory and is completely contained in a cache line, the 
> processor may not assert the LOCK# signal on the bus. Instead, it will modify the 
> memory location internally and allow it’s cache coherency mechanism to insure that 
> the operation is carried out atomically. This operation is called “cache locking.” The 
> cache coherency mechanism automatically prevents two or more processors that ...
> 
> 

Just to keep the info easily available: the protected structure (struct
memcg_stock_pcp) fits in 48 Bytes, which is less than the usual 64B cacheline. 

struct memcg_stock_pcp {
	spinlock_t                 stock_lock;           /*     0     4 */
	unsigned int               nr_pages;             /*     4     4 */
	struct mem_cgroup *        cached;               /*     8     8 */
	struct obj_cgroup *        cached_objcg;         /*    16     8 */
	struct pglist_data *       cached_pgdat;         /*    24     8 */
	unsigned int               nr_bytes;             /*    32     4 */
	int                        nr_slab_reclaimable_b; /*    36     4 */
	int                        nr_slab_unreclaimable_b; /*    40     4 */

	/* size: 48, cachelines: 1, members: 8 */
	/* padding: 4 */
	/* last cacheline: 48 bytes */
};

(It got smaller after patches 3/5, 4/5 and 5/5, which remove holes, work_struct
and flags respectively.)

On top of that, patch 1/5 makes sure the percpu allocation is aligned to
cacheline size.


  reply	other threads:[~2023-01-27  5:41 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25  7:34 [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining Leonardo Bras
2023-01-25  7:34 ` Leonardo Bras
2023-01-25  7:34 ` [PATCH v2 1/5] mm/memcontrol: Align percpu memcg_stock to cache Leonardo Bras
2023-01-25  7:34   ` Leonardo Bras
2023-01-25  7:34 ` [PATCH v2 2/5] mm/memcontrol: Change stock_lock type from local_lock_t to spinlock_t Leonardo Bras
2023-01-25  7:34   ` Leonardo Bras
2023-01-25  7:35 ` [PATCH v2 3/5] mm/memcontrol: Reorder memcg_stock_pcp members to avoid holes Leonardo Bras
2023-01-25  7:35   ` Leonardo Bras
2023-01-25  7:35 ` [PATCH v2 4/5] mm/memcontrol: Perform all stock drain in current CPU Leonardo Bras
2023-01-25  7:35 ` [PATCH v2 5/5] mm/memcontrol: Remove flags from memcg_stock_pcp Leonardo Bras
2023-01-25  7:35   ` Leonardo Bras
2023-01-25  8:33 ` [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining Michal Hocko
2023-01-25  8:33   ` Michal Hocko
2023-01-25 11:06   ` Leonardo Brás
2023-01-25 11:06     ` Leonardo Brás
2023-01-25 11:39     ` Michal Hocko
2023-01-25 11:39       ` Michal Hocko
2023-01-25 18:22     ` Marcelo Tosatti
2023-01-25 18:22       ` Marcelo Tosatti
2023-01-25 23:14       ` Roman Gushchin
2023-01-25 23:14         ` Roman Gushchin
2023-01-26  7:41         ` Michal Hocko
2023-01-26  7:41           ` Michal Hocko
2023-01-26 18:03           ` Marcelo Tosatti
2023-01-26 19:20             ` Michal Hocko
2023-01-27  0:32               ` Marcelo Tosatti
2023-01-27  0:32                 ` Marcelo Tosatti
2023-01-27  6:58                 ` Michal Hocko
2023-01-27  6:58                   ` Michal Hocko
2023-02-01 18:31               ` Roman Gushchin
2023-01-26 23:12           ` Roman Gushchin
2023-01-27  7:11             ` Michal Hocko
2023-01-27  7:11               ` Michal Hocko
2023-01-27  7:22               ` Leonardo Brás
2023-01-27  8:12                 ` Leonardo Brás
2023-01-27  8:12                   ` Leonardo Brás
2023-01-27  9:23                   ` Michal Hocko
2023-01-27  9:23                     ` Michal Hocko
2023-01-27 13:03                   ` Frederic Weisbecker
2023-01-27 13:03                     ` Frederic Weisbecker
2023-01-27 13:58               ` Michal Hocko
2023-01-27 13:58                 ` Michal Hocko
2023-01-27 18:18                 ` Roman Gushchin
2023-01-27 18:18                   ` Roman Gushchin
2023-02-03 15:21                   ` Michal Hocko
2023-02-03 15:21                     ` Michal Hocko
2023-02-03 19:25                     ` Roman Gushchin
2023-02-03 19:25                       ` Roman Gushchin
2023-02-13 13:36                       ` Michal Hocko
2023-02-13 13:36                         ` Michal Hocko
2023-01-27  7:14             ` Leonardo Brás
2023-01-27  7:14               ` Leonardo Brás
2023-01-27  7:20               ` Michal Hocko
2023-01-27  7:20                 ` Michal Hocko
2023-01-27  7:35                 ` Leonardo Brás
2023-01-27  9:29                   ` Michal Hocko
2023-01-27 19:29                     ` Leonardo Brás
2023-01-27 19:29                       ` Leonardo Brás
2023-01-27 23:50                       ` Roman Gushchin
2023-01-26 18:19         ` Marcelo Tosatti
2023-01-26 18:19           ` Marcelo Tosatti
2023-01-27  5:40           ` Leonardo Brás [this message]
2023-01-27  5:40             ` Leonardo Brás
2023-01-26  2:01       ` Hillf Danton
2023-01-26  7:45       ` Michal Hocko
2023-01-26  7:45         ` Michal Hocko
2023-01-26 18:14         ` Marcelo Tosatti
2023-01-26 18:14           ` Marcelo Tosatti
2023-01-26 19:13           ` Michal Hocko
2023-01-26 19:13             ` Michal Hocko
2023-01-27  6:55             ` Leonardo Brás
2023-01-27  6:55               ` Leonardo Brás
2023-01-31 11:35               ` Marcelo Tosatti
2023-01-31 11:35                 ` Marcelo Tosatti
2023-02-01  4:36                 ` Leonardo Brás
2023-02-01 12:52                   ` Michal Hocko
2023-02-01 12:52                     ` Michal Hocko
2023-02-01 12:41                 ` Michal Hocko
2023-02-01 12:41                   ` Michal Hocko
2023-02-04  4:55                   ` Leonardo Brás
2023-02-04  4:55                     ` Leonardo Brás
2023-02-05 19:49                     ` Roman Gushchin
2023-02-07  3:18                       ` Leonardo Brás
2023-02-08 19:23                         ` Roman Gushchin
2023-02-08 19:23                           ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9ec001ba093e21a5ac2cafa1c61810b035daf13d.camel@redhat.com \
    --to=leobras@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mtosatti@redhat.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.