linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	linux-kernel@vger.kernel.org, kernel-team@fb.com,
	Bharata B Rao <bharata@linux.ibm.com>,
	Yafang Shao <laoar.shao@gmail.com>
Subject: Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat
Date: Mon, 3 Feb 2020 17:39:54 -0500	[thread overview]
Message-ID: <20200203223954.GE6380@cmpxchg.org> (raw)
In-Reply-To: <20200203222853.GD6781@xps.dhcp.thefacebook.com>

On Mon, Feb 03, 2020 at 02:28:53PM -0800, Roman Gushchin wrote:
> On Mon, Feb 03, 2020 at 03:34:50PM -0500, Johannes Weiner wrote:
> > On Mon, Feb 03, 2020 at 10:25:06AM -0800, Roman Gushchin wrote:
> > > On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> > > > On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > > > > Currently s8 type is used for per-cpu caching of per-node statistics.
> > > > > It works fine because the overfill threshold can't exceed 125.
> > > > > 
> > > > > But if some counters are in bytes (and the next commit in the series
> > > > > will convert slab counters to bytes), it's not gonna work:
> > > > > value in bytes can easily exceed s8 without exceeding the threshold
> > > > > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > > > > vmstats correctness, let's use s32 instead.
> > > > > 
> > > > > This doesn't affect per-zone statistics. There are no plans to use
> > > > > zone-level byte-sized counters, so no reasons to change anything.
> > > > 
> > > > Wait, is this still necessary? AFAIU, the node counters will account
> > > > full slab pages, including free space, and only the memcg counters
> > > > that track actual objects will be in bytes.
> > > > 
> > > > Can you please elaborate?
> > > 
> > > It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
> > > being in different units depending on the accounting scope.
> > > So I do convert all slab counters: global, per-lruvec,
> > > and per-memcg to bytes.
> > 
> > Since the node counters tracks allocated slab pages and the memcg
> > counter tracks allocated objects, arguably they shouldn't use the same
> > name anyway.
> > 
> > > Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
> > > NR_SLAB_RECLAIMABLE_OBJ
> > > NR_SLAB_UNRECLAIMABLE_OBJ
> > 
> > Can we alias them and reuse their slots?
> > 
> > 	/* Reuse the node slab page counters item for charged objects */
> > 	MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
> > 	MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
> 
> Yeah, lgtm.
> 
> Isn't MEMCG_ prefix bad because it makes everybody think that it belongs to
> the enum memcg_stat_item?

Maybe, not sure that's a problem. #define CG_SLAB_RECLAIMABLE perhaps?

> > > and keep global counters untouched. If going this way, I'd prefer to make
> > > them per-memcg, because it will simplify things on charging paths:
> > > now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
> > > and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
> > > bump per-lruvec counters.
> > 
> > I don't quite follow. Don't you still have to update the global
> > counters?
> 
> Global counters are updated only if an allocation requires a new slab
> page, which isn't the most common path.

Right.

> In generic case post_hook is required because it's the only place where
> we have both page (to get the node) and memcg pointer.
> 
> If NR_SLAB_RECLAIMABLE is tracked only per-memcg (as MEMCG_SOCK),
> then post_hook can handle only the rare "allocation failed" case.
> 
> I'm not sure here what's better.

If it's tracked only per-memcg, you still have to account it every
time you charge an object to a memcg, no? How is it less frequent than
acconting at the lruvec level?

> > > Btw, I wonder if we really need per-lruvec counters at all (at least
> > > being enabled by default). For the significant amount of users who
> > > have a single-node machine it doesn't bring anything except performance
> > > overhead.
> > 
> > Yeah, for single-node systems we should be able to redirect everything
> > to the memcg counters, without allocating and tracking lruvec copies.
> 
> Sounds good. It can lead to significant savings on single-node machines.
> 
> > 
> > > For those who have multiple nodes (and most likely many many
> > > memory cgroups) it provides way too many data except for debugging
> > > some weird mm issues.
> > > I guess in the absolute majority of cases having global per-node + per-memcg
> > > counters will be enough.
> > 
> > Hm? Reclaim uses the lruvec counters.
> 
> Can you, please, provide some examples? It looks like it's mostly based
> on per-zone lruvec size counters.

It uses the recursive lruvec state to decide inactive_is_low(),
whether refaults are occuring, whether to trim cache only or go for
anon etc. We use it to determine refault distances and how many shadow
nodes to shrink.

Grep for lruvec_page_state().

> Anyway, it seems to be a little bit off from this patchset, so let's
> discuss it separately.

True


  reply	other threads:[~2020-02-03 22:39 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-27 17:34 [PATCH v2 00/28] The new cgroup slab memory controller Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 01/28] mm: kmem: cleanup (__)memcg_kmem_charge_memcg() arguments Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 02/28] mm: kmem: cleanup memcg_kmem_uncharge_memcg() arguments Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 03/28] mm: kmem: rename memcg_kmem_(un)charge() into memcg_kmem_(un)charge_page() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 04/28] mm: kmem: switch to nr_pages in (__)memcg_kmem_charge_memcg() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 05/28] mm: memcg/slab: cache page number in memcg_(un)charge_slab() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 06/28] mm: kmem: rename (__)memcg_kmem_(un)charge_memcg() to __memcg_kmem_(un)charge() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 07/28] mm: memcg/slab: introduce mem_cgroup_from_obj() Roman Gushchin
2020-02-03 16:05   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 08/28] mm: fork: fix kernel_stack memcg stats for various stack implementations Roman Gushchin
2020-02-03 16:12   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 09/28] mm: memcg/slab: rename __mod_lruvec_slab_state() into __mod_lruvec_obj_state() Roman Gushchin
2020-02-03 16:13   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 10/28] mm: memcg: introduce mod_lruvec_memcg_state() Roman Gushchin
2020-02-03 17:39   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 11/28] mm: slub: implement SLUB version of obj_to_index() Roman Gushchin
2020-02-03 17:44   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat Roman Gushchin
2020-02-03 17:58   ` Johannes Weiner
2020-02-03 18:25     ` Roman Gushchin
2020-02-03 20:34       ` Johannes Weiner
2020-02-03 22:28         ` Roman Gushchin
2020-02-03 22:39           ` Johannes Weiner [this message]
2020-02-04  1:44             ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 13/28] mm: vmstat: convert slab vmstat counter to bytes Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 14/28] mm: memcontrol: decouple reference counting from page accounting Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 15/28] mm: memcg/slab: obj_cgroup API Roman Gushchin
2020-02-03 19:31   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 16/28] mm: memcg/slab: allocate obj_cgroups for non-root slab pages Roman Gushchin
2020-02-03 18:27   ` Johannes Weiner
2020-02-03 18:34     ` Roman Gushchin
2020-02-03 20:46       ` Johannes Weiner
2020-02-03 21:19         ` Roman Gushchin
2020-02-03 22:29           ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 17/28] mm: memcg/slab: save obj_cgroup for non-root slab objects Roman Gushchin
2020-02-03 19:53   ` Johannes Weiner
2020-01-27 17:34 ` [PATCH v2 18/28] mm: memcg/slab: charge individual slab objects instead of pages Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 19/28] mm: memcg/slab: deprecate memory.kmem.slabinfo Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 20/28] mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 21/28] mm: memcg/slab: use a single set of kmem_caches for all memory cgroups Roman Gushchin
2020-02-03 19:50   ` Johannes Weiner
2020-02-03 20:58     ` Roman Gushchin
2020-02-03 22:17       ` Johannes Weiner
2020-02-03 22:38         ` Roman Gushchin
2020-02-04  1:15         ` Roman Gushchin
2020-02-04  2:47           ` Johannes Weiner
2020-02-04  4:35             ` Roman Gushchin
2020-02-04 18:41               ` Johannes Weiner
2020-02-05 15:58                 ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 22/28] mm: memcg/slab: simplify memcg cache creation Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 23/28] mm: memcg/slab: deprecate memcg_kmem_get_cache() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 24/28] mm: memcg/slab: deprecate slab_root_caches Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 25/28] mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 26/28] tools/cgroup: add slabinfo.py tool Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 27/28] tools/cgroup: make slabinfo.py compatible with new slab controller Roman Gushchin
2020-01-30  2:17   ` Bharata B Rao
2020-01-30  2:44     ` Roman Gushchin
2020-01-31 22:24     ` Roman Gushchin
2020-02-12  5:21       ` Bharata B Rao
2020-02-12 20:42         ` Roman Gushchin
2020-01-27 17:34 ` [PATCH v2 28/28] kselftests: cgroup: add kernel memory accounting tests Roman Gushchin
2020-01-30  2:06 ` [PATCH v2 00/28] The new cgroup slab memory controller Bharata B Rao
2020-01-30  2:41   ` Roman Gushchin
2020-08-12 23:16     ` Pavel Tatashin
2020-08-12 23:18       ` Pavel Tatashin
2020-08-13  0:04       ` Roman Gushchin
2020-08-13  0:31         ` Pavel Tatashin
2020-08-28 16:47           ` Pavel Tatashin
2020-09-01  5:28             ` Bharata B Rao
2020-09-01 12:52               ` Pavel Tatashin
2020-09-02  6:23                 ` Bharata B Rao
2020-09-02 12:34                   ` Pavel Tatashin
2020-09-02  9:53             ` Vlastimil Babka
2020-09-02 10:39               ` David Hildenbrand
2020-09-02 12:42                 ` Pavel Tatashin
2020-09-02 13:50                   ` Michal Hocko
2020-09-02 14:20                     ` Pavel Tatashin
2020-09-03 18:09                       ` David Hildenbrand
2020-09-02 11:26               ` Michal Hocko
2020-09-02 12:51                 ` Pavel Tatashin
2020-09-02 13:51                   ` Michal Hocko
2020-09-02 11:32               ` Michal Hocko
2020-09-02 12:53                 ` Pavel Tatashin
2020-09-02 13:52                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200203223954.GE6380@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@linux.ibm.com \
    --cc=guro@fb.com \
    --cc=kernel-team@fb.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).