linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v5 0/7] mm: reparent slab memory on cgroup removal
       [not found] <20190521200735.2603003-1-guro@fb.com>
@ 2019-05-22 21:43 ` Roman Gushchin
  2019-05-22 21:59   ` Andrew Morton
       [not found] ` <20190521200735.2603003-6-guro@fb.com>
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Roman Gushchin @ 2019-05-22 21:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner,
	Michal Hocko, Rik van Riel, Shakeel Butt, Christoph Lameter,
	Vladimir Davydov, cgroups, Waiman Long

Hi, Andrew!

Is this patchset good to go? Or do you have any remaining concerns?

It has been carefully reviewed by Shakeel; and also Christoph and Waiman
gave some attention to it.

Since commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively")
has been reverted, the memcg "leak" problem is open again, and I've heard
from several independent people and companies that it's a real problem
for them. So it will be nice to close it asap.

I suspect that the fix is too heavy for stable, unfortunately.

Please, let me know if you have any issues that preventing you
from pulling it into the tree.

Thank you!

On Tue, May 21, 2019 at 01:07:28PM -0700, Roman Gushchin wrote:
> # Why do we need this?
> 
> We've noticed that the number of dying cgroups is steadily growing on most
> of our hosts in production. The following investigation revealed an issue
> in userspace memory reclaim code [1], accounting of kernel stacks [2],
> and also the mainreason: slab objects.
> 
> The underlying problem is quite simple: any page charged
> to a cgroup holds a reference to it, so the cgroup can't be reclaimed unless
> all charged pages are gone. If a slab object is actively used by other cgroups,
> it won't be reclaimed, and will prevent the origin cgroup from being reclaimed.
> 
> Slab objects, and first of all vfs cache, is shared between cgroups, which are
> using the same underlying fs, and what's even more important, it's shared
> between multiple generations of the same workload. So if something is running
> periodically every time in a new cgroup (like how systemd works), we do
> accumulate multiple dying cgroups.
> 
> Strictly speaking pagecache isn't different here, but there is a key difference:
> we disable protection and apply some extra pressure on LRUs of dying cgroups,
> and these LRUs contain all charged pages.
> My experiments show that with the disabled kernel memory accounting the number
> of dying cgroups stabilizes at a relatively small number (~100, depends on
> memory pressure and cgroup creation rate), and with kernel memory accounting
> it grows pretty steadily up to several thousands.
> 
> Memory cgroups are quite complex and big objects (mostly due to percpu stats),
> so it leads to noticeable memory losses. Memory occupied by dying cgroups
> is measured in hundreds of megabytes. I've even seen a host with more than 100Gb
> of memory wasted for dying cgroups. It leads to a degradation of performance
> with the uptime, and generally limits the usage of cgroups.
> 
> My previous attempt [3] to fix the problem by applying extra pressure on slab
> shrinker lists caused a regressions with xfs and ext4, and has been reverted [4].
> The following attempts to find the right balance [5, 6] were not successful.
> 
> So instead of trying to find a maybe non-existing balance, let's do reparent
> the accounted slabs to the parent cgroup on cgroup removal.
> 
> 
> # Implementation approach
> 
> There is however a significant problem with reparenting of slab memory:
> there is no list of charged pages. Some of them are in shrinker lists,
> but not all. Introducing of a new list is really not an option.
> 
> But fortunately there is a way forward: every slab page has a stable pointer
> to the corresponding kmem_cache. So the idea is to reparent kmem_caches
> instead of slab pages.
> 
> It's actually simpler and cheaper, but requires some underlying changes:
> 1) Make kmem_caches to hold a single reference to the memory cgroup,
>    instead of a separate reference per every slab page.
> 2) Stop setting page->mem_cgroup pointer for memcg slab pages and use
>    page->kmem_cache->memcg indirection instead. It's used only on
>    slab page release, so it shouldn't be a big issue.
> 3) Introduce a refcounter for non-root slab caches. It's required to
>    be able to destroy kmem_caches when they become empty and release
>    the associated memory cgroup.
> 
> There is a bonus: currently we do release empty kmem_caches on cgroup
> removal, however all other are waiting for the releasing of the memory cgroup.
> These refactorings allow kmem_caches to be released as soon as they
> become inactive and free.
> 
> Some additional implementation details are provided in corresponding
> commit messages.
> 
> 
> # Results
> 
> Below is the average number of dying cgroups on two groups of our production
> hosts. They do run some sort of web frontend workload, the memory pressure
> is moderate. As we can see, with the kernel memory reparenting the number
> stabilizes in 60s range; however with the original version it grows almost
> linearly and doesn't show any signs of plateauing. The difference in slab
> and percpu usage between patched and unpatched versions also grows linearly.
> In 7 days it exceeded 200Mb.
> 
> day           0    1    2    3    4    5    6    7
> original     56  362  628  752 1070 1250 1490 1560
> patched      23   46   51   55   60   57   67   69
> mem diff(Mb) 22   74  123  152  164  182  214  241
> 
> 
> # History
> 
> v5:
>   1) fixed a compilation warning around missing kmemcg_queue_cache_shutdown()
>   2) s/rcu_read_lock()/rcu_read_unlock() in memcg_kmem_get_cache()
> 
> v4:
>   1) removed excessive memcg != parent check in memcg_deactivate_kmem_caches()
>   2) fixed rcu_read_lock() usage in memcg_charge_slab()
>   3) fixed synchronization around dying flag in kmemcg_queue_cache_shutdown()
>   4) refreshed test results data
>   5) reworked PageTail() checks in memcg_from_slab_page()
>   6) added some comments in multiple places
> 
> v3:
>   1) reworked memcg kmem_cache search on allocation path
>   2) fixed /proc/kpagecgroup interface
> 
> v2:
>   1) switched to percpu kmem_cache refcounter
>   2) a reference to kmem_cache is held during the allocation
>   3) slabs stats are fixed for !MEMCG case (and the refactoring
>      is separated into a standalone patch)
>   4) kmem_cache reparenting is performed from deactivatation context
> 
> v1:
>   https://lkml.org/lkml/2019/4/17/1095
> 
> 
> # Links
> 
> [1]: commit 68600f623d69 ("mm: don't miss the last page because of
> round-off error")
> [2]: commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> [3]: commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively
> small number of objects")
> [4]: commit a9a238e83fbb ("Revert "mm: slowly shrink slabs
> with a relatively small number of objects")
> [5]: https://lkml.org/lkml/2019/1/28/1865
> [6]: https://marc.info/?l=linux-mm&m=155064763626437&w=2
> 
> 
> Roman Gushchin (7):
>   mm: postpone kmem_cache memcg pointer initialization to
>     memcg_link_cache()
>   mm: generalize postponed non-root kmem_cache deactivation
>   mm: introduce __memcg_kmem_uncharge_memcg()
>   mm: unify SLAB and SLUB page accounting
>   mm: rework non-root kmem_cache lifecycle management
>   mm: reparent slab memory on cgroup removal
>   mm: fix /proc/kpagecgroup interface for slab pages
> 
>  include/linux/memcontrol.h |  10 +++
>  include/linux/slab.h       |  13 +--
>  mm/memcontrol.c            | 101 ++++++++++++++++-------
>  mm/slab.c                  |  25 ++----
>  mm/slab.h                  | 137 ++++++++++++++++++++++++-------
>  mm/slab_common.c           | 162 +++++++++++++++++++++----------------
>  mm/slub.c                  |  36 ++-------
>  7 files changed, 299 insertions(+), 185 deletions(-)
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 0/7] mm: reparent slab memory on cgroup removal
  2019-05-22 21:43 ` [PATCH v5 0/7] mm: reparent slab memory on cgroup removal Roman Gushchin
@ 2019-05-22 21:59   ` Andrew Morton
  2019-05-22 22:23     ` Roman Gushchin
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2019-05-22 21:59 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner,
	Michal Hocko, Rik van Riel, Shakeel Butt, Christoph Lameter,
	Vladimir Davydov, cgroups, Waiman Long

On Wed, 22 May 2019 21:43:54 +0000 Roman Gushchin <guro@fb.com> wrote:

> Is this patchset good to go? Or do you have any remaining concerns?
> 
> It has been carefully reviewed by Shakeel; and also Christoph and Waiman
> gave some attention to it.
> 
> Since commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively")
> has been reverted, the memcg "leak" problem is open again, and I've heard
> from several independent people and companies that it's a real problem
> for them. So it will be nice to close it asap.
> 
> I suspect that the fix is too heavy for stable, unfortunately.
> 
> Please, let me know if you have any issues that preventing you
> from pulling it into the tree.

I looked, and put it on ice for a while, hoping to hear from
mhocko/hannes.  Did they look at the earlier versions?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 0/7] mm: reparent slab memory on cgroup removal
  2019-05-22 21:59   ` Andrew Morton
@ 2019-05-22 22:23     ` Roman Gushchin
  2019-05-28  7:01       ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Gushchin @ 2019-05-22 22:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Kernel Team, Johannes Weiner,
	Michal Hocko, Rik van Riel, Shakeel Butt, Christoph Lameter,
	Vladimir Davydov, cgroups, Waiman Long

On Wed, May 22, 2019 at 02:59:06PM -0700, Andrew Morton wrote:
> On Wed, 22 May 2019 21:43:54 +0000 Roman Gushchin <guro@fb.com> wrote:
> 
> > Is this patchset good to go? Or do you have any remaining concerns?
> > 
> > It has been carefully reviewed by Shakeel; and also Christoph and Waiman
> > gave some attention to it.
> > 
> > Since commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively")
> > has been reverted, the memcg "leak" problem is open again, and I've heard
> > from several independent people and companies that it's a real problem
> > for them. So it will be nice to close it asap.
> > 
> > I suspect that the fix is too heavy for stable, unfortunately.
> > 
> > Please, let me know if you have any issues that preventing you
> > from pulling it into the tree.
> 
> I looked, and put it on ice for a while, hoping to hear from
> mhocko/hannes.  Did they look at the earlier versions?

Johannes has definitely looked at one of early versions of the patchset,
and one of the outcomes was his own patchset about pushing memcg stats
up by the tree, which eliminated the need to deal with memcg stats
on kmem_cache reparenting.

The problem and the proposed solution have been discussed on latest LSFMM,
and I didn't hear any opposition. So I assume that Michal is at least
not against the idea in general. A careful code review is always welcome,
of course.

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 0/7] mm: reparent slab memory on cgroup removal
  2019-05-22 22:23     ` Roman Gushchin
@ 2019-05-28  7:01       ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2019-05-28  7:01 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team,
	Johannes Weiner, Rik van Riel, Shakeel Butt, Christoph Lameter,
	Vladimir Davydov, cgroups, Waiman Long

On Wed 22-05-19 22:23:01, Roman Gushchin wrote:
> On Wed, May 22, 2019 at 02:59:06PM -0700, Andrew Morton wrote:
> > On Wed, 22 May 2019 21:43:54 +0000 Roman Gushchin <guro@fb.com> wrote:
> > 
> > > Is this patchset good to go? Or do you have any remaining concerns?
> > > 
> > > It has been carefully reviewed by Shakeel; and also Christoph and Waiman
> > > gave some attention to it.
> > > 
> > > Since commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively")
> > > has been reverted, the memcg "leak" problem is open again, and I've heard
> > > from several independent people and companies that it's a real problem
> > > for them. So it will be nice to close it asap.
> > > 
> > > I suspect that the fix is too heavy for stable, unfortunately.
> > > 
> > > Please, let me know if you have any issues that preventing you
> > > from pulling it into the tree.
> > 
> > I looked, and put it on ice for a while, hoping to hear from
> > mhocko/hannes.  Did they look at the earlier versions?
> 
> Johannes has definitely looked at one of early versions of the patchset,
> and one of the outcomes was his own patchset about pushing memcg stats
> up by the tree, which eliminated the need to deal with memcg stats
> on kmem_cache reparenting.
> 
> The problem and the proposed solution have been discussed on latest LSFMM,
> and I didn't hear any opposition. So I assume that Michal is at least
> not against the idea in general. A careful code review is always welcome,
> of course.

I didn't get to review this properly (ETOOBUSY). This is a tricky area
so a careful review is definitely due. I would really appreciate if
Vladimir could have a look. I understand he is busy with other stuff but
a highlevel review from him would be really helpful.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
       [not found] ` <20190521200735.2603003-6-guro@fb.com>
@ 2019-05-28 17:08   ` Vladimir Davydov
       [not found]     ` <96b8a923-49e4-f13e-b1e3-3df4598d849e@redhat.com>
  2019-05-28 18:00     ` Vladimir Davydov
  2019-05-28 22:03   ` Johannes Weiner
  1 sibling, 2 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:08 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

Hello Roman,

On Tue, May 21, 2019 at 01:07:33PM -0700, Roman Gushchin wrote:
> This commit makes several important changes in the lifecycle
> of a non-root kmem_cache, which also affect the lifecycle
> of a memory cgroup.
> 
> Currently each charged slab page has a page->mem_cgroup pointer
> to the memory cgroup and holds a reference to it.
> Kmem_caches are held by the memcg and are released with it.
> It means that none of kmem_caches are released unless at least one
> reference to the memcg exists, which is not optimal.
> 
> So the current scheme can be illustrated as:
> page->mem_cgroup->kmem_cache.
> 
> To implement the slab memory reparenting we need to invert the scheme
> into: page->kmem_cache->mem_cgroup.
> 
> Let's make every page to hold a reference to the kmem_cache (we
> already have a stable pointer), and make kmem_caches to hold a single
> reference to the memory cgroup.

Is there any reason why we can't reference both mem cgroup and kmem
cache per each charged kmem page? I mean,

  page->mem_cgroup references mem_cgroup
  page->kmem_cache references kmem_cache
  mem_cgroup references kmem_cache while it's online

TBO it seems to me that not taking a reference to mem cgroup per charged
kmem page makes the code look less straightforward, e.g. as you
mentioned in the commit log, we have to use mod_lruvec_state() for memcg
pages and mod_lruvec_page_state() for root pages.

> 
> To make this possible we need to introduce a new percpu refcounter
> for non-root kmem_caches. The counter is initialized to the percpu
> mode, and is switched to atomic mode after deactivation, so we never
> shutdown an active cache. The counter is bumped for every charged page
> and also for every running allocation. So the kmem_cache can't
> be released unless all allocations complete.
> 
> To shutdown non-active empty kmem_caches, let's reuse the
> infrastructure of the RCU-delayed work queue, used previously for
> the deactivation. After the generalization, it's perfectly suited
> for our needs.
> 
> Since now we can release a kmem_cache at any moment after the
> deactivation, let's call sysfs_slab_remove() only from the shutdown
> path. It makes deactivation path simpler.

But a cache can be dangling for quite a while after cgroup was taken
down, even after this patch, because there still can be pages charged to
it. The reason why we call sysfs_slab_remove() is to delete associated
files from sysfs ASAP. I'd try to preserve the current behavior if
possible.

> 
> Because we don't set the page->mem_cgroup pointer, we need to change
> the way how memcg-level stats is working for slab pages. We can't use
> mod_lruvec_page_state() helpers anymore, so switch over to
> mod_lruvec_state().

> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 4e5b4292a763..8d68de4a2341 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -727,9 +737,31 @@ static void kmemcg_schedule_work_after_rcu(struct rcu_head *head)
>  	queue_work(memcg_kmem_cache_wq, &s->memcg_params.work);
>  }
>  
> +static void kmemcg_cache_shutdown_after_rcu(struct kmem_cache *s)
> +{
> +	WARN_ON(shutdown_cache(s));
> +}
> +
> +static void kmemcg_queue_cache_shutdown(struct percpu_ref *percpu_ref)
> +{
> +	struct kmem_cache *s = container_of(percpu_ref, struct kmem_cache,
> +					    memcg_params.refcnt);
> +
> +	spin_lock(&memcg_kmem_wq_lock);

This code may be called from irq context AFAIU so you should use
irq-safe primitive.

> +	if (s->memcg_params.root_cache->memcg_params.dying)
> +		goto unlock;
> +
> +	WARN_ON(s->memcg_params.work_fn);
> +	s->memcg_params.work_fn = kmemcg_cache_shutdown_after_rcu;
> +	call_rcu(&s->memcg_params.rcu_head, kmemcg_schedule_work_after_rcu);

I may be totally wrong here, but I have a suspicion we don't really need
rcu here.

As I see it, you add this code so as to prevent memcg_kmem_get_cache
from dereferencing a destroyed kmem cache. Can't we continue using
css_tryget_online for that? I mean, take rcu_read_lock() and try to get
css reference. If you succeed, then the cgroup must be online, and
css_offline won't be called until you unlock rcu, right? This means that
the cache is guaranteed to be alive until then, because the cgroup holds
a reference to all its kmem caches until it's taken offline.

> +unlock:
> +	spin_unlock(&memcg_kmem_wq_lock);
> +}
> +
>  static void kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s)
>  {
>  	__kmemcg_cache_deactivate_after_rcu(s);
> +	percpu_ref_kill(&s->memcg_params.refcnt);
>  }
>  
>  static void kmemcg_cache_deactivate(struct kmem_cache *s)
> @@ -854,8 +861,15 @@ static int shutdown_memcg_caches(struct kmem_cache *s)
>  
>  static void flush_memcg_workqueue(struct kmem_cache *s)
>  {
> +	/*
> +	 * memcg_params.dying is synchronized using slab_mutex AND
> +	 * memcg_kmem_wq_lock spinlock, because it's not always
> +	 * possible to grab slab_mutex.
> +	 */
>  	mutex_lock(&slab_mutex);
> +	spin_lock(&memcg_kmem_wq_lock);
>  	s->memcg_params.dying = true;
> +	spin_unlock(&memcg_kmem_wq_lock);

I would completely switch from the mutex to the new spin lock -
acquiring them both looks weird.

>  	mutex_unlock(&slab_mutex);
>  
>  	/*

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 2/7] mm: generalize postponed non-root kmem_cache deactivation
       [not found] ` <20190521200735.2603003-3-guro@fb.com>
@ 2019-05-28 17:11   ` Vladimir Davydov
  0 siblings, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:11 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:30PM -0700, Roman Gushchin wrote:
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6e00bdf8618d..4e5b4292a763 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -866,11 +859,12 @@ static void flush_memcg_workqueue(struct kmem_cache *s)
>  	mutex_unlock(&slab_mutex);
>  
>  	/*
> -	 * SLUB deactivates the kmem_caches through call_rcu. Make
> +	 * SLAB and SLUB deactivate the kmem_caches through call_rcu. Make
>  	 * sure all registered rcu callbacks have been invoked.
>  	 */
> -	if (IS_ENABLED(CONFIG_SLUB))
> -		rcu_barrier();
> +#ifndef CONFIG_SLOB
> +	rcu_barrier();
> +#endif

Nit: you don't need to check CONFIG_SLOB here as this code is under
CONFIG_MEMCG_KMEM which depends on !SLOB.

Other than that, the patch looks good to me, provided using rcu for slab
deactivation proves to be really necessary, although I'd split it in two
patches - one doing renaming another refactoring - easier to review that
way.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 4/7] mm: unify SLAB and SLUB page accounting
       [not found] ` <20190521200735.2603003-5-guro@fb.com>
@ 2019-05-28 17:12   ` Vladimir Davydov
  2019-05-28 22:00   ` Johannes Weiner
  1 sibling, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:12 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:32PM -0700, Roman Gushchin wrote:
> Currently the page accounting code is duplicated in SLAB and SLUB
> internals. Let's move it into new (un)charge_slab_page helpers
> in the slab_common.c file. These helpers will be responsible
> for statistics (global and memcg-aware) and memcg charging.
> So they are replacing direct memcg_(un)charge_slab() calls.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Christoph Lameter <cl@linux.com>

Makes sense even without the rest of the series,

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache()
       [not found] ` <20190521200735.2603003-2-guro@fb.com>
@ 2019-05-28 17:14   ` Vladimir Davydov
  2019-05-28 21:56   ` Johannes Weiner
  1 sibling, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:14 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:29PM -0700, Roman Gushchin wrote:
> Initialize kmem_cache->memcg_params.memcg pointer in
> memcg_link_cache() rather than in init_memcg_params().
> 
> Once kmem_cache will hold a reference to the memory cgroup,
> it will simplify the refcounting.
> 
> For non-root kmem_caches memcg_link_cache() is always called
> before the kmem_cache becomes visible to a user, so it's safe.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

This one is a no-brainer.

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

Provided it's necessary for the rest of the series.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 7/7] mm: fix /proc/kpagecgroup interface for slab pages
       [not found] ` <20190521200735.2603003-8-guro@fb.com>
@ 2019-05-28 17:38   ` Vladimir Davydov
  0 siblings, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:38 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:35PM -0700, Roman Gushchin wrote:
> Switching to an indirect scheme of getting mem_cgroup pointer for
> !root slab pages broke /proc/kpagecgroup interface for them.
> 
> Let's fix it by learning page_cgroup_ino() how to get memcg
> pointer for slab pages.
> 
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> ---
>  mm/memcontrol.c  |  5 ++++-
>  mm/slab.h        | 25 +++++++++++++++++++++++++
>  mm/slab_common.c |  1 +
>  3 files changed, 30 insertions(+), 1 deletion(-)

What about mem_cgroup_from_kmem, see mm/list_lru.c?
Shouldn't we fix it, too?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
       [not found]     ` <96b8a923-49e4-f13e-b1e3-3df4598d849e@redhat.com>
@ 2019-05-28 17:39       ` Vladimir Davydov
  2019-05-28 17:41         ` Waiman Long
  0 siblings, 1 reply; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 17:39 UTC (permalink / raw)
  To: Waiman Long
  Cc: Roman Gushchin, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Johannes Weiner, Michal Hocko, Rik van Riel,
	Shakeel Butt, Christoph Lameter, cgroups

On Tue, May 28, 2019 at 01:37:50PM -0400, Waiman Long wrote:
> On 5/28/19 1:08 PM, Vladimir Davydov wrote:
> >>  static void flush_memcg_workqueue(struct kmem_cache *s)
> >>  {
> >> +	/*
> >> +	 * memcg_params.dying is synchronized using slab_mutex AND
> >> +	 * memcg_kmem_wq_lock spinlock, because it's not always
> >> +	 * possible to grab slab_mutex.
> >> +	 */
> >>  	mutex_lock(&slab_mutex);
> >> +	spin_lock(&memcg_kmem_wq_lock);
> >>  	s->memcg_params.dying = true;
> >> +	spin_unlock(&memcg_kmem_wq_lock);
> > I would completely switch from the mutex to the new spin lock -
> > acquiring them both looks weird.
> >
> >>  	mutex_unlock(&slab_mutex);
> >>  
> >>  	/*
> 
> There are places where the slab_mutex is held and sleeping functions
> like kvzalloc() are called. I understand that taking both mutex and
> spinlocks look ugly, but converting all the slab_mutex critical sections
> to spinlock critical sections will be a major undertaking by itself. So
> I would suggest leaving that for now.

I didn't mean that. I meant taking spin_lock wherever we need to access
the 'dying' flag, even if slab_mutex is held. So that we don't need to
take mutex_lock in flush_memcg_workqueue, where it's used solely for
'dying' synchronization.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
  2019-05-28 17:39       ` Vladimir Davydov
@ 2019-05-28 17:41         ` Waiman Long
  0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2019-05-28 17:41 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Roman Gushchin, Andrew Morton, linux-mm, linux-kernel,
	kernel-team, Johannes Weiner, Michal Hocko, Rik van Riel,
	Shakeel Butt, Christoph Lameter, cgroups

On 5/28/19 1:39 PM, Vladimir Davydov wrote:
> On Tue, May 28, 2019 at 01:37:50PM -0400, Waiman Long wrote:
>> On 5/28/19 1:08 PM, Vladimir Davydov wrote:
>>>>  static void flush_memcg_workqueue(struct kmem_cache *s)
>>>>  {
>>>> +	/*
>>>> +	 * memcg_params.dying is synchronized using slab_mutex AND
>>>> +	 * memcg_kmem_wq_lock spinlock, because it's not always
>>>> +	 * possible to grab slab_mutex.
>>>> +	 */
>>>>  	mutex_lock(&slab_mutex);
>>>> +	spin_lock(&memcg_kmem_wq_lock);
>>>>  	s->memcg_params.dying = true;
>>>> +	spin_unlock(&memcg_kmem_wq_lock);
>>> I would completely switch from the mutex to the new spin lock -
>>> acquiring them both looks weird.
>>>
>>>>  	mutex_unlock(&slab_mutex);
>>>>  
>>>>  	/*
>> There are places where the slab_mutex is held and sleeping functions
>> like kvzalloc() are called. I understand that taking both mutex and
>> spinlocks look ugly, but converting all the slab_mutex critical sections
>> to spinlock critical sections will be a major undertaking by itself. So
>> I would suggest leaving that for now.
> I didn't mean that. I meant taking spin_lock wherever we need to access
> the 'dying' flag, even if slab_mutex is held. So that we don't need to
> take mutex_lock in flush_memcg_workqueue, where it's used solely for
> 'dying' synchronization.

OK, that makes sense. Thanks for the clarification.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
  2019-05-28 17:08   ` [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management Vladimir Davydov
       [not found]     ` <96b8a923-49e4-f13e-b1e3-3df4598d849e@redhat.com>
@ 2019-05-28 18:00     ` Vladimir Davydov
  1 sibling, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 18:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 28, 2019 at 08:08:28PM +0300, Vladimir Davydov wrote:
> Hello Roman,
> 
> On Tue, May 21, 2019 at 01:07:33PM -0700, Roman Gushchin wrote:
> > This commit makes several important changes in the lifecycle
> > of a non-root kmem_cache, which also affect the lifecycle
> > of a memory cgroup.
> > 
> > Currently each charged slab page has a page->mem_cgroup pointer
> > to the memory cgroup and holds a reference to it.
> > Kmem_caches are held by the memcg and are released with it.
> > It means that none of kmem_caches are released unless at least one
> > reference to the memcg exists, which is not optimal.
> > 
> > So the current scheme can be illustrated as:
> > page->mem_cgroup->kmem_cache.
> > 
> > To implement the slab memory reparenting we need to invert the scheme
> > into: page->kmem_cache->mem_cgroup.
> > 
> > Let's make every page to hold a reference to the kmem_cache (we
> > already have a stable pointer), and make kmem_caches to hold a single
> > reference to the memory cgroup.
> 
> Is there any reason why we can't reference both mem cgroup and kmem
> cache per each charged kmem page? I mean,
> 
>   page->mem_cgroup references mem_cgroup
>   page->kmem_cache references kmem_cache
>   mem_cgroup references kmem_cache while it's online
> 
> TBO it seems to me that not taking a reference to mem cgroup per charged
> kmem page makes the code look less straightforward, e.g. as you
> mentioned in the commit log, we have to use mod_lruvec_state() for memcg
> pages and mod_lruvec_page_state() for root pages.

I think I completely missed the point here. In the following patch you
move kmem caches from a child to the parent cgroup on offline (aka
reparent them). That's why you can't maintain page->mem_cgroup. Sorry
for misunderstanding.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/7] mm: reparent slab memory on cgroup removal
       [not found] ` <20190521200735.2603003-7-guro@fb.com>
@ 2019-05-28 18:33   ` Vladimir Davydov
  2019-05-28 19:58     ` Roman Gushchin
  0 siblings, 1 reply; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 18:33 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:34PM -0700, Roman Gushchin wrote:
> Let's reparent memcg slab memory on memcg offlining. This allows us
> to release the memory cgroup without waiting for the last outstanding
> kernel object (e.g. dentry used by another application).
> 
> So instead of reparenting all accounted slab pages, let's do reparent
> a relatively small amount of kmem_caches. Reparenting is performed as
> a part of the deactivation process.
> 
> Since the parent cgroup is already charged, everything we need to do
> is to splice the list of kmem_caches to the parent's kmem_caches list,
> swap the memcg pointer and drop the css refcounter for each kmem_cache
> and adjust the parent's css refcounter. Quite simple.
> 
> Please, note that kmem_cache->memcg_params.memcg isn't a stable
> pointer anymore. It's safe to read it under rcu_read_lock() or
> with slab_mutex held.
> 
> We can race with the slab allocation and deallocation paths. It's not
> a big problem: parent's charge and slab global stats are always
> correct, and we don't care anymore about the child usage and global
> stats. The child cgroup is already offline, so we don't use or show it
> anywhere.
> 
> Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE)
> aren't used anywhere except count_shadow_nodes(). But even there it
> won't break anything: after reparenting "nodes" will be 0 on child
> level (because we're already reparenting shrinker lists), and on
> parent level page stats always were 0, and this patch won't change
> anything.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

This one looks good to me. I can't see why anything could possibly go
wrong after this change.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/7] mm: reparent slab memory on cgroup removal
  2019-05-28 18:33   ` [PATCH v5 6/7] mm: reparent slab memory on cgroup removal Vladimir Davydov
@ 2019-05-28 19:58     ` Roman Gushchin
  2019-05-28 20:11       ` Vladimir Davydov
  2019-05-28 22:16       ` Johannes Weiner
  0 siblings, 2 replies; 21+ messages in thread
From: Roman Gushchin @ 2019-05-28 19:58 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 28, 2019 at 09:33:02PM +0300, Vladimir Davydov wrote:
> On Tue, May 21, 2019 at 01:07:34PM -0700, Roman Gushchin wrote:
> > Let's reparent memcg slab memory on memcg offlining. This allows us
> > to release the memory cgroup without waiting for the last outstanding
> > kernel object (e.g. dentry used by another application).
> > 
> > So instead of reparenting all accounted slab pages, let's do reparent
> > a relatively small amount of kmem_caches. Reparenting is performed as
> > a part of the deactivation process.
> > 
> > Since the parent cgroup is already charged, everything we need to do
> > is to splice the list of kmem_caches to the parent's kmem_caches list,
> > swap the memcg pointer and drop the css refcounter for each kmem_cache
> > and adjust the parent's css refcounter. Quite simple.
> > 
> > Please, note that kmem_cache->memcg_params.memcg isn't a stable
> > pointer anymore. It's safe to read it under rcu_read_lock() or
> > with slab_mutex held.
> > 
> > We can race with the slab allocation and deallocation paths. It's not
> > a big problem: parent's charge and slab global stats are always
> > correct, and we don't care anymore about the child usage and global
> > stats. The child cgroup is already offline, so we don't use or show it
> > anywhere.
> > 
> > Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE)
> > aren't used anywhere except count_shadow_nodes(). But even there it
> > won't break anything: after reparenting "nodes" will be 0 on child
> > level (because we're already reparenting shrinker lists), and on
> > parent level page stats always were 0, and this patch won't change
> > anything.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> 
> This one looks good to me. I can't see why anything could possibly go
> wrong after this change.

Hi Vladimir!

Thank you for looking into the series. Really appreciate it!

It looks like outstanding questions are:
1) synchronization around the dying flag
2) removing CONFIG_SLOB in 2/7
3) early sysfs_slab_remove()
4) mem_cgroup_from_kmem in 7/7

Please, let me know if I missed anything.

I'm waiting now for Johanness's review, so I'll address these issues
in background and post the next (and hopefully) final version.

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/7] mm: reparent slab memory on cgroup removal
  2019-05-28 19:58     ` Roman Gushchin
@ 2019-05-28 20:11       ` Vladimir Davydov
  2019-05-28 21:52         ` Roman Gushchin
  2019-05-28 22:16       ` Johannes Weiner
  1 sibling, 1 reply; 21+ messages in thread
From: Vladimir Davydov @ 2019-05-28 20:11 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 28, 2019 at 07:58:17PM +0000, Roman Gushchin wrote:
> It looks like outstanding questions are:
> 1) synchronization around the dying flag
> 2) removing CONFIG_SLOB in 2/7
> 3) early sysfs_slab_remove()
> 4) mem_cgroup_from_kmem in 7/7
> 
> Please, let me know if I missed anything.

Also, I think that it might be possible to get rid of RCU call in kmem
cache destructor, because the cgroup subsystem already handles it and
we could probably piggyback - see my comment to 5/7. Not sure if it's
really necessary, since we already have RCU in SLUB, but worth looking
into, I guess, as it might simplify the code a bit.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/7] mm: reparent slab memory on cgroup removal
  2019-05-28 20:11       ` Vladimir Davydov
@ 2019-05-28 21:52         ` Roman Gushchin
  0 siblings, 0 replies; 21+ messages in thread
From: Roman Gushchin @ 2019-05-28 21:52 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team,
	Johannes Weiner, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 28, 2019 at 11:11:02PM +0300, Vladimir Davydov wrote:
> On Tue, May 28, 2019 at 07:58:17PM +0000, Roman Gushchin wrote:
> > It looks like outstanding questions are:
> > 1) synchronization around the dying flag
> > 2) removing CONFIG_SLOB in 2/7
> > 3) early sysfs_slab_remove()
> > 4) mem_cgroup_from_kmem in 7/7
> > 
> > Please, let me know if I missed anything.
> 
> Also, I think that it might be possible to get rid of RCU call in kmem
> cache destructor, because the cgroup subsystem already handles it and
> we could probably piggyback - see my comment to 5/7. Not sure if it's
> really necessary, since we already have RCU in SLUB, but worth looking
> into, I guess, as it might simplify the code a bit.

Added to the list. Thank you!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache()
       [not found] ` <20190521200735.2603003-2-guro@fb.com>
  2019-05-28 17:14   ` [PATCH v5 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache() Vladimir Davydov
@ 2019-05-28 21:56   ` Johannes Weiner
  1 sibling, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2019-05-28 21:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team, Michal Hocko,
	Rik van Riel, Shakeel Butt, Christoph Lameter, Vladimir Davydov,
	cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:29PM -0700, Roman Gushchin wrote:
> Initialize kmem_cache->memcg_params.memcg pointer in
> memcg_link_cache() rather than in init_memcg_params().
> 
> Once kmem_cache will hold a reference to the memory cgroup,
> it will simplify the refcounting.
> 
> For non-root kmem_caches memcg_link_cache() is always called
> before the kmem_cache becomes visible to a user, so it's safe.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 4/7] mm: unify SLAB and SLUB page accounting
       [not found] ` <20190521200735.2603003-5-guro@fb.com>
  2019-05-28 17:12   ` [PATCH v5 4/7] mm: unify SLAB and SLUB page accounting Vladimir Davydov
@ 2019-05-28 22:00   ` Johannes Weiner
  1 sibling, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2019-05-28 22:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team, Michal Hocko,
	Rik van Riel, Shakeel Butt, Christoph Lameter, Vladimir Davydov,
	cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:32PM -0700, Roman Gushchin wrote:
> Currently the page accounting code is duplicated in SLAB and SLUB
> internals. Let's move it into new (un)charge_slab_page helpers
> in the slab_common.c file. These helpers will be responsible
> for statistics (global and memcg-aware) and memcg charging.
> So they are replacing direct memcg_(un)charge_slab() calls.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Christoph Lameter <cl@linux.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
       [not found] ` <20190521200735.2603003-6-guro@fb.com>
  2019-05-28 17:08   ` [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management Vladimir Davydov
@ 2019-05-28 22:03   ` Johannes Weiner
  2019-05-28 22:28     ` Roman Gushchin
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Weiner @ 2019-05-28 22:03 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, linux-kernel, kernel-team, Michal Hocko,
	Rik van Riel, Shakeel Butt, Christoph Lameter, Vladimir Davydov,
	cgroups, Waiman Long

On Tue, May 21, 2019 at 01:07:33PM -0700, Roman Gushchin wrote:
> +	arr = rcu_dereference(cachep->memcg_params.memcg_caches);
> +
> +	/*
> +	 * Make sure we will access the up-to-date value. The code updating
> +	 * memcg_caches issues a write barrier to match this (see
> +	 * memcg_create_kmem_cache()).
> +	 */
> +	memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);

READ_ONCE() isn't an SMP barrier, it just prevents compiler
muckery. This needs an explicit smp_rmb() to pair with the smp_wmb()
on the other side.

I realize you're only moving this code, but it would be good to fix
that up while you're there.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/7] mm: reparent slab memory on cgroup removal
  2019-05-28 19:58     ` Roman Gushchin
  2019-05-28 20:11       ` Vladimir Davydov
@ 2019-05-28 22:16       ` Johannes Weiner
  1 sibling, 0 replies; 21+ messages in thread
From: Johannes Weiner @ 2019-05-28 22:16 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Vladimir Davydov, Andrew Morton, linux-mm, linux-kernel,
	Kernel Team, Michal Hocko, Rik van Riel, Shakeel Butt,
	Christoph Lameter, cgroups, Waiman Long

On Tue, May 28, 2019 at 07:58:17PM +0000, Roman Gushchin wrote:
> On Tue, May 28, 2019 at 09:33:02PM +0300, Vladimir Davydov wrote:
> > On Tue, May 21, 2019 at 01:07:34PM -0700, Roman Gushchin wrote:
> > > Let's reparent memcg slab memory on memcg offlining. This allows us
> > > to release the memory cgroup without waiting for the last outstanding
> > > kernel object (e.g. dentry used by another application).
> > > 
> > > So instead of reparenting all accounted slab pages, let's do reparent
> > > a relatively small amount of kmem_caches. Reparenting is performed as
> > > a part of the deactivation process.
> > > 
> > > Since the parent cgroup is already charged, everything we need to do
> > > is to splice the list of kmem_caches to the parent's kmem_caches list,
> > > swap the memcg pointer and drop the css refcounter for each kmem_cache
> > > and adjust the parent's css refcounter. Quite simple.
> > > 
> > > Please, note that kmem_cache->memcg_params.memcg isn't a stable
> > > pointer anymore. It's safe to read it under rcu_read_lock() or
> > > with slab_mutex held.
> > > 
> > > We can race with the slab allocation and deallocation paths. It's not
> > > a big problem: parent's charge and slab global stats are always
> > > correct, and we don't care anymore about the child usage and global
> > > stats. The child cgroup is already offline, so we don't use or show it
> > > anywhere.
> > > 
> > > Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE)
> > > aren't used anywhere except count_shadow_nodes(). But even there it
> > > won't break anything: after reparenting "nodes" will be 0 on child
> > > level (because we're already reparenting shrinker lists), and on
> > > parent level page stats always were 0, and this patch won't change
> > > anything.
> > > 
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > 
> > This one looks good to me. I can't see why anything could possibly go
> > wrong after this change.
> 
> Hi Vladimir!
> 
> Thank you for looking into the series. Really appreciate it!
> 
> It looks like outstanding questions are:
> 1) synchronization around the dying flag
> 2) removing CONFIG_SLOB in 2/7
> 3) early sysfs_slab_remove()
> 4) mem_cgroup_from_kmem in 7/7
> 
> Please, let me know if I missed anything.
> 
> I'm waiting now for Johanness's review, so I'll address these issues
> in background and post the next (and hopefully) final version.

The todo items here aside, the series looks good to me - although I'm
glad that Vladimir gave it a much more informed review than I could.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management
  2019-05-28 22:03   ` Johannes Weiner
@ 2019-05-28 22:28     ` Roman Gushchin
  0 siblings, 0 replies; 21+ messages in thread
From: Roman Gushchin @ 2019-05-28 22:28 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Kernel Team, Michal Hocko,
	Rik van Riel, Shakeel Butt, Christoph Lameter, Vladimir Davydov,
	cgroups, Waiman Long

On Tue, May 28, 2019 at 06:03:53PM -0400, Johannes Weiner wrote:
> On Tue, May 21, 2019 at 01:07:33PM -0700, Roman Gushchin wrote:
> > +	arr = rcu_dereference(cachep->memcg_params.memcg_caches);
> > +
> > +	/*
> > +	 * Make sure we will access the up-to-date value. The code updating
> > +	 * memcg_caches issues a write barrier to match this (see
> > +	 * memcg_create_kmem_cache()).
> > +	 */
> > +	memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);
> 
> READ_ONCE() isn't an SMP barrier, it just prevents compiler
> muckery. This needs an explicit smp_rmb() to pair with the smp_wmb()
> on the other side.

I believe rcu_dereference()/rcu_assign_pointer()/... are better replacements.

> 
> I realize you're only moving this code, but it would be good to fix
> that up while you're there.

Right. I'll try to fix it with new-ish rcu API in a separate patch
preceding this one.

Thank you for looking into the series!

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-05-28 22:29 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190521200735.2603003-1-guro@fb.com>
2019-05-22 21:43 ` [PATCH v5 0/7] mm: reparent slab memory on cgroup removal Roman Gushchin
2019-05-22 21:59   ` Andrew Morton
2019-05-22 22:23     ` Roman Gushchin
2019-05-28  7:01       ` Michal Hocko
     [not found] ` <20190521200735.2603003-6-guro@fb.com>
2019-05-28 17:08   ` [PATCH v5 5/7] mm: rework non-root kmem_cache lifecycle management Vladimir Davydov
     [not found]     ` <96b8a923-49e4-f13e-b1e3-3df4598d849e@redhat.com>
2019-05-28 17:39       ` Vladimir Davydov
2019-05-28 17:41         ` Waiman Long
2019-05-28 18:00     ` Vladimir Davydov
2019-05-28 22:03   ` Johannes Weiner
2019-05-28 22:28     ` Roman Gushchin
     [not found] ` <20190521200735.2603003-3-guro@fb.com>
2019-05-28 17:11   ` [PATCH v5 2/7] mm: generalize postponed non-root kmem_cache deactivation Vladimir Davydov
     [not found] ` <20190521200735.2603003-5-guro@fb.com>
2019-05-28 17:12   ` [PATCH v5 4/7] mm: unify SLAB and SLUB page accounting Vladimir Davydov
2019-05-28 22:00   ` Johannes Weiner
     [not found] ` <20190521200735.2603003-2-guro@fb.com>
2019-05-28 17:14   ` [PATCH v5 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache() Vladimir Davydov
2019-05-28 21:56   ` Johannes Weiner
     [not found] ` <20190521200735.2603003-8-guro@fb.com>
2019-05-28 17:38   ` [PATCH v5 7/7] mm: fix /proc/kpagecgroup interface for slab pages Vladimir Davydov
     [not found] ` <20190521200735.2603003-7-guro@fb.com>
2019-05-28 18:33   ` [PATCH v5 6/7] mm: reparent slab memory on cgroup removal Vladimir Davydov
2019-05-28 19:58     ` Roman Gushchin
2019-05-28 20:11       ` Vladimir Davydov
2019-05-28 21:52         ` Roman Gushchin
2019-05-28 22:16       ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).