From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 06/19] mm: memcg/slab: obj_cgroup API
Date: Thu, 7 May 2020 15:26:31 -0700	[thread overview]
Message-ID: <20200507222631.GA81857@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20200507210314.GD161043@cmpxchg.org>

On Thu, May 07, 2020 at 05:03:14PM -0400, Johannes Weiner wrote:
> On Wed, Apr 22, 2020 at 01:46:55PM -0700, Roman Gushchin wrote:
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -257,6 +257,78 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
> >  }
> >  
> >  #ifdef CONFIG_MEMCG_KMEM
> > +extern spinlock_t css_set_lock;
> > +
> > +static void obj_cgroup_release(struct percpu_ref *ref)
> > +{
> > +	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
> > +	struct mem_cgroup *memcg;
> > +	unsigned int nr_bytes;
> > +	unsigned int nr_pages;
> > +	unsigned long flags;
> > +
> > +	nr_bytes = atomic_read(&objcg->nr_charged_bytes);
> > +	WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
> > +	nr_pages = nr_bytes >> PAGE_SHIFT;
> 
> What guarantees that we don't have a partial page in there at this
> point? I guess any outstanding allocations would pin the objcg, so
> when it's released all objects have been freed.

Right, this is exactly the reason why there can't be a partial page
at this point.

> 
> But if that's true, how can we have full pages remaining in there now?

Imagine the following sequence:
1) CPU0: objcg == stock->cached_objcg
2) CPU1: we do a small allocation (e.g. 92 bytes), the page is charged
3) CPU1: a process from another memcg allocates something, the stock is flushed,
   objcg->nr_charged_bytes = PAGE_SIZE - 92
4) CPU0: we release this object, 92 bytes are added to stock->nr_bytes
5) CPU0: the stock is flushed, 92 bytes are added to objcg->nr_charged_bytes

As a result, nr_charged_bytes == PAGE_SIZE. This page will be uncharged
in obj_cgroup_release().

I've double-checked this; it's actually pretty easy to trigger in real life.

> 
> > @@ -2723,6 +2820,30 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
> >  	return page->mem_cgroup;
> >  }
> >  
> > +__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
> > +{
> > +	struct obj_cgroup *objcg = NULL;
> > +	struct mem_cgroup *memcg;
> > +
> > +	if (unlikely(!current->mm))
> > +		return NULL;
> > +
> > +	rcu_read_lock();
> > +	if (unlikely(current->active_memcg))
> > +		memcg = rcu_dereference(current->active_memcg);
> > +	else
> > +		memcg = mem_cgroup_from_task(current);
> > +
> > +	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
> > +		objcg = rcu_dereference(memcg->objcg);
> > +		if (objcg && obj_cgroup_tryget(objcg))
> > +			break;
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	return objcg;
> > +}
> 
> Thanks for moving this here from one of the later patches, it helps
> understanding the life cycle of obj_cgroup better.
> 
> > +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
> > +{
> > +	struct mem_cgroup *memcg;
> > +	unsigned int nr_pages, nr_bytes;
> > +	int ret;
> > +
> > +	if (consume_obj_stock(objcg, size))
> > +		return 0;
> > +
> > +	rcu_read_lock();
> > +	memcg = obj_cgroup_memcg(objcg);
> > +	css_get(&memcg->css);
> > +	rcu_read_unlock();
> > +
> > +	nr_pages = size >> PAGE_SHIFT;
> > +	nr_bytes = size & (PAGE_SIZE - 1);
> > +
> > +	if (nr_bytes)
> > +		nr_pages += 1;
> > +
> > +	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> 
> If consume_obj_stock() fails because some other memcg is cached,
> should this try to consume the partial page in objcg->nr_charged_bytes
> before getting more pages?

We can definitely do it, but I'm not sure it's good for performance.

Dealing with nr_charged_bytes will require up to two atomic writes,
so calling __memcg_kmem_charge() can be faster if the memcg is cached
on the percpu stock.

> 
> > +	if (!ret && nr_bytes)
> > +		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
> 
> This will put the cgroup into the cache if the allocation resulted in
> a partially free page.
> 
> But if this was a page allocation, we may have objcg->nr_cache_bytes
> from a previous subpage allocation that we should probably put back
> into the stock.

Yeah, we can do this, but I don't know if there will be any benefit.

Actually, we don't want to touch objcg->nr_cache_bytes too often, as
it can become a contention point if there are many threads allocating
in the memory cgroup.

So maybe we want to do the opposite: relax it a bit and stop flushing
it on every stock refill, flushing only when it exceeds a certain value.

> 
> It's not a common case, I'm just trying to understand what
> objcg->nr_cache_bytes holds and when it does so.

So it's actually the centralized leftover from rounding the actual
charge up to the page size.

> 
> The rest of this patch looks good to me!

Great!

Thank you very much for the review!

