All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 06/19] mm: memcg/slab: obj_cgroup API
Date: Thu, 7 May 2020 15:26:31 -0700	[thread overview]
Message-ID: <20200507222631.GA81857@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <20200507210314.GD161043@cmpxchg.org>

On Thu, May 07, 2020 at 05:03:14PM -0400, Johannes Weiner wrote:
> On Wed, Apr 22, 2020 at 01:46:55PM -0700, Roman Gushchin wrote:
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -257,6 +257,78 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
> >  }
> >  
> >  #ifdef CONFIG_MEMCG_KMEM
> > +extern spinlock_t css_set_lock;
> > +
> > +static void obj_cgroup_release(struct percpu_ref *ref)
> > +{
> > +	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
> > +	struct mem_cgroup *memcg;
> > +	unsigned int nr_bytes;
> > +	unsigned int nr_pages;
> > +	unsigned long flags;
> > +
> > +	nr_bytes = atomic_read(&objcg->nr_charged_bytes);
> > +	WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
> > +	nr_pages = nr_bytes >> PAGE_SHIFT;
> 
> What guarantees that we don't have a partial page in there at this
> point? I guess any outstanding allocations would pin the objcg, so
> when it's released all objects have been freed.

Right, this is exactly the reason why there can't be a partial page
at this point.

> 
> But if that's true, how can we have full pages remaining in there now?

Imagine the following sequence:
1) CPU0: objcg == stock->cached_objcg
2) CPU1: we do a small allocation (e.g. 92 bytes), page is charged
3) CPU1: a process from another memcg is allocating something, stock if flushed,
   objcg->nr_charged_bytes = PAGE_SIZE - 92
5) CPU0: we do release this object, 92 bytes are added to stock->nr_bytes
6) CPU0: stock is flushed, 92 bytes are added to objcg->nr_charged_bytes

In the result, nr_charged_bytes == PAGE_SIZE. This PAGE will be uncharged
in obj_cgroup_release().

I've double checked this, it's actually pretty easy to trigger in the real life.

> 
> > @@ -2723,6 +2820,30 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
> >  	return page->mem_cgroup;
> >  }
> >  
> > +__always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
> > +{
> > +	struct obj_cgroup *objcg = NULL;
> > +	struct mem_cgroup *memcg;
> > +
> > +	if (unlikely(!current->mm))
> > +		return NULL;
> > +
> > +	rcu_read_lock();
> > +	if (unlikely(current->active_memcg))
> > +		memcg = rcu_dereference(current->active_memcg);
> > +	else
> > +		memcg = mem_cgroup_from_task(current);
> > +
> > +	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg)) {
> > +		objcg = rcu_dereference(memcg->objcg);
> > +		if (objcg && obj_cgroup_tryget(objcg))
> > +			break;
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	return objcg;
> > +}
> 
> Thanks for moving this here from one of the later patches, it helps
> understanding the life cycle of obj_cgroup better.
> 
> > +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
> > +{
> > +	struct mem_cgroup *memcg;
> > +	unsigned int nr_pages, nr_bytes;
> > +	int ret;
> > +
> > +	if (consume_obj_stock(objcg, size))
> > +		return 0;
> > +
> > +	rcu_read_lock();
> > +	memcg = obj_cgroup_memcg(objcg);
> > +	css_get(&memcg->css);
> > +	rcu_read_unlock();
> > +
> > +	nr_pages = size >> PAGE_SHIFT;
> > +	nr_bytes = size & (PAGE_SIZE - 1);
> > +
> > +	if (nr_bytes)
> > +		nr_pages += 1;
> > +
> > +	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> 
> If consume_obj_stock() fails because some other memcg is cached,
> should this try to consume the partial page in objcg->nr_charged_bytes
> before getting more pages?

We can definitely do it, but I'm not sure if it's good for the performance.

Dealing with nr_charged_bytes will require up to two atomic writes,
so calling __memcg_kmem_charge() can be faster if memcg is cached
on percpu stock.

> 
> > +	if (!ret && nr_bytes)
> > +		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
> 
> This will put the cgroup into the cache if the allocation resulted in
> a partially free page.
> 
> But if this was a page allocation, we may have objcg->nr_cache_bytes
> from a previous subpage allocation that we should probably put back
> into the stock.

Yeah, we can do this, but I don't know if there will be any benefits.

Actually we don't wanna to touch objcg->nr_cache_bytes too often, as
it can become a contention point if there are many threads allocating
in the memory cgroup.

So maybe we want to do the opposite: relax it a bit and stop flushing
it on every stock refill and flush only if it exceeds a certain value.

> 
> It's not a common case, I'm just trying to understand what
> objcg->nr_cache_bytes holds and when it does so.

So it's actually a centralized leftover from the rounding of the actual
charge to the page size.

> 
> The rest of this patch looks good to me!

Great!

Thank you very much for the review!

  reply	other threads:[~2020-05-07 22:26 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-22 20:46 [PATCH v3 00/19] The new cgroup slab memory controller Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 01/19] mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state() Roman Gushchin
2020-05-07 20:33   ` Johannes Weiner
2020-05-20 10:49   ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 02/19] mm: memcg: prepare for byte-sized vmstat items Roman Gushchin
2020-05-07 20:34   ` Johannes Weiner
2020-05-20 11:31   ` Vlastimil Babka
2020-05-20 11:36     ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 03/19] mm: memcg: convert vmstat slab counters to bytes Roman Gushchin
2020-05-07 20:41   ` Johannes Weiner
2020-05-20 12:25   ` Vlastimil Babka
2020-05-20 19:26     ` Roman Gushchin
2020-05-21  9:57       ` Vlastimil Babka
2020-05-21 21:14         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 04/19] mm: slub: implement SLUB version of obj_to_index() Roman Gushchin
2020-04-22 23:52   ` Christopher Lameter
2020-04-22 23:52     ` Christopher Lameter
2020-04-23  0:05     ` Roman Gushchin
2020-04-25  2:10       ` Christopher Lameter
2020-04-25  2:10         ` Christopher Lameter
2020-04-25  2:46         ` Roman Gushchin
2020-04-27 16:21           ` Christopher Lameter
2020-04-27 16:21             ` Christopher Lameter
2020-04-27 16:46             ` Roman Gushchin
2020-04-28 17:06               ` Roman Gushchin
2020-04-28 17:45               ` Johannes Weiner
2020-04-30 16:29               ` Christopher Lameter
2020-04-30 16:29                 ` Christopher Lameter
2020-04-30 17:15                 ` Roman Gushchin
2020-05-02 23:54                   ` Christopher Lameter
2020-05-02 23:54                     ` Christopher Lameter
2020-05-04 18:29                     ` Roman Gushchin
2020-05-08 21:35                       ` Christopher Lameter
2020-05-08 21:35                         ` Christopher Lameter
2020-05-13  0:57                         ` Roman Gushchin
2020-05-15 21:45                           ` Christopher Lameter
2020-05-15 21:45                             ` Christopher Lameter
2020-05-15 22:12                             ` Roman Gushchin
2020-05-20  9:51                           ` Vlastimil Babka
2020-05-20 20:57                             ` Roman Gushchin
2020-05-15 20:02                         ` Roman Gushchin
2020-04-23 21:01     ` Roman Gushchin
2020-04-25  2:10       ` Christopher Lameter
2020-04-25  2:10         ` Christopher Lameter
2020-05-20 13:51   ` Vlastimil Babka
2020-05-20 21:00     ` Roman Gushchin
2020-05-21 11:01       ` Vlastimil Babka
2020-05-21 21:06         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 05/19] mm: memcontrol: decouple reference counting from page accounting Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 06/19] mm: memcg/slab: obj_cgroup API Roman Gushchin
2020-05-07 21:03   ` Johannes Weiner
2020-05-07 22:26     ` Roman Gushchin [this message]
2020-05-12 22:56       ` Johannes Weiner
2020-05-15 22:01         ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 07/19] mm: memcg/slab: allocate obj_cgroups for non-root slab pages Roman Gushchin
2020-04-23 20:20   ` Roman Gushchin
2020-05-22 18:27   ` Vlastimil Babka
2020-05-23  1:32     ` Roman Gushchin
2020-05-26 17:50     ` Roman Gushchin
2020-05-25 14:46   ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 08/19] mm: memcg/slab: save obj_cgroup for non-root slab objects Roman Gushchin
2020-05-25 15:07   ` Vlastimil Babka
2020-05-26 17:53     ` Roman Gushchin
2020-05-27 11:03       ` Vlastimil Babka
2020-04-22 20:46 ` [PATCH v3 09/19] mm: memcg/slab: charge individual slab objects instead of pages Roman Gushchin
2020-05-25 16:10   ` Vlastimil Babka
2020-05-26 18:04     ` Roman Gushchin
2020-04-22 20:46 ` [PATCH v3 10/19] mm: memcg/slab: deprecate memory.kmem.slabinfo Roman Gushchin
2020-05-07 21:05   ` Johannes Weiner
2020-04-22 20:47 ` [PATCH v3 11/19] mm: memcg/slab: move memcg_kmem_bypass() to memcontrol.h Roman Gushchin
2020-05-25 17:03   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 12/19] mm: memcg/slab: use a single set of kmem_caches for all accounted allocations Roman Gushchin
2020-05-26 10:12   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 13/19] mm: memcg/slab: simplify memcg cache creation Roman Gushchin
2020-05-26 10:31   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 14/19] mm: memcg/slab: deprecate memcg_kmem_get_cache() Roman Gushchin
2020-05-26 10:34   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 15/19] mm: memcg/slab: deprecate slab_root_caches Roman Gushchin
2020-05-26 10:52   ` Vlastimil Babka
2020-05-26 18:50     ` Roman Gushchin
2020-04-22 20:47 ` [PATCH v3 16/19] mm: memcg/slab: remove redundant check in memcg_accumulate_slabinfo() Roman Gushchin
2020-05-26 11:31   ` Vlastimil Babka
2020-04-22 20:47 ` [PATCH v3 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations Roman Gushchin
2020-05-26 14:55   ` Vlastimil Babka
2020-05-27  8:35     ` Jesper Dangaard Brouer
2020-04-22 20:47 ` [PATCH v3 18/19] kselftests: cgroup: add kernel memory accounting tests Roman Gushchin
2020-05-26 15:24   ` Vlastimil Babka
2020-05-26 15:45     ` Roman Gushchin
2020-05-27 17:00       ` Vlastimil Babka
2020-05-27 20:45         ` Roman Gushchin
2020-04-22 20:47 ` [PATCH v3 19/19] tools/cgroup: add memcg_slabinfo.py tool Roman Gushchin
2020-05-05 15:59   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200507222631.GA81857@carbon.dhcp.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.