From: Waiman Long <llong@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>, Roman Gushchin <guro@fb.com>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Alex Shi <alex.shi@linux.alibaba.com>,
	Chris Down <chris@chrisdown.name>,
	Yafang Shao <laoar.shao@gmail.com>,
	Wei Yang <richard.weiyang@gmail.com>,
	Masayoshi Mizuma <msys.mizuma@gmail.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v4 4/5] mm/memcg: Save both reclaimable & unreclaimable bytes in object stock
Date: Tue, 20 Apr 2021 15:09:01 -0400
Message-ID: <73992e36-376f-b7d3-dde5-d287c7696e72@redhat.com>
In-Reply-To: <YH216/wnyEOcxATl@cmpxchg.org>

On 4/19/21 12:55 PM, Johannes Weiner wrote:
> On Sun, Apr 18, 2021 at 08:00:31PM -0400, Waiman Long wrote:
>> Currently, the object stock structure caches either reclaimable vmstat
>> bytes or unreclaimable vmstat bytes, but not both. The hit rate can be
>> improved if both types of vmstat data can be cached, especially on
>> single-node systems.
>>
>> This patch supports the caching of both types of vmstat data, though
>> at the expense of slightly increased complexity in the caching code.
>> For large objects (>= PAGE_SIZE), the vmstat array is updated directly
>> without going through the stock caching step.
>>
>> On a 2-socket Cascade Lake server with instrumentation enabled, the
>> miss rates are shown in the table below.
>>
>>    Initial bootup:
>>
>>    Kernel       __mod_objcg_state    mod_objcg_state    miss %
>>    ------       -----------------    ---------------    ------
>>    Before patch      634400              3243830        19.6%
>>    After patch       419810              3182424        13.2%
>>
>>    Parallel kernel build:
>>
>>    Kernel       __mod_objcg_state    mod_objcg_state    miss %
>>    ------       -----------------    ---------------    ------
>>    Before patch      24329265           142512465       17.1%
>>    After patch       24051721           142445825       16.9%
>>
>> The miss rate decreased after initial system bootup. However, the miss
>> rate for the parallel kernel build remained about the same, probably
>> because most of the touched kmem_cache objects were reclaimable inodes
>> and dentries.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   mm/memcontrol.c | 79 +++++++++++++++++++++++++++++++------------------
>>   1 file changed, 51 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index c13502eab282..a6dd18f6d8a8 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2212,8 +2212,8 @@ struct obj_stock {
>>   	struct obj_cgroup *cached_objcg;
>>   	struct pglist_data *cached_pgdat;
>>   	unsigned int nr_bytes;
>> -	int vmstat_idx;
>> -	int vmstat_bytes;
>> +	int reclaimable_bytes;		/* NR_SLAB_RECLAIMABLE_B */
>> +	int unreclaimable_bytes;	/* NR_SLAB_UNRECLAIMABLE_B */
> How about
>
> 	int nr_slab_reclaimable_b;
> 	int nr_slab_unreclaimable_b;
>
> so you don't need the comments?

Sure, will make the change.
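
For reference, the struct would then read like this (a sketch of just the
rename, with the surrounding ifdefs omitted, nothing else changed):

	struct obj_stock {
		struct obj_cgroup *cached_objcg;
		struct pglist_data *cached_pgdat;
		unsigned int nr_bytes;
		int nr_slab_reclaimable_b;
		int nr_slab_unreclaimable_b;
	};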


>>   #else
>>   	int dummy[0];
>>   #endif
>> @@ -3217,40 +3217,56 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>>   		     enum node_stat_item idx, int nr)
>>   {
>>   	unsigned long flags;
>> -	struct obj_stock *stock = get_obj_stock(&flags);
>> +	struct obj_stock *stock;
>> +	int *bytes, *alt_bytes, alt_idx;
>> +
>> +	/*
>> +	 * Directly update vmstat array for big object.
>> +	 */
>> +	if (unlikely(abs(nr) >= PAGE_SIZE))
>> +		goto update_vmstat;
> This looks like an optimization independent of the vmstat item split?
It may not be that helpful. I am going to take it out in the next version.
>
>> +	stock = get_obj_stock(&flags);
>> +	if (idx == NR_SLAB_RECLAIMABLE_B) {
>> +		bytes = &stock->reclaimable_bytes;
>> +		alt_bytes = &stock->unreclaimable_bytes;
>> +		alt_idx = NR_SLAB_UNRECLAIMABLE_B;
>> +	} else {
>> +		bytes = &stock->unreclaimable_bytes;
>> +		alt_bytes = &stock->reclaimable_bytes;
>> +		alt_idx = NR_SLAB_RECLAIMABLE_B;
>> +	}
>>   
>>   	/*
>> -	 * Save vmstat data in stock and skip vmstat array update unless
>> -	 * accumulating over a page of vmstat data or when pgdat or idx
>> +	 * Try to save vmstat data in stock and skip vmstat array update
>> +	 * unless accumulating over a page of vmstat data or when pgdat
>>   	 * changes.
>>   	 */
>>   	if (stock->cached_objcg != objcg) {
>>   		/* Output the current data as is */
>> -	} else if (!stock->vmstat_bytes) {
>> -		/* Save the current data */
>> -		stock->vmstat_bytes = nr;
>> -		stock->vmstat_idx = idx;
>> -		stock->cached_pgdat = pgdat;
>> -		nr = 0;
>> -	} else if ((stock->cached_pgdat != pgdat) ||
>> -		   (stock->vmstat_idx != idx)) {
>> -		/* Output the cached data & save the current data */
>> -		swap(nr, stock->vmstat_bytes);
>> -		swap(idx, stock->vmstat_idx);
>> +	} else if (stock->cached_pgdat != pgdat) {
>> +		/* Save the current data and output cached data, if any */
>> +		swap(nr, *bytes);
>>   		swap(pgdat, stock->cached_pgdat);
>> +		if (*alt_bytes) {
>> +			__mod_objcg_state(objcg, pgdat, alt_idx,
>> +					  *alt_bytes);
>> +			*alt_bytes = 0;
>> +		}
> As per the other email, I really don't think optimizing the pgdat
> switch (in a percpu cache) is worth this level of complexity.

I am going to simplify it in the next version.
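
Most likely I will just flush both cached counters to the old pgdat when
it changes, instead of doing the swap dance. An untested sketch using the
renamed fields; the accumulation of the new nr would then happen after
this check rather than in an else branch:

	} else if (stock->cached_pgdat != pgdat) {
		/* Flush the cached data for the old pgdat, if any */
		if (stock->nr_slab_reclaimable_b) {
			__mod_objcg_state(objcg, stock->cached_pgdat,
					  NR_SLAB_RECLAIMABLE_B,
					  stock->nr_slab_reclaimable_b);
			stock->nr_slab_reclaimable_b = 0;
		}
		if (stock->nr_slab_unreclaimable_b) {
			__mod_objcg_state(objcg, stock->cached_pgdat,
					  NR_SLAB_UNRECLAIMABLE_B,
					  stock->nr_slab_unreclaimable_b);
			stock->nr_slab_unreclaimable_b = 0;
		}
		stock->cached_pgdat = pgdat;
	}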


>
>>   	} else {
>> -		stock->vmstat_bytes += nr;
>> -		if (abs(stock->vmstat_bytes) > PAGE_SIZE) {
>> -			nr = stock->vmstat_bytes;
>> -			stock->vmstat_bytes = 0;
>> +		*bytes += nr;
>> +		if (abs(*bytes) > PAGE_SIZE) {
>> +			nr = *bytes;
>> +			*bytes = 0;
>>   		} else {
>>   			nr = 0;
>>   		}
>>   	}
>> -	if (nr)
>> -		__mod_objcg_state(objcg, pgdat, idx, nr);
>> -
>>   	put_obj_stock(flags);
>> +	if (!nr)
>> +		return;
>> +update_vmstat:
>> +	__mod_objcg_state(objcg, pgdat, idx, nr);
>>   }
>>   
>>   static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
>> @@ -3303,12 +3319,19 @@ static void drain_obj_stock(struct obj_stock *stock)
>>   	/*
>>   	 * Flush the vmstat data in current stock
>>   	 */
>> -	if (stock->vmstat_bytes) {
>> -		__mod_objcg_state(old, stock->cached_pgdat, stock->vmstat_idx,
>> -				  stock->vmstat_bytes);
>> +	if (stock->reclaimable_bytes || stock->unreclaimable_bytes) {
>> +		int bytes;
>> +
>> +		if ((bytes = stock->reclaimable_bytes))
>> +			__mod_objcg_state(old, stock->cached_pgdat,
>> +					  NR_SLAB_RECLAIMABLE_B, bytes);
>> +		if ((bytes = stock->unreclaimable_bytes))
>> +			__mod_objcg_state(old, stock->cached_pgdat,
>> +					  NR_SLAB_UNRECLAIMABLE_B, bytes);
> The int bytes indirection isn't necessary. It's easier to read even
> with the extra lines required to repeat the long stock member names,
> because that is quite a common pattern (if (stuff) frob(stuff)).
OK, I will eliminate the bytes variable.
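
That is, with the renamed fields the flush would become something like:

	if (stock->nr_slab_reclaimable_b)
		__mod_objcg_state(old, stock->cached_pgdat,
				  NR_SLAB_RECLAIMABLE_B,
				  stock->nr_slab_reclaimable_b);
	if (stock->nr_slab_unreclaimable_b)
		__mod_objcg_state(old, stock->cached_pgdat,
				  NR_SLAB_UNRECLAIMABLE_B,
				  stock->nr_slab_unreclaimable_b);
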
>
> __mod_objcg_state() also does rcu_read_lock() toggling and a memcg
> lookup on each call; both could be batched, which I think is further
> proof that it should just be inlined here.
>
I am also thinking that eliminating the unnecessary
rcu_read_lock/rcu_read_unlock toggling may help performance a bit.
However, that will be done in another patch after I have done more
performance testing. I am not going to bother with it in this series.
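
For the record, the kind of batching I have in mind would look roughly
like this, inlined in drain_obj_stock() (a hypothetical, untested sketch):

	if (stock->nr_slab_reclaimable_b || stock->nr_slab_unreclaimable_b) {
		struct mem_cgroup *memcg;
		struct lruvec *lruvec;

		/* One RCU section and one memcg lookup for both updates */
		rcu_read_lock();
		memcg = obj_cgroup_memcg(old);
		lruvec = mem_cgroup_lruvec(memcg, stock->cached_pgdat);
		if (stock->nr_slab_reclaimable_b)
			__mod_memcg_lruvec_state(lruvec, NR_SLAB_RECLAIMABLE_B,
						 stock->nr_slab_reclaimable_b);
		if (stock->nr_slab_unreclaimable_b)
			__mod_memcg_lruvec_state(lruvec, NR_SLAB_UNRECLAIMABLE_B,
						 stock->nr_slab_unreclaimable_b);
		rcu_read_unlock();
	}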

Cheers,
Longman


