linux-mm.kvack.org archive mirror
* [PATCH] mm/vmscan: get number of pages on the LRU list in memcgroup based on lru_zone_size
@ 2019-09-03  8:54 Honglei Wang
  2019-09-03  9:28 ` Michal Hocko
  0 siblings, 1 reply; 3+ messages in thread
From: Honglei Wang @ 2019-09-03  8:54 UTC (permalink / raw)
  To: linux-mm, vdavydov.dev, hannes, mhocko

lruvec_lru_size() currently uses lruvec_page_state_local() to get the
lru_size. That is based on lruvec_stat_local.count[] of
mem_cgroup_per_node, a counter which is updated in batches: it doesn't
charge the shared counter until the number of incoming pages reaches
MEMCG_CHARGE_BATCH, which is currently defined as 32.
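
For illustration, the update pattern looks roughly like this (a
simplified sketch modeled on __mod_memcg_state() in mm/memcontrol.c;
the exact code differs across kernel versions, and the stat/shared/x
names here are just placeholders):

  /* deltas accumulate per-CPU and are only folded into the shared
   * counter once the pending value exceeds MEMCG_CHARGE_BATCH */
  x = val + __this_cpu_read(stat->count[idx]);
  if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
          atomic_long_add(x, &shared->count[idx]);
          x = 0;                          /* flushed; readers see it */
  }
  __this_cpu_write(stat->count[idx], x);  /* else invisible to readers */

So up to MEMCG_CHARGE_BATCH pages per CPU can stay pending, invisible
to anyone reading the shared counter.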

This means small amounts of memory may not be handled as expected in
some scenarios. For example, if a memory cgroup has only 32 pages of
madvise-freed memory, those pages won't be freed as expected when the
group comes under memory pressure.
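
A minimal userspace sketch of that scenario (illustrative only; it
assumes the process is the only task running inside a memory cgroup):

  #include <stdlib.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          long psz = sysconf(_SC_PAGESIZE);
          size_t len = 32 * psz;          /* MEMCG_CHARGE_BATCH pages */
          char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (buf == MAP_FAILED)
                  return 1;
          for (size_t i = 0; i < len; i += psz)
                  buf[i] = 1;             /* fault the pages in */
          madvise(buf, len, MADV_FREE);   /* mark lazily freeable */
          pause();        /* hold mapping; now apply memcg pressure */
          return 0;
  }

With the batched counters, these 32 pages may never become visible to
lruvec_lru_size(), so reclaim can skip them.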

Getting lru_size from the lru_zone_size counters of
mem_cgroup_per_node, which are not updated in batches, makes this a bit
more accurate.
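
For reference, the non-batched accessor looks roughly like this (a
sketch of mem_cgroup_get_zone_lru_size() from
include/linux/memcontrol.h; details may differ by version):

  static inline
  unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
                                             enum lru_list lru, int zone_idx)
  {
          struct mem_cgroup_per_node *mz;

          mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
          return mz->lru_zone_size[zone_idx][lru];
  }

lru_zone_size[] is adjusted page by page as pages move on and off the
LRU lists, so summing it over all zones avoids the batching error.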

Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
---
 mm/vmscan.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c77d1e3761a7..c28672460868 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
  */
 unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
 {
-	unsigned long lru_size;
+	unsigned long lru_size = 0;
 	int zid;
 
-	if (!mem_cgroup_disabled())
-		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
-	else
+	if (!mem_cgroup_disabled()) {
+		for (zid = 0; zid < MAX_NR_ZONES; zid++)
+			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
+	} else
 		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
 
 	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
-- 
2.17.0




* Re: [PATCH] mm/vmscan: get number of pages on the LRU list in memcgroup based on lru_zone_size
  2019-09-03  8:54 [PATCH] mm/vmscan: get number of pages on the LRU list in memcgroup based on lru_zone_size Honglei Wang
@ 2019-09-03  9:28 ` Michal Hocko
  2019-09-04  7:38   ` Honglei Wang
  0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2019-09-03  9:28 UTC (permalink / raw)
  To: Honglei Wang; +Cc: linux-mm, vdavydov.dev, hannes

On Tue 03-09-19 16:54:16, Honglei Wang wrote:
> lruvec_lru_size() currently uses lruvec_page_state_local() to get the
> lru_size. That is based on lruvec_stat_local.count[] of
> mem_cgroup_per_node, a counter which is updated in batches: it doesn't
> charge the shared counter until the number of incoming pages reaches
> MEMCG_CHARGE_BATCH, which is currently defined as 32.
> 
> This means small amounts of memory may not be handled as expected in
> some scenarios. For example, if a memory cgroup has only 32 pages of
> madvise-freed memory, those pages won't be freed as expected when the
> group comes under memory pressure.

Could you be more specific, please?

> Getting lru_size from the lru_zone_size counters of
> mem_cgroup_per_node, which are not updated in batches, makes this a bit
> more accurate.

This is effectively reverting 1a61ab8038e72. There were no numbers
backing that commit, and this one has none either. The only hot path I
can see is workingset_refault; all the others seem to be in the reclaim
path.
 
> Signed-off-by: Honglei Wang <honglei.wang@oracle.com>

That being said, I am not against this patch, but the changelog should
be more specific about the particular problem and how serious it is.

> ---
>  mm/vmscan.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d1e3761a7..c28672460868 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>  
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>  
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> -- 
> 2.17.0

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] mm/vmscan: get number of pages on the LRU list in memcgroup based on lru_zone_size
  2019-09-03  9:28 ` Michal Hocko
@ 2019-09-04  7:38   ` Honglei Wang
  0 siblings, 0 replies; 3+ messages in thread
From: Honglei Wang @ 2019-09-04  7:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, vdavydov.dev, hannes



On 9/3/19 5:28 PM, Michal Hocko wrote:
> On Tue 03-09-19 16:54:16, Honglei Wang wrote:
>> lruvec_lru_size() currently uses lruvec_page_state_local() to get the
>> lru_size. That is based on lruvec_stat_local.count[] of
>> mem_cgroup_per_node, a counter which is updated in batches: it doesn't
>> charge the shared counter until the number of incoming pages reaches
>> MEMCG_CHARGE_BATCH, which is currently defined as 32.
>>
>> This means small amounts of memory may not be handled as expected in
>> some scenarios. For example, if a memory cgroup has only 32 pages of
>> madvise-freed memory, those pages won't be freed as expected when the
>> group comes under memory pressure.
> 
> Could you be more specific, please?

Okay, I will add a more detailed description in the next version.

> 
>> Getting lru_size from the lru_zone_size counters of
>> mem_cgroup_per_node, which are not updated in batches, makes this a bit
>> more accurate.
> 
> This is effectively reverting 1a61ab8038e72. There were no numbers
> backing that commit, and this one has none either. The only hot path I
> can see is workingset_refault; all the others seem to be in the reclaim
> path.

Yep, I saw that the lruvec counters are not incremented in a batched
way in the workingset_refault path, so I thought switching
lruvec_lru_size() to the non-batched counters might better match that
behavior.

It seems to me that a deviation of less than 32 pages is not a big deal
for the seq_show stuff (and maybe it's not a big deal in
lruvec_lru_size() either...), so I chose to just modify
lruvec_lru_size() rather than revert 1a61ab8038e72.
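
For context, both counters are driven from update_lru_size(); a rough
sketch (modeled on include/linux/mm_inline.h, details vary by kernel
version):

  static __always_inline void update_lru_size(struct lruvec *lruvec,
                                              enum lru_list lru,
                                              enum zone_type zid,
                                              int nr_pages)
  {
          /* batched stat, read by lruvec_page_state_local() */
          __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
          __mod_zone_page_state(&lruvec_pgdat(lruvec)->node_zones[zid],
                                NR_ZONE_LRU_BASE + lru, nr_pages);
          /* with CONFIG_MEMCG: immediate per-zone update, read by
           * mem_cgroup_get_zone_lru_size() */
          mem_cgroup_update_lru_size(lruvec, lru, zid, nr_pages);
  }

So the lru_zone_size counters are bumped on every LRU add/remove, while
the lruvec_stat path goes through the per-CPU batch.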

>   
>> Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
> 
> That being said, I am not against this patch, but the changelog should
> be more specific about the particular problem and how serious it is.

Thanks for the suggestions. I don't think this is a really serious
problem, and I'll try to give a more detailed changelog describing what
it's trying to fix.

Honglei

> 
>> ---
>>   mm/vmscan.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c77d1e3761a7..c28672460868 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>>    */
>>   unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>>   {
>> -	unsigned long lru_size;
>> +	unsigned long lru_size = 0;
>>   	int zid;
>>   
>> -	if (!mem_cgroup_disabled())
>> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
>> -	else
>> +	if (!mem_cgroup_disabled()) {
>> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
>> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
>> +	} else
>>   		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>>   
>>   	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
>> -- 
>> 2.17.0
> 


