* [Question] Mlocked count will not be decreased
@ 2017-05-23 14:41 ` Kefeng Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Kefeng Wang @ 2017-05-23 14:41 UTC (permalink / raw)
  To: linux-mm, linux-kernel, zhongjiang, Qiuxishi, Yisheng Xie,
	wangkefeng.wang

Hi All,

The Mlocked count in /proc/meminfo keeps increasing under a small testcase and is never
released on a mainline kernel; here is a testcase[1] to reproduce the issue. On CentOS
7.2/7.3 the count does not increase.

Is this normal?

Thanks,
Kefeng




[1] testcase
linux:~ # cat test_mlockall.sh
grep Mlocked /proc/meminfo
for j in `seq 0 10`
do
	for i in `seq 4 15`
	do
		./p_mlockall >> log &
	done
	sleep 0.2
done
grep Mlocked /proc/meminfo


linux:~ # cat p_mlockall.c
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>

#define SPACE_LEN	4096

int main(int argc, char ** argv)
{
	int ret;
	void *adr = malloc(SPACE_LEN);
	if (!adr)
		return -1;
	
	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
	printf("mlockall ret = %d\n", ret);

	ret = munlockall();
	printf("munlockall ret = %d\n", ret);

	free(adr);
	return 0;
}


* Re: [Question] Mlocked count will not be decreased
  2017-05-23 14:41 ` Kefeng Wang
@ 2017-05-23 22:04   ` Tetsuo Handa
  -1 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2017-05-23 22:04 UTC (permalink / raw)
  To: Kefeng Wang, linux-mm, linux-kernel, zhongjiang, Qiuxishi, Yisheng Xie

Kefeng Wang wrote:
> Hi All,
> 
> Mlocked in meminfo will be increasing with an small testcase, and never be released in mainline,
> here is a testcase[1] to reproduce the issue, but the centos7.2/7.3 will not increase.
> 
> Is it normal?

I confirmed that your problem also occurs on Linux 4.11 using the testcase below.
MemFree does not decrease while Mlocked keeps increasing, so this looks like a
statistics accounting bug.

----------
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char ** argv)
{
	int i;
	for (i = 0; i < 128; i++)
		if (fork() == 0) {
			malloc(1048576);
			while (1) {
				mlockall(MCL_CURRENT | MCL_FUTURE);
				munlockall();
			}
		}
	return 0;
}
----------
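While the fork loop above runs, the divergence is easy to watch from another shell (an illustrative snippet; Mlocked and MemFree are standard /proc/meminfo fields):

```shell
# Sample the two counters a few times; on an affected kernel Mlocked
# keeps growing while MemFree stays roughly constant.
for i in 1 2 3; do
	grep -E '^(Mlocked|MemFree)' /proc/meminfo
	sleep 1
done
```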


* Re: [Question] Mlocked count will not be decreased
  2017-05-23 14:41 ` Kefeng Wang
@ 2017-05-24  8:32   ` Yisheng Xie
  -1 siblings, 0 replies; 28+ messages in thread
From: Yisheng Xie @ 2017-05-24  8:32 UTC (permalink / raw)
  To: Kefeng Wang, linux-mm, linux-kernel, zhongjiang, Qiuxishi

Hi Kefeng,
Could you please try this patch?

Thanks
Yisheng Xie
-------------
>From a70ae975756e8e97a28d49117ab25684da631689 Mon Sep 17 00:00:00 2001
From: Yisheng Xie <xieyisheng1@huawei.com>
Date: Wed, 24 May 2017 16:01:24 +0800
Subject: [PATCH] mlock: fix mlock count can not decrease in race condition

Kefeng reported that when running the following test, the Mlocked count in
meminfo never decreases:
 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 sleep 5 # wait some time to let mlock decrease
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char ** argv)
 {
 	int ret;
 	void *adr = malloc(SPACE_LEN);
 	if (!adr)
 		return -1;

 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
 	printf("mlcokall ret = %d\n", ret);

 	ret = munlockall();
 	printf("munlcokall ret = %d\n", ret);

 	free(adr);
 	return 0;
 }

In __munlock_pagevec(), a page can have its Mlocked flag cleared by
TestClearPageMlocked() and yet fail LRU isolation in a race condition. Such
pages are not counted in delta_munlocked, which leaves the NR_MLOCK counter
permanently too high: the flag is already cleared, so nothing will ever
decrement the counter for those pages in the future.

Fix it by counting every page whose PageMlocked flag is cleared.

Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
---
 mm/mlock.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index c483c5c..71ba5cf 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -284,7 +284,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 {
 	int i;
 	int nr = pagevec_count(pvec);
-	int delta_munlocked;
+	int munlocked = 0;
 	struct pagevec pvec_putback;
 	int pgrescued = 0;

@@ -296,6 +296,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		struct page *page = pvec->pages[i];

 		if (TestClearPageMlocked(page)) {
+			munlocked --;
 			/*
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
@@ -315,8 +316,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
-	delta_munlocked = -nr + pagevec_count(&pvec_putback);
-	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+	if (munlocked)
+		__mod_zone_page_state(zone, NR_MLOCK, munlocked);
 	spin_unlock_irq(zone_lru_lock(zone));

 	/* Now we can release pins of pages that we are not munlocking */
-- 
1.7.12.4





Thanks
On 2017/5/23 22:41, Kefeng Wang wrote:
> Hi All,
> 
> Mlocked in meminfo will be increasing with an small testcase, and never be released in mainline,
> here is a testcase[1] to reproduce the issue, but the centos7.2/7.3 will not increase.
> 
> Is it normal?
> 
> Thanks,
> Kefeng
> 
> 
> 
> 
> [1] testcase
> linux:~ # cat test_mlockal
> grep Mlocked /proc/meminfo
>  for j in `seq 0 10`
>  do
> 	for i in `seq 4 15`
> 	do
> 		./p_mlockall >> log &
> 	done
> 	sleep 0.2
> done
> grep Mlocked /proc/meminfo
> 
> 
> linux:~ # cat p_mlockall.c
> #include <sys/mman.h>
> #include <stdlib.h>
> #include <stdio.h>
> 
> #define SPACE_LEN	4096
> 
> int main(int argc, char ** argv)
> {
> 	int ret;
> 	void *adr = malloc(SPACE_LEN);
> 	if (!adr)
> 		return -1;
> 	
> 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
> 	printf("mlcokall ret = %d\n", ret);
> 
> 	ret = munlockall();
> 	printf("munlcokall ret = %d\n", ret);
> 
> 	free(adr);
> 	return 0;
> }
> 
> 
> .
> 


* Re: [Question] Mlocked count will not be decreased
  2017-05-24  8:32   ` Yisheng Xie
@ 2017-05-24  8:57     ` Kefeng Wang
  -1 siblings, 0 replies; 28+ messages in thread
From: Kefeng Wang @ 2017-05-24  8:57 UTC (permalink / raw)
  To: Yisheng Xie, linux-mm, linux-kernel, zhongjiang, Qiuxishi, Tetsuo Handa
  Cc: Vlastimil Babka, Minchan Kim, Kirill A. Shutemov, Andrew Morton



On 2017/5/24 16:32, Yisheng Xie wrote:
> Hi Kefeng,
> Could you please try this patch.

It works for me, thanks.

Kefeng.

> 
> Thanks
> Yisheng Xie
> -------------
>>From a70ae975756e8e97a28d49117ab25684da631689 Mon Sep 17 00:00:00 2001
> From: Yisheng Xie <xieyisheng1@huawei.com>
> Date: Wed, 24 May 2017 16:01:24 +0800
> Subject: [PATCH] mlock: fix mlock count can not decrease in race condition
> 
> Kefeng reported that when run the follow test the mlock count in meminfo
> cannot be decreased:
>  [1] testcase
>  linux:~ # cat test_mlockal
>  grep Mlocked /proc/meminfo
>   for j in `seq 0 10`
>   do
>  	for i in `seq 4 15`
>  	do
>  		./p_mlockall >> log &
>  	done
>  	sleep 0.2
>  done
>  sleep 5 # wait some time to let mlock decrease
>  grep Mlocked /proc/meminfo
> 
>  linux:~ # cat p_mlockall.c
>  #include <sys/mman.h>
>  #include <stdlib.h>
>  #include <stdio.h>
> 
>  #define SPACE_LEN	4096
> 
>  int main(int argc, char ** argv)
>  {
>  	int ret;
>  	void *adr = malloc(SPACE_LEN);
>  	if (!adr)
>  		return -1;
> 
>  	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
>  	printf("mlcokall ret = %d\n", ret);
> 
>  	ret = munlockall();
>  	printf("munlcokall ret = %d\n", ret);
> 
>  	free(adr);
>  	return 0;
>  }
> 
> When __munlock_pagevec, we ClearPageMlock but isolation_failed in race
> condition, and we do not count these page into delta_munlocked, which cause mlock
> counter incorrect for we had Clear the PageMlock and cannot count down
> the number in the feture.
> 
> Fix it by count the number of page whoes PageMlock flag is cleared.
> 
> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> ---
>  mm/mlock.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index c483c5c..71ba5cf 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -284,7 +284,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  {
>  	int i;
>  	int nr = pagevec_count(pvec);
> -	int delta_munlocked;
> +	int munlocked = 0;
>  	struct pagevec pvec_putback;
>  	int pgrescued = 0;
> 
> @@ -296,6 +296,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  		struct page *page = pvec->pages[i];
> 
>  		if (TestClearPageMlocked(page)) {
> +			munlocked --;
>  			/*
>  			 * We already have pin from follow_page_mask()
>  			 * so we can spare the get_page() here.
> @@ -315,8 +316,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  		pagevec_add(&pvec_putback, pvec->pages[i]);
>  		pvec->pages[i] = NULL;
>  	}
> -	delta_munlocked = -nr + pagevec_count(&pvec_putback);
> -	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
> +	if (munlocked)
> +		__mod_zone_page_state(zone, NR_MLOCK, munlocked);
>  	spin_unlock_irq(zone_lru_lock(zone));
> 
>  	/* Now we can release pins of pages that we are not munlocking */
> 


* Re: [Question] Mlocked count will not be decreased
  2017-05-24  8:32   ` Yisheng Xie
@ 2017-05-24 10:32     ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2017-05-24 10:32 UTC (permalink / raw)
  To: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang, Qiuxishi

On 05/24/2017 10:32 AM, Yisheng Xie wrote:
> Hi Kefeng,
> Could you please try this patch.
> 
> Thanks
> Yisheng Xie
> -------------
> From a70ae975756e8e97a28d49117ab25684da631689 Mon Sep 17 00:00:00 2001
> From: Yisheng Xie <xieyisheng1@huawei.com>
> Date: Wed, 24 May 2017 16:01:24 +0800
> Subject: [PATCH] mlock: fix mlock count can not decrease in race condition
> 
> Kefeng reported that when run the follow test the mlock count in meminfo
> cannot be decreased:
>  [1] testcase
>  linux:~ # cat test_mlockal
>  grep Mlocked /proc/meminfo
>   for j in `seq 0 10`
>   do
>  	for i in `seq 4 15`
>  	do
>  		./p_mlockall >> log &
>  	done
>  	sleep 0.2
>  done
>  sleep 5 # wait some time to let mlock decrease
>  grep Mlocked /proc/meminfo
> 
>  linux:~ # cat p_mlockall.c
>  #include <sys/mman.h>
>  #include <stdlib.h>
>  #include <stdio.h>
> 
>  #define SPACE_LEN	4096
> 
>  int main(int argc, char ** argv)
>  {
>  	int ret;
>  	void *adr = malloc(SPACE_LEN);
>  	if (!adr)
>  		return -1;
> 
>  	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
>  	printf("mlcokall ret = %d\n", ret);
> 
>  	ret = munlockall();
>  	printf("munlcokall ret = %d\n", ret);
> 
>  	free(adr);
>  	return 0;
>  }
> 
> When __munlock_pagevec, we ClearPageMlock but isolation_failed in race
> condition, and we do not count these page into delta_munlocked, which cause mlock

Race condition with what? Who else would isolate our pages?

> counter incorrect for we had Clear the PageMlock and cannot count down
> the number in the feture.
> 
> Fix it by count the number of page whoes PageMlock flag is cleared.
> 
> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>

Weird, I can reproduce the issue on my desktop's 4.11 distro kernel, but for
some reason not in qemu with a small kernel build, so I couldn't test the
patch yet. But it's true that before 7225522bb429 ("mm: munlock: batch
non-THP page isolation and munlock+putback using pagevec") we decreased
NR_MLOCK for each page that passed TestClearPageMlocked(), and that
unintentionally changed with my patch. There should be a Fixes: tag for that.

> ---
>  mm/mlock.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index c483c5c..71ba5cf 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -284,7 +284,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  {
>  	int i;
>  	int nr = pagevec_count(pvec);
> -	int delta_munlocked;
> +	int munlocked = 0;
>  	struct pagevec pvec_putback;
>  	int pgrescued = 0;
> 
> @@ -296,6 +296,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  		struct page *page = pvec->pages[i];
> 
>  		if (TestClearPageMlocked(page)) {
> +			munlocked --;
>  			/*
>  			 * We already have pin from follow_page_mask()
>  			 * so we can spare the get_page() here.
> @@ -315,8 +316,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>  		pagevec_add(&pvec_putback, pvec->pages[i]);
>  		pvec->pages[i] = NULL;
>  	}
> -	delta_munlocked = -nr + pagevec_count(&pvec_putback);
> -	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
> +	if (munlocked)

You don't have to wrap this in if (): it should be very rare for munlocked to
be 0, and the code works fine even if it is.

> +		__mod_zone_page_state(zone, NR_MLOCK, munlocked);
>  	spin_unlock_irq(zone_lru_lock(zone));
> 
>  	/* Now we can release pins of pages that we are not munlocking */
> 


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 10:32     ` Vlastimil Babka
@ 2017-05-24 10:42       ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2017-05-24 10:42 UTC (permalink / raw)
  To: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang, Qiuxishi

On 05/24/2017 12:32 PM, Vlastimil Babka wrote:
> 
> Weird, I can reproduce the issue on my desktop's 4.11 distro kernel, but
> not in qemu and small kernel build, for some reason. So I couldn't test

Ah, Tetsuo's more aggressive testcase worked and I can confirm the fix.
However, this would be slightly better, as it avoids the increment in the
fast path:

diff --git a/mm/mlock.c b/mm/mlock.c
index 0dd9ca18e19e..721679a2c1aa 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -286,7 +286,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 {
        int i;
        int nr = pagevec_count(pvec);
-       int delta_munlocked;
+       int delta_munlocked = -nr;
        struct pagevec pvec_putback;
        int pgrescued = 0;
 
@@ -306,6 +306,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
                                continue;
                        else
                                __munlock_isolation_failed(page);
+               } else {
+                       delta_munlocked++;
                }
 
                /*
@@ -317,7 +319,6 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
                pagevec_add(&pvec_putback, pvec->pages[i]);
                pvec->pages[i] = NULL;
        }
-       delta_munlocked = -nr + pagevec_count(&pvec_putback);
        __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
        spin_unlock_irq(zone_lru_lock(zone));


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 10:32     ` Vlastimil Babka
@ 2017-05-24 10:49       ` Xishi Qiu
  -1 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2017-05-24 10:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 2017/5/24 18:32, Vlastimil Babka wrote:

> On 05/24/2017 10:32 AM, Yisheng Xie wrote:
>> Hi Kefeng,
>> Could you please try this patch.
>>
>> Thanks
>> Yisheng Xie
>> -------------
>> From a70ae975756e8e97a28d49117ab25684da631689 Mon Sep 17 00:00:00 2001
>> From: Yisheng Xie <xieyisheng1@huawei.com>
>> Date: Wed, 24 May 2017 16:01:24 +0800
>> Subject: [PATCH] mlock: fix mlock count can not decrease in race condition
>>
>> Kefeng reported that when run the follow test the mlock count in meminfo
>> cannot be decreased:
>>  [1] testcase
>>  linux:~ # cat test_mlockal
>>  grep Mlocked /proc/meminfo
>>   for j in `seq 0 10`
>>   do
>>  	for i in `seq 4 15`
>>  	do
>>  		./p_mlockall >> log &
>>  	done
>>  	sleep 0.2
>>  done
>>  sleep 5 # wait some time to let mlock decrease
>>  grep Mlocked /proc/meminfo
>>
>>  linux:~ # cat p_mlockall.c
>>  #include <sys/mman.h>
>>  #include <stdlib.h>
>>  #include <stdio.h>
>>
>>  #define SPACE_LEN	4096
>>
>>  int main(int argc, char ** argv)
>>  {
>>  	int ret;
>>  	void *adr = malloc(SPACE_LEN);
>>  	if (!adr)
>>  		return -1;
>>
>>  	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
>>  	printf("mlcokall ret = %d\n", ret);
>>
>>  	ret = munlockall();
>>  	printf("munlcokall ret = %d\n", ret);
>>
>>  	free(adr);
>>  	return 0;
>>  }
>>
>> When __munlock_pagevec, we ClearPageMlock but isolation_failed in race
>> condition, and we do not count these page into delta_munlocked, which cause mlock
> 
> Race condition with what? Who else would isolate our pages?
> 
>> counter incorrect for we had Clear the PageMlock and cannot count down
>> the number in the feture.
>>
>> Fix it by count the number of page whoes PageMlock flag is cleared.
>>
>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
> 
> Weird, I can reproduce the issue on my desktop's 4.11 distro kernel, but
> not in qemu and small kernel build, for some reason. So I couldn't test
> the patch yet. But it's true that before 7225522bb429 ("mm: munlock:
> batch non-THP page isolation and munlock+putback using pagevec") we
> decreased NR_MLOCK for each pages that passed TestClearPageMlocked(),
> and that unintentionally changed with my patch. There should be a Fixes:
> tag for that.
> 

Hi Vlastimil,

Why is the page marked Mlocked but not on the LRU list?

		if (TestClearPageMlocked(page)) {
			/*
			 * We already have pin from follow_page_mask()
			 * so we can spare the get_page() here.
			 */
			if (__munlock_isolate_lru_page(page, false))
				continue;
			else
				__munlock_isolation_failed(page);  // how can this happen?
		}

Thanks,
Xishi Qiu

>> ---
>>  mm/mlock.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/mlock.c b/mm/mlock.c
>> index c483c5c..71ba5cf 100644
>> --- a/mm/mlock.c
>> +++ b/mm/mlock.c
>> @@ -284,7 +284,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>>  {
>>  	int i;
>>  	int nr = pagevec_count(pvec);
>> -	int delta_munlocked;
>> +	int munlocked = 0;
>>  	struct pagevec pvec_putback;
>>  	int pgrescued = 0;
>>
>> @@ -296,6 +296,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>>  		struct page *page = pvec->pages[i];
>>
>>  		if (TestClearPageMlocked(page)) {
>> +			munlocked --;
>>  			/*
>>  			 * We already have pin from follow_page_mask()
>>  			 * so we can spare the get_page() here.
>> @@ -315,8 +316,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>>  		pagevec_add(&pvec_putback, pvec->pages[i]);
>>  		pvec->pages[i] = NULL;
>>  	}
>> -	delta_munlocked = -nr + pagevec_count(&pvec_putback);
>> -	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
>> +	if (munlocked)
> 
> You don't have to if () this, it should be very rare that munlocked will
> be 0, and the code works fine even if it is.
> 
>> +		__mod_zone_page_state(zone, NR_MLOCK, munlocked);
>>  	spin_unlock_irq(zone_lru_lock(zone));
>>
>>  	/* Now we can release pins of pages that we are not munlocking */
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 10:32     ` Vlastimil Babka
@ 2017-05-24 11:38       ` Xishi Qiu
  -1 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2017-05-24 11:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 2017/5/24 18:32, Vlastimil Babka wrote:

> On 05/24/2017 10:32 AM, Yisheng Xie wrote:
>> Hi Kefeng,
>> Could you please try this patch.
>>
>> Thanks
>> Yisheng Xie
>> -------------
>> From a70ae975756e8e97a28d49117ab25684da631689 Mon Sep 17 00:00:00 2001
>> From: Yisheng Xie <xieyisheng1@huawei.com>
>> Date: Wed, 24 May 2017 16:01:24 +0800
>> Subject: [PATCH] mlock: fix mlock count can not decrease in race condition
>>
>> Kefeng reported that when running the following test, the Mlocked count in meminfo
>> cannot be decreased:
>>  [1] testcase
>>  linux:~ # cat test_mlockal
>>  grep Mlocked /proc/meminfo
>>   for j in `seq 0 10`
>>   do
>>  	for i in `seq 4 15`
>>  	do
>>  		./p_mlockall >> log &
>>  	done
>>  	sleep 0.2
>>  done
>>  sleep 5 # wait some time to let mlock decrease
>>  grep Mlocked /proc/meminfo
>>
>>  linux:~ # cat p_mlockall.c
>>  #include <sys/mman.h>
>>  #include <stdlib.h>
>>  #include <stdio.h>
>>
>>  #define SPACE_LEN	4096
>>
>>  int main(int argc, char ** argv)
>>  {
>>  	int ret;
>>  	void *adr = malloc(SPACE_LEN);
>>  	if (!adr)
>>  		return -1;
>>
>>  	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
>>  	printf("mlcokall ret = %d\n", ret);
>>
>>  	ret = munlockall();
>>  	printf("munlcokall ret = %d\n", ret);
>>
>>  	free(adr);
>>  	return 0;
>>  }
>>
>> In __munlock_pagevec(), we clear PageMlocked but the isolation fails in a race
>> condition, and we do not count these pages into delta_munlocked, which causes the mlock
> 
> Race condition with what? Who else would isolate our pages?
> 

Hi Vlastimil,

I found the root cause: if the page is not cached on the current CPU (it still sits
in another CPU's pagevec), lru_add_drain() will not push it to the LRU. So we should
handle the failure case in mlock_vma_page().

follow_page_pte()
		...
		if (page->mapping && trylock_page(page)) {
			lru_add_drain();  /* push cached pages to LRU */
			/*
			 * Because we lock page here, and migration is
			 * blocked by the pte's page reference, and we
			 * know the page is still mapped, we don't even
			 * need to check for file-cache page truncation.
			 */
			mlock_vma_page(page);
			unlock_page(page);
		}
		...

I think we should apply Yisheng's patch, and we should also add the following change.
I think it is better than using lru_add_drain_all().

diff --git a/mm/mlock.c b/mm/mlock.c
index 3d3ee6c..ca2aeb9 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 		if (!isolate_lru_page(page))
 			putback_lru_page(page);
+		else {
+			ClearPageMlocked(page);
+			mod_zone_page_state(page_zone(page), NR_MLOCK,
+					-hpage_nr_pages(page));
+		}
 	}
 }

Thanks,
Xishi Qiu

^ permalink raw reply related	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 11:38       ` Xishi Qiu
@ 2017-05-24 11:52         ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2017-05-24 11:52 UTC (permalink / raw)
  To: Xishi Qiu; +Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 05/24/2017 01:38 PM, Xishi Qiu wrote:
>>
>> Race condition with what? Who else would isolate our pages?
>>
> 
> Hi Vlastimil,
> 
> I find the root cause, if the page was not cached on the current cpu,
> lru_add_drain() will not push it to LRU. So we should handle fail
> case in mlock_vma_page().

Yeah that would explain it.

> follow_page_pte()
> 		...
> 		if (page->mapping && trylock_page(page)) {
> 			lru_add_drain();  /* push cached pages to LRU */
> 			/*
> 			 * Because we lock page here, and migration is
> 			 * blocked by the pte's page reference, and we
> 			 * know the page is still mapped, we don't even
> 			 * need to check for file-cache page truncation.
> 			 */
> 			mlock_vma_page(page);
> 			unlock_page(page);
> 		}
> 		...
> 
> I think we should add yisheng's patch, also we should add the following change.
> I think it is better than use lru_add_drain_all().

I agree about Yisheng's fix (but v2 didn't address my comments). I don't
think we should add the hunk below, as that deviates from the rest of
the design.

Thanks,
Vlastimil

> diff --git a/mm/mlock.c b/mm/mlock.c
> index 3d3ee6c..ca2aeb9 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>  		if (!isolate_lru_page(page))
>  			putback_lru_page(page);
> +		else {
> +			ClearPageMlocked(page);
> +			mod_zone_page_state(page_zone(page), NR_MLOCK,
> +					-hpage_nr_pages(page));
> +		}
>  	}
>  }
> 
> Thanks,
> Xishi Qiu
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 11:52         ` Vlastimil Babka
@ 2017-05-24 12:10           ` Xishi Qiu
  -1 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2017-05-24 12:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 2017/5/24 19:52, Vlastimil Babka wrote:

> On 05/24/2017 01:38 PM, Xishi Qiu wrote:
>>>
>>> Race condition with what? Who else would isolate our pages?
>>>
>>
>> Hi Vlastimil,
>>
>> I find the root cause, if the page was not cached on the current cpu,
>> lru_add_drain() will not push it to LRU. So we should handle fail
>> case in mlock_vma_page().
> 
> Yeah that would explain it.
> 
>> follow_page_pte()
>> 		...
>> 		if (page->mapping && trylock_page(page)) {
>> 			lru_add_drain();  /* push cached pages to LRU */
>> 			/*
>> 			 * Because we lock page here, and migration is
>> 			 * blocked by the pte's page reference, and we
>> 			 * know the page is still mapped, we don't even
>> 			 * need to check for file-cache page truncation.
>> 			 */
>> 			mlock_vma_page(page);
>> 			unlock_page(page);
>> 		}
>> 		...
>>
>> I think we should add yisheng's patch, also we should add the following change.
>> I think it is better than use lru_add_drain_all().
> 
> I agree about yisheng's fix (but v2 didn't address my comments). I don't
> think we should add the hunk below, as that deviates from the rest of
> the design.

Hi Vlastimil,

The rest of the design is that mlock should always succeed here, right?

If we don't handle the failure case, the page will be put on the anon/file LRU list
later when __pagevec_lru_add() is called, but NR_MLOCK has been increased;
that is wrong, right?

Thanks,
Xishi Qiu

> 
> Thanks,
> Vlastimil
> 
>> diff --git a/mm/mlock.c b/mm/mlock.c
>> index 3d3ee6c..ca2aeb9 100644
>> --- a/mm/mlock.c
>> +++ b/mm/mlock.c
>> @@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
>>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>>  		if (!isolate_lru_page(page))
>>  			putback_lru_page(page);
>> +		else {
>> +			ClearPageMlocked(page);
>> +			mod_zone_page_state(page_zone(page), NR_MLOCK,
>> +					-hpage_nr_pages(page));
>> +		}
>>  	}
>>  }
>>
>> Thanks,
>> Xishi Qiu
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 12:10           ` Xishi Qiu
@ 2017-05-24 13:16             ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2017-05-24 13:16 UTC (permalink / raw)
  To: Xishi Qiu; +Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 05/24/2017 02:10 PM, Xishi Qiu wrote:
> On 2017/5/24 19:52, Vlastimil Babka wrote:
> 
>> On 05/24/2017 01:38 PM, Xishi Qiu wrote:
>>>>
>>>> Race condition with what? Who else would isolate our pages?
>>>>
>>>
>>> Hi Vlastimil,
>>>
>>> I find the root cause, if the page was not cached on the current cpu,
>>> lru_add_drain() will not push it to LRU. So we should handle fail
>>> case in mlock_vma_page().
>>
>> Yeah that would explain it.
>>
>>> follow_page_pte()
>>> 		...
>>> 		if (page->mapping && trylock_page(page)) {
>>> 			lru_add_drain();  /* push cached pages to LRU */
>>> 			/*
>>> 			 * Because we lock page here, and migration is
>>> 			 * blocked by the pte's page reference, and we
>>> 			 * know the page is still mapped, we don't even
>>> 			 * need to check for file-cache page truncation.
>>> 			 */
>>> 			mlock_vma_page(page);
>>> 			unlock_page(page);
>>> 		}
>>> 		...
>>>
>>> I think we should add yisheng's patch, also we should add the following change.
>>> I think it is better than use lru_add_drain_all().
>>
>> I agree about yisheng's fix (but v2 didn't address my comments). I don't
>> think we should add the hunk below, as that deviates from the rest of
>> the design.
> 
> Hi Vlastimil,
> 
> The rest of the design is that mlock should always success here, right?

The rest of the design allows a temporary disconnect between the Mlocked
flag and placement on the unevictable LRU.

> If we don't handle the fail case, the page will be in anon/file lru list
> later when call __pagevec_lru_add(), but NR_MLOCK increased,
> this is wrong, right?

It's not wrong: the page cannot get evicted even while on the wrong LRU, so
effectively it is already mlocked. We would merely be underaccounting NR_MLOCK.

> Thanks,
> Xishi Qiu
> 
>>
>> Thanks,
>> Vlastimil
>>
>>> diff --git a/mm/mlock.c b/mm/mlock.c
>>> index 3d3ee6c..ca2aeb9 100644
>>> --- a/mm/mlock.c
>>> +++ b/mm/mlock.c
>>> @@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
>>>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>>>  		if (!isolate_lru_page(page))
>>>  			putback_lru_page(page);
>>> +		else {
>>> +			ClearPageMlocked(page);
>>> +			mod_zone_page_state(page_zone(page), NR_MLOCK,
>>> +					-hpage_nr_pages(page));
>>> +		}
>>>  	}
>>>  }
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>
>>
>> .
>>
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 11:52         ` Vlastimil Babka
@ 2017-05-25  1:00           ` Yisheng Xie
  -1 siblings, 0 replies; 28+ messages in thread
From: Yisheng Xie @ 2017-05-25  1:00 UTC (permalink / raw)
  To: Vlastimil Babka, Xishi Qiu
  Cc: Kefeng Wang, linux-mm, linux-kernel, zhongjiang

Hi Vlastimil,

Thanks for the comment.
On 2017/5/24 19:52, Vlastimil Babka wrote:
> On 05/24/2017 01:38 PM, Xishi Qiu wrote:
>>>
>>> Race condition with what? Who else would isolate our pages?
>>>
>>
>> Hi Vlastimil,
>>
>> I find the root cause, if the page was not cached on the current cpu,
>> lru_add_drain() will not push it to LRU. So we should handle fail
>> case in mlock_vma_page().
> 
> Yeah that would explain it.
> 
>> follow_page_pte()
>> 		...
>> 		if (page->mapping && trylock_page(page)) {
>> 			lru_add_drain();  /* push cached pages to LRU */
>> 			/*
>> 			 * Because we lock page here, and migration is
>> 			 * blocked by the pte's page reference, and we
>> 			 * know the page is still mapped, we don't even
>> 			 * need to check for file-cache page truncation.
>> 			 */
>> 			mlock_vma_page(page);
>> 			unlock_page(page);
>> 		}
>> 		...
>>
>> I think we should add yisheng's patch, also we should add the following change.
>> I think it is better than use lru_add_drain_all().
> 
> I agree about yisheng's fix (but v2 didn't address my comments). I don't
> think we should add the hunk below, as that deviates from the rest of
> the design.
> 
Sorry, I had sent the patch before seeing your comment. Anyway, I will send another
version following your suggestion.

Thanks
Yisheng Xie

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-24 13:16             ` Vlastimil Babka
@ 2017-05-25  1:16               ` Xishi Qiu
  -1 siblings, 0 replies; 28+ messages in thread
From: Xishi Qiu @ 2017-05-25  1:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 2017/5/24 21:16, Vlastimil Babka wrote:

> On 05/24/2017 02:10 PM, Xishi Qiu wrote:
>> On 2017/5/24 19:52, Vlastimil Babka wrote:
>>
>>> On 05/24/2017 01:38 PM, Xishi Qiu wrote:
>>>>>
>>>>> Race condition with what? Who else would isolate our pages?
>>>>>
>>>>
>>>> Hi Vlastimil,
>>>>
>>>> I find the root cause, if the page was not cached on the current cpu,
>>>> lru_add_drain() will not push it to LRU. So we should handle fail
>>>> case in mlock_vma_page().
>>>
>>> Yeah that would explain it.
>>>
>>>> follow_page_pte()
>>>> 		...
>>>> 		if (page->mapping && trylock_page(page)) {
>>>> 			lru_add_drain();  /* push cached pages to LRU */
>>>> 			/*
>>>> 			 * Because we lock page here, and migration is
>>>> 			 * blocked by the pte's page reference, and we
>>>> 			 * know the page is still mapped, we don't even
>>>> 			 * need to check for file-cache page truncation.
>>>> 			 */
>>>> 			mlock_vma_page(page);
>>>> 			unlock_page(page);
>>>> 		}
>>>> 		...
>>>>
>>>> I think we should add yisheng's patch, also we should add the following change.
>>>> I think it is better than use lru_add_drain_all().
>>>
>>> I agree about yisheng's fix (but v2 didn't address my comments). I don't
>>> think we should add the hunk below, as that deviates from the rest of
>>> the design.
>>
>> Hi Vlastimil,
>>
>> The rest of the design is that mlock should always success here, right?
> 
> The rest of the design allows a temporary disconnect between mlocked
> flag and being placed on unevictable lru.
> 
>> If we don't handle the fail case, the page will be in anon/file lru list
>> later when call __pagevec_lru_add(), but NR_MLOCK increased,
>> this is wrong, right?
> 
> It's not wrong, the page cannot get evicted even if on wrong lru, so
> effectively it's already mlocked. We would be underaccounting NR_MLOCK.
> 

Hi Vlastimil,

I don't quite understand why the page cannot get evicted while it is on the wrong LRU.
__isolate_lru_page() only skips pages with PageUnevictable set, but that flag has not
been set here; we only set PageMlocked.

Thanks,
Xishi Qiu

>> Thanks,
>> Xishi Qiu
>>
>>>
>>> Thanks,
>>> Vlastimil
>>>
>>>> diff --git a/mm/mlock.c b/mm/mlock.c
>>>> index 3d3ee6c..ca2aeb9 100644
>>>> --- a/mm/mlock.c
>>>> +++ b/mm/mlock.c
>>>> @@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
>>>>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>>>>  		if (!isolate_lru_page(page))
>>>>  			putback_lru_page(page);
>>>> +		else {
>>>> +			ClearPageMlocked(page);
>>>> +			mod_zone_page_state(page_zone(page), NR_MLOCK,
>>>> +					-hpage_nr_pages(page));
>>>> +		}
>>>>  	}
>>>>  }
>>>>
>>>> Thanks,
>>>> Xishi Qiu
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


* Re: [Question] Mlocked count will not be decreased
  2017-05-25  1:16               ` Xishi Qiu
@ 2017-05-25  6:12                 ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2017-05-25  6:12 UTC (permalink / raw)
  To: Xishi Qiu; +Cc: Yisheng Xie, Kefeng Wang, linux-mm, linux-kernel, zhongjiang

On 05/25/2017 03:16 AM, Xishi Qiu wrote:
> On 2017/5/24 21:16, Vlastimil Babka wrote:
>>>>
>>>> I agree about yisheng's fix (but v2 didn't address my comments). I don't
>>>> think we should add the hunk below, as that deviates from the rest of
>>>> the design.
>>>
>>> Hi Vlastimil,
>>>
>>> The rest of the design is that mlock should always success here, right?
>>
>> The rest of the design allows a temporary disconnect between mlocked
>> flag and being placed on unevictable lru.
>>
>>> If we don't handle the fail case, the page will be in anon/file lru list
>>> later when call __pagevec_lru_add(), but NR_MLOCK increased,
>>> this is wrong, right?
>>
>> It's not wrong, the page cannot get evicted even if on wrong lru, so
>> effectively it's already mlocked. We would be underaccounting NR_MLOCK.
>>
> 
> Hi Vlastimil,
> 
> I don't quite understand why the page cannot get evicted even if it is on the wrong lru.
> __isolate_lru_page() only skips pages with PageUnevictable(page) set, but that flag has
> not been set here; we only set PageMlocked.

The isolated page has to be unmapped from all VMAs that map it. See
try_to_unmap_one() and this check:

                if (!(flags & TTU_IGNORE_MLOCK)) {
                        if (vma->vm_flags & VM_LOCKED) {
				...
				ret = false;

This VM_LOCKED flag is what actually controls whether the page is
evictable. The rest is optimization (a separate lru list, so that reclaim
does not scan pages that cannot be evicted anyway) and accounting (pages
with the PageMlocked flag are counted in NR_MLOCK). That's why a temporary
inconsistency isn't a problem.


> Thanks,
> Xishi Qiu
> 
>>> Thanks,
>>> Xishi Qiu
>>>
>>>>
>>>> Thanks,
>>>> Vlastimil
>>>>
>>>>> diff --git a/mm/mlock.c b/mm/mlock.c
>>>>> index 3d3ee6c..ca2aeb9 100644
>>>>> --- a/mm/mlock.c
>>>>> +++ b/mm/mlock.c
>>>>> @@ -88,6 +88,11 @@ void mlock_vma_page(struct page *page)
>>>>>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>>>>>  		if (!isolate_lru_page(page))
>>>>>  			putback_lru_page(page);
>>>>> +		else {
>>>>> +			ClearPageMlocked(page);
>>>>> +			mod_zone_page_state(page_zone(page), NR_MLOCK,
>>>>> +					-hpage_nr_pages(page));
>>>>> +		}
>>>>>  	}
>>>>>  }
>>>>>
>>>>> Thanks,
>>>>> Xishi Qiu
>>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> .
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread


end of thread, other threads:[~2017-05-25  6:12 UTC | newest]

Thread overview: 28+ messages
2017-05-23 14:41 [Question] Mlocked count will not be decreased Kefeng Wang
2017-05-23 22:04 ` Tetsuo Handa
2017-05-24  8:32 ` Yisheng Xie
2017-05-24  8:57   ` Kefeng Wang
2017-05-24 10:32   ` Vlastimil Babka
2017-05-24 10:42     ` Vlastimil Babka
2017-05-24 10:49     ` Xishi Qiu
2017-05-24 11:38     ` Xishi Qiu
2017-05-24 11:52       ` Vlastimil Babka
2017-05-24 12:10         ` Xishi Qiu
2017-05-24 13:16           ` Vlastimil Babka
2017-05-25  1:16             ` Xishi Qiu
2017-05-25  6:12               ` Vlastimil Babka
2017-05-25  1:00         ` Yisheng Xie
