All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [PATCH] madvise06: wait a bit after madvise() call
@ 2016-07-18 13:37 Jan Stancek
  2016-07-18 14:03 ` Cyril Hrubis
  2016-07-19  5:58 ` Li Wang
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Stancek @ 2016-07-18 13:37 UTC (permalink / raw)
  To: ltp

madvise_willneed() only schedules I/O and does not
wait for completion, so there is possibility for this
testcase to fail occasionally.

This patch is introducing a small delay and checks the
effect of madvise on more pages.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
 testcases/kernel/syscalls/madvise/madvise06.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Some other obsverations that are not addressed by this patch:
 Testcase assumes that swap is enabled.
 Testcase assumes that there is enough swap.
 Testcase doesn't check buf[0] is swapped before it calls madvise().

diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
index 6b081fddf5eb..1b0f58cb319d 100644
--- a/testcases/kernel/syscalls/madvise/madvise06.c
+++ b/testcases/kernel/syscalls/madvise/madvise06.c
@@ -77,6 +77,7 @@ static void test_advice_willneed(void)
 	char *dst[100];
 	int page_fault_num_1;
 	int page_fault_num_2;
+	const int pages_to_check = 50;
 
 	/* allocate source memory (1GB only) */
 	src = SAFE_MMAP(NULL, 1 * GB_SZ, PROT_READ | PROT_WRITE,
@@ -97,18 +98,23 @@ static void test_advice_willneed(void)
 	tst_res(TINFO, "PageFault(no madvice): %d", get_page_fault_num());
 
 	/* Do madvice() to dst[0] */
-	TEST(madvise(dst[0], pg_sz, MADV_WILLNEED));
+	TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED));
 	if (TEST_RETURN == -1)
 		tst_brk(TBROK | TERRNO, "madvise failed");
 
-	page_fault_num_1 = get_page_fault_num();
-	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
-			page_fault_num_1);
-
-	*dst[0] = 'a';
-	page_fault_num_2 = get_page_fault_num();
-	tst_res(TINFO, "PageFault(madvice / mem access): %d",
-			page_fault_num_2);
+	i = 0;
+	do {
+		i++;
+		usleep(100000);
+
+		page_fault_num_1 = get_page_fault_num();
+		tst_res(TINFO, "PageFault(madvice / no mem access): %d",
+				page_fault_num_1);
+		dst[0][i * pg_sz] = 'a';
+		page_fault_num_2 = get_page_fault_num();
+		tst_res(TINFO, "PageFault(madvice / mem access): %d",
+				page_fault_num_2);
+	} while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check);
 
 	if (page_fault_num_1 != page_fault_num_2)
 		tst_res(TFAIL, "Bug has been reproduced");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek
@ 2016-07-18 14:03 ` Cyril Hrubis
  2016-07-18 14:22   ` Jan Stancek
  2016-07-19  5:58 ` Li Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Cyril Hrubis @ 2016-07-18 14:03 UTC (permalink / raw)
  To: ltp

Hi!
> madvise_willneed() only schedules I/O and does not
> wait for completion, so there is possibility for this
> testcase to fail occasionally.
> 
> This patch is introducing a small delay and checks the
> effect of madvise on more pages.

Looks good.

What is the reason for using more than one page? Does that increase the
likehood of triggering the issue?

Another idea may be to utilize bitflags from /proc/self/pagemap which
would be non-destructive way to get the present/swapped information.
Then we can do a short loop that polls these flags with much shorter
sleep, but the code would likely end up more complicated than this...

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-18 14:03 ` Cyril Hrubis
@ 2016-07-18 14:22   ` Jan Stancek
  2016-07-18 14:49     ` Cyril Hrubis
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Stancek @ 2016-07-18 14:22 UTC (permalink / raw)
  To: ltp





----- Original Message -----
> From: "Cyril Hrubis" <chrubis@suse.cz>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp@lists.linux.it, liwan@redhat.com
> Sent: Monday, 18 July, 2016 4:03:19 PM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> 
> Hi!
> > madvise_willneed() only schedules I/O and does not
> > wait for completion, so there is possibility for this
> > testcase to fail occasionally.
> > 
> > This patch is introducing a small delay and checks the
> > effect of madvise on more pages.
> 
> Looks good.
> 
> What is the reason for using more than one page? Does that increase the
> likehood of triggering the issue?

Because testcase faults in page by writing to it. After it does that, page
is present and subsequent loops would always give you PASS.
Multiple pages give you same starting state (page is swapped, currently
test assumes they are), if madvise loaded it from swap already (I/O caught up),
then write shouldn't change number of maj_faults.

> 
> Another idea may be to utilize bitflags from /proc/self/pagemap which
> would be non-destructive way to get the present/swapped information.

That is what I meant by "Testcase doesn't check buf[0] is swapped". You
can check that page is swapped before calling madvise...

> Then we can do a short loop that polls these flags with much shorter
> sleep, but the code would likely end up more complicated than this...

... but that page won't be present after madvise call (I checked),
I'm assuming that's because madvise loads it only to page cache
(in same style as readahead).

Regards,
Jan

> 
> --
> Cyril Hrubis
> chrubis@suse.cz
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-18 14:22   ` Jan Stancek
@ 2016-07-18 14:49     ` Cyril Hrubis
  0 siblings, 0 replies; 17+ messages in thread
From: Cyril Hrubis @ 2016-07-18 14:49 UTC (permalink / raw)
  To: ltp

Hi!
> > What is the reason for using more than one page? Does that increase the
> > likehood of triggering the issue?
> 
> Because testcase faults in page by writing to it. After it does that, page
> is present and subsequent loops would always give you PASS.
> Multiple pages give you same starting state (page is swapped, currently
> test assumes they are), if madvise loaded it from swap already (I/O caught up),
> then write shouldn't change number of maj_faults.

Right. Now that I finished my coffe it's clear as a day :).

> > Another idea may be to utilize bitflags from /proc/self/pagemap which
> > would be non-destructive way to get the present/swapped information.
> 
> That is what I meant by "Testcase doesn't check buf[0] is swapped". You
> can check that page is swapped before calling madvise...

Hmm, looking at the test again, without this the patch makes the test
theoretically more fragile. Since it would pass if, by a chance, one of
these pages wasn't swapped out.

> > Then we can do a short loop that polls these flags with much shorter
> > sleep, but the code would likely end up more complicated than this...
> 
> ... but that page won't be present after madvise call (I checked),
> I'm assuming that's because madvise loads it only to page cache
> (in same style as readahead).

Ah, tricky. So it's in the swap page cache and only accesing it makes it
present, but page fault counter is increased only on major faults when
the page needs to be read from the disk.

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek
  2016-07-18 14:03 ` Cyril Hrubis
@ 2016-07-19  5:58 ` Li Wang
  2016-07-19  6:56   ` Jan Stancek
  1 sibling, 1 reply; 17+ messages in thread
From: Li Wang @ 2016-07-19  5:58 UTC (permalink / raw)
  To: ltp

Hi Jan,

On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote:
> 
> Some other obsverations that are not addressed by this patch:
>  Testcase assumes that swap is enabled.
>  Testcase assumes that there is enough swap.
>  Testcase doesn't check buf[0] is swapped before it calls madvise().

It's easy to check swap enabled, but hard to verify one page is swapped. :(

> 
> diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c
> index 6b081fddf5eb..1b0f58cb319d 100644
> --- a/testcases/kernel/syscalls/madvise/madvise06.c
> +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> @@ -77,6 +77,7 @@ static void test_advice_willneed(void)
>  	char *dst[100];
>  	int page_fault_num_1;
>  	int page_fault_num_2;
> +	const int pages_to_check = 50;
>  
>  	/* allocate source memory (1gb only) */
>  	src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write,
> @@ -97,18 +98,23 @@ static void test_advice_willneed(void)
>  	tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num());
>  
>  	/* Do madvice() to dst[0] */
> -	TEST(madvise(dst[0], pg_sz, MADV_WILLNEED));
> +	TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED));
>  	if (TEST_RETURN == -1)
>  		tst_brk(TBROK | TERRNO, "madvise failed");
>  
> -	page_fault_num_1 = get_page_fault_num();
> -	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> -			page_fault_num_1);
> -
> -	*dst[0] = 'a';
> -	page_fault_num_2 = get_page_fault_num();
> -	tst_res(TINFO, "PageFault(madvice / mem access): %d",
> -			page_fault_num_2);

8<---------snip----------------
> +	i = 0;
> +	do {
> +		i++;
> +		usleep(100000);
> +
> +		page_fault_num_1 = get_page_fault_num();
> +		tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> +				page_fault_num_1);
> +		dst[0][i * pg_sz] = 'a';
> +		page_fault_num_2 = get_page_fault_num();
> +		tst_res(TINFO, "PageFault(madvice / mem access): %d",
> +				page_fault_num_2);
> +	} while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check);
8<-------------------------------

Agree! this method could aviod a wrong diagnosis.

But one question is that why involved the 'pages_to_check' as a constant?
why not changes like this:

int pages_to_check = 50;
...

while (pages_to_check > 0 && pages_to_check--) {
	page_fault_num_1 = get_page_fault_num();
	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
			page_fault_num_1);
	dst[0][pages_to_check * pg_sz]  =  'a';
	page_fault_num_2 = get_page_fault_num();
	tst_res(TINFO, "PageFault(madvice / mem access): %d",
			page_fault_num_2);

	if(page_fault_num_1 == page_fault_num_2)
		break;

	usleep(100000);
}


One more word, there(above two changes) still only one chance to verify
page fault numbers equality, because if "page_fault_num_1 != page_fault_num_2"
it will keep looping until get the last page be checked. so that a bad
situation, it will usleep(100000) * 50 at most.

In other words, the last page determines the test result though the bug
has been detected by previous pages.

Li Wang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-19  5:58 ` Li Wang
@ 2016-07-19  6:56   ` Jan Stancek
  2016-07-19  8:57     ` Li Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Stancek @ 2016-07-19  6:56 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Li Wang" <liwang@redhat.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp@lists.linux.it
> Sent: Tuesday, 19 July, 2016 7:58:44 AM
> Subject: Re: [PATCH] madvise06: wait a bit after madvise() call
> 
> Hi Jan,
> 
> On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote:
> > 
> > Some other obsverations that are not addressed by this patch:
> >  Testcase assumes that swap is enabled.
> >  Testcase assumes that there is enough swap.
> >  Testcase doesn't check buf[0] is swapped before it calls madvise().
> 
> It's easy to check swap enabled, but hard to verify one page is swapped. :(

https://www.kernel.org/doc/Documentation/vm/pagemap.txt

> 
> > 
> > diff --git a/testcases/kernel/syscalls/madvise/madvise06.c
> > b/testcases/kernel/syscalls/madvise/madvise06.c
> > index 6b081fddf5eb..1b0f58cb319d 100644
> > --- a/testcases/kernel/syscalls/madvise/madvise06.c
> > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> > @@ -77,6 +77,7 @@ static void test_advice_willneed(void)
> >  	char *dst[100];
> >  	int page_fault_num_1;
> >  	int page_fault_num_2;
> > +	const int pages_to_check = 50;
> >  
> >  	/* allocate source memory (1gb only) */
> >  	src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write,
> > @@ -97,18 +98,23 @@ static void test_advice_willneed(void)
> >  	tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num());
> >  
> >  	/* Do madvice() to dst[0] */
> > -	TEST(madvise(dst[0], pg_sz, MADV_WILLNEED));
> > +	TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED));
> >  	if (TEST_RETURN == -1)
> >  		tst_brk(TBROK | TERRNO, "madvise failed");
> >  
> > -	page_fault_num_1 = get_page_fault_num();
> > -	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > -			page_fault_num_1);
> > -
> > -	*dst[0] = 'a';
> > -	page_fault_num_2 = get_page_fault_num();
> > -	tst_res(TINFO, "PageFault(madvice / mem access): %d",
> > -			page_fault_num_2);
> 
> 8<---------snip----------------
> > +	i = 0;
> > +	do {
> > +		i++;
> > +		usleep(100000);
> > +
> > +		page_fault_num_1 = get_page_fault_num();
> > +		tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > +				page_fault_num_1);
> > +		dst[0][i * pg_sz] = 'a';
> > +		page_fault_num_2 = get_page_fault_num();
> > +		tst_res(TINFO, "PageFault(madvice / mem access): %d",
> > +				page_fault_num_2);
> > +	} while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check);
> 8<-------------------------------
> 
> Agree! this method could aviod a wrong diagnosis.
> 
> But one question is that why involved the 'pages_to_check' as a constant?
> why not changes like this:
> 
> int pages_to_check = 50;

Sure, we can do that and save one variable.

> ...
> 
> while (pages_to_check > 0 && pages_to_check--) {
> 	page_fault_num_1 = get_page_fault_num();
> 	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> 			page_fault_num_1);
> 	dst[0][pages_to_check * pg_sz]  =  'a';
> 	page_fault_num_2 = get_page_fault_num();
> 	tst_res(TINFO, "PageFault(madvice / mem access): %d",
> 			page_fault_num_2);
> 
> 	if(page_fault_num_1 == page_fault_num_2)
> 		break;
> 
> 	usleep(100000);
> }
> 
> 
> One more word, there(above two changes) still only one chance to verify
> page fault numbers equality, because if "page_fault_num_1 !=
> page_fault_num_2"

Why "one chance"? With above we should get 50 chances.

> it will keep looping until get the last page be checked. so that a bad
> situation, it will usleep(100000) * 50 at most.
> 
> In other words, the last page determines the test result though the bug
> has been detected by previous pages.

Problem is we don't know if it's a bug, pending I/O (after short delay)
or kernel ignoring request for any other reason, as mentioned in madvise(2):
"The kernel is free to ignore the advice.".

My impression was that kernel bug was consistently reproducible,
if not then let's replace the loop with one bigger sleep.

Regards,
Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-19  6:56   ` Jan Stancek
@ 2016-07-19  8:57     ` Li Wang
  2016-07-20 14:37       ` Jan Stancek
  0 siblings, 1 reply; 17+ messages in thread
From: Li Wang @ 2016-07-19  8:57 UTC (permalink / raw)
  To: ltp

On Tue, Jul 19, 2016 at 02:56:42AM -0400, Jan Stancek wrote:
> 
> 
> ----- Original Message -----
> > From: "Li Wang" <liwang@redhat.com>
> > To: "Jan Stancek" <jstancek@redhat.com>
> > Cc: ltp@lists.linux.it
> > Sent: Tuesday, 19 July, 2016 7:58:44 AM
> > Subject: Re: [PATCH] madvise06: wait a bit after madvise() call
> > 
> > Hi Jan,
> > 
> > On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote:
> > > 
> > > Some other obsverations that are not addressed by this patch:
> > >  Testcase assumes that swap is enabled.
> > >  Testcase assumes that there is enough swap.
> > >  Testcase doesn't check buf[0] is swapped before it calls madvise().
> > 
> > It's easy to check swap enabled, but hard to verify one page is swapped. :(
> 
> https://www.kernel.org/doc/Documentation/vm/pagemap.txt
> 
> > 
> > > 
> > > diff --git a/testcases/kernel/syscalls/madvise/madvise06.c
> > > b/testcases/kernel/syscalls/madvise/madvise06.c
> > > index 6b081fddf5eb..1b0f58cb319d 100644
> > > --- a/testcases/kernel/syscalls/madvise/madvise06.c
> > > +++ b/testcases/kernel/syscalls/madvise/madvise06.c
> > > @@ -77,6 +77,7 @@ static void test_advice_willneed(void)
> > >  	char *dst[100];
> > >  	int page_fault_num_1;
> > >  	int page_fault_num_2;
> > > +	const int pages_to_check = 50;
> > >  
> > >  	/* allocate source memory (1gb only) */
> > >  	src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write,
> > > @@ -97,18 +98,23 @@ static void test_advice_willneed(void)
> > >  	tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num());
> > >  
> > >  	/* Do madvice() to dst[0] */
> > > -	TEST(madvise(dst[0], pg_sz, MADV_WILLNEED));
> > > +	TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED));
> > >  	if (TEST_RETURN == -1)
> > >  		tst_brk(TBROK | TERRNO, "madvise failed");
> > >  
> > > -	page_fault_num_1 = get_page_fault_num();
> > > -	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > > -			page_fault_num_1);
> > > -
> > > -	*dst[0] = 'a';
> > > -	page_fault_num_2 = get_page_fault_num();
> > > -	tst_res(TINFO, "PageFault(madvice / mem access): %d",
> > > -			page_fault_num_2);
> > 
> > 8<---------snip----------------
> > > +	i = 0;
> > > +	do {
> > > +		i++;
> > > +		usleep(100000);
> > > +
> > > +		page_fault_num_1 = get_page_fault_num();
> > > +		tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > > +				page_fault_num_1);
> > > +		dst[0][i * pg_sz] = 'a';
> > > +		page_fault_num_2 = get_page_fault_num();
> > > +		tst_res(TINFO, "PageFault(madvice / mem access): %d",
> > > +				page_fault_num_2);
> > > +	} while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check);
> > 8<-------------------------------
> > 
> > Agree! this method could aviod a wrong diagnosis.
> > 
> > But one question is that why involved the 'pages_to_check' as a constant?
> > why not changes like this:
> > 
> > int pages_to_check = 50;
> 
> Sure, we can do that and save one variable.
> 
> > ...
> > 
> > while (pages_to_check > 0 && pages_to_check--) {
> > 	page_fault_num_1 = get_page_fault_num();
> > 	tst_res(TINFO, "PageFault(madvice / no mem access): %d",
> > 			page_fault_num_1);
> > 	dst[0][pages_to_check * pg_sz]  =  'a';
> > 	page_fault_num_2 = get_page_fault_num();
> > 	tst_res(TINFO, "PageFault(madvice / mem access): %d",
> > 			page_fault_num_2);
> > 
> > 	if(page_fault_num_1 == page_fault_num_2)
> > 		break;
> > 
> > 	usleep(100000);
> > }
> > 
> > 
> > One more word, there(above two changes) still only one chance to verify
> > page fault numbers equality, because if "page_fault_num_1 !=
> > page_fault_num_2"
> 
> Why "one chance"? With above we should get 50 chances.
> 
> > it will keep looping until get the last page be checked. so that a bad
> > situation, it will usleep(100000) * 50 at most.
> > 
> > In other words, the last page determines the test result though the bug
> > has been detected by previous pages.
> 
> Problem is we don't know if it's a bug, pending I/O (after short delay)
> or kernel ignoring request for any other reason, as mentioned in madvise(2):
> "The kernel is free to ignore the advice.".
> 
> My impression was that kernel bug was consistently reproducible,
> if not then let's replace the loop with one bigger sleep.

Sorry, I remember it's not, from what I test on an bad (unfix) kernel.
it always report PASS with this patch.

# ./madvise06 
madvise06.c:57: INFO: dst_max = 7
madvise06.c:98: INFO: PageFault(no madvice): 8
madvise06.c:108: INFO: PageFault(madvice / no mem access): 8
madvise06.c:112: INFO: PageFault(madvice / mem access): 9
madvise06.c:108: INFO: PageFault(madvice / no mem access): 9
madvise06.c:112: INFO: PageFault(madvice / mem access): 10
madvise06.c:108: INFO: PageFault(madvice / no mem access): 10
madvise06.c:112: INFO: PageFault(madvice / mem access): 10
madvise06.c:137: PASS: Regression test pass

Summary:
passed   1
failed   0
skipped  0
warnings 0


Let's image the possible situations:

1. It's a bug. test fail on comparing the page fults many times and
   keep looping to test but easily break out of the loop if one time
   randomly PASS.

2, It pending I/O with short dely. Test pass soon(probably 2~3 times loop)
   and break out of the loop with report PASS.

3. It caused by kernel ignoring the request for unknow reasons, fail in
   50 times also and report PASS/BUG(I didn't catch the situation, so I
   do know what result will be reported).

Obviously, the patch did not fit for the first situation. :(

Regards,
Li Wang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-19  8:57     ` Li Wang
@ 2016-07-20 14:37       ` Jan Stancek
  2016-07-21  5:33         ` Li Wang
  2016-07-21 10:31         ` Chunyu Hu
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Stancek @ 2016-07-20 14:37 UTC (permalink / raw)
  To: ltp

On 07/19/2016 10:57 AM, Li Wang wrote:
>>
>> My impression was that kernel bug was consistently reproducible,
>> if not then let's replace the loop with one bigger sleep.
> 
> Sorry, I remember it's not, from what I test on an bad (unfix) kernel.
> it always report PASS with this patch.

Attached is a different approach, that watches progress of SwapCached
from /proc/meminfo and as soon as it sees 128M increase it takes that
as PASS or gives up after 5 seconds with FAIL.

GOOD kernel:
tst_test.c:701: INFO: Timeout per run is 300s
madvise06.c:98: INFO: SwapCached (before madvise): 53576
madvise06.c:113: INFO: SwapCached (after madvise): 568080
madvise06.c:115: PASS: Regression test pass

BAD kernel:
# ./madvise06
tst_test.c:701: INFO: Timeout per run is 300s
madvise06.c:98: INFO: SwapCached (before madvise): 43712
madvise06.c:113: INFO: SwapCached (after madvise): 45636
madvise06.c:117: FAIL: Bug has been reproduced

If you still have the setup, can you try how reliable is this approach?

Regards,
Jan
-------------- next part --------------
/*
 * Copyright (c) 2016 Red Hat, Inc.
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

/*
 * DESCRIPTION
 *
 *   Page fault occurs in spite that madvise(WILLNEED) system call is called
 *   to prefetch the page. This issue is reproduced by running a program
 *   which sequentially accesses to a shared memory and calls madvise(WILLNEED)
 *   to the next page on a page fault.
 *
 *   This bug is present in all RHEL7 versions. It looks like this was fixed in
 *   mainline kernel > v3.15 by the following patch:
 *
 *   commit 55231e5c898c5c03c14194001e349f40f59bd300
 *   Author: Johannes Weiner <hannes@cmpxchg.org>
 *   Date:   Thu May 22 11:54:17 2014 -0700
 *
 *       mm: madvise: fix MADV_WILLNEED on shmem swapouts
 */

#include <errno.h>
#include <stdio.h>
#include <sys/sysinfo.h>
#include "tst_test.h"

#define CHUNK_SZ  (512*1024*1024L)
#define MAX_ALLOC_CHUNKS 200L
#define PASS_THRESHOLD_KB (CHUNK_SZ / 1024 / 4)

static int pg_sz;
struct sysinfo sys_buf_start;

static void setup(void)
{
	sysinfo(&sys_buf_start);

	if (sys_buf_start.totalram > (MAX_ALLOC_CHUNKS - 1) * CHUNK_SZ)
		tst_brk(TCONF, "System RAM is too large, skip test");
	if (sys_buf_start.freeswap < 2 * CHUNK_SZ)
		tst_brk(TCONF, "System swap is too small");

	pg_sz = getpagesize();
}


static void test_advice_willneed(void)
{
	int i;
	char *src;
	char *dst[MAX_ALLOC_CHUNKS];
	long swapcached_start, swapcached;
	struct sysinfo sys_buf;

	/* allocate source memory */
	src = SAFE_MMAP(NULL, CHUNK_SZ, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS,
			-1, 0);

	/* allocate destination memory (array) */
	for (i = 0; i < MAX_ALLOC_CHUNKS; ++i)
		dst[i] = SAFE_MMAP(NULL, CHUNK_SZ,
				PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS,
				-1, 0);

	/* memmove source to each destination memories (for SWAP-OUT) */
	i = 0;
	while (i < MAX_ALLOC_CHUNKS) {
		memmove(dst[i], src, CHUNK_SZ);
		i++;

		sysinfo(&sys_buf);
		if (sys_buf.freeswap + CHUNK_SZ < sys_buf_start.freeswap)
			break;
	}

	SAFE_MUNMAP(src, CHUNK_SZ);
	sync();

	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
		&swapcached_start);
	tst_res(TINFO, "SwapCached (before madvise): %ld", swapcached_start);

	/* Do madvise() to dst[0] */
	TEST(madvise(dst[0], CHUNK_SZ, MADV_WILLNEED));
	if (TEST_RETURN == -1)
		tst_brk(TBROK | TERRNO, "madvise failed");

	i = 0;
	do {
		i++;
		usleep(100000);
		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
			&swapcached);
	} while (i < 50 && swapcached < swapcached_start + PASS_THRESHOLD_KB);

	tst_res(TINFO, "SwapCached (after madvise): %ld", swapcached);
	if (swapcached > swapcached_start + PASS_THRESHOLD_KB)
		tst_res(TPASS, "Regression test pass");
	else
		tst_res(TFAIL, "Bug has been reproduced");

	for (i = 0; i < MAX_ALLOC_CHUNKS; ++i)
		SAFE_MUNMAP(dst[i], CHUNK_SZ);
}

static struct tst_test test = {
	.tid = "madvise06",
	.test_all = test_advice_willneed,
	.setup = setup,
};

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-20 14:37       ` Jan Stancek
@ 2016-07-21  5:33         ` Li Wang
  2016-07-21 10:31         ` Chunyu Hu
  1 sibling, 0 replies; 17+ messages in thread
From: Li Wang @ 2016-07-21  5:33 UTC (permalink / raw)
  To: ltp

On Wed, Jul 20, 2016 at 04:37:42PM +0200, Jan Stancek wrote:
> 
> Attached is a different approach, that watches progress of SwapCached
> from /proc/meminfo and as soon as it sees 128M increase it takes that
> as PASS or gives up after 5 seconds with FAIL.
> 
> GOOD kernel:
> tst_test.c:701: INFO: Timeout per run is 300s
> madvise06.c:98: INFO: SwapCached (before madvise): 53576
> madvise06.c:113: INFO: SwapCached (after madvise): 568080
> madvise06.c:115: PASS: Regression test pass
> 
> BAD kernel:
> # ./madvise06
> tst_test.c:701: INFO: Timeout per run is 300s
> madvise06.c:98: INFO: SwapCached (before madvise): 43712
> madvise06.c:113: INFO: SwapCached (after madvise): 45636
> madvise06.c:117: FAIL: Bug has been reproduced
> 
> If you still have the setup, can you try how reliable is this approach?

Yes, it works for me. the GOOD kernel get PASS, and BAD kernel
always FAIL.

I didn't fully understand this method, could you give more
explanation in the upcoming patch?


Regards,
Li Wang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-20 14:37       ` Jan Stancek
  2016-07-21  5:33         ` Li Wang
@ 2016-07-21 10:31         ` Chunyu Hu
  2016-07-21 11:02           ` Li Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Chunyu Hu @ 2016-07-21 10:31 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Jan Stancek" <jstancek@redhat.com>
> To: "Li Wang" <liwang@redhat.com>
> Cc: ltp@lists.linux.it
> Sent: Wednesday, July 20, 2016 10:37:42 PM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> 
> On 07/19/2016 10:57 AM, Li Wang wrote:
> >>
> >> My impression was that kernel bug was consistently reproducible,
> >> if not then let's replace the loop with one bigger sleep.
> > 
> > Sorry, I remember it's not, from what I test on an bad (unfix) kernel.
> > it always report PASS with this patch.
> 
> Attached is a different approach, that watches progress of SwapCached
> from /proc/meminfo and as soon as it sees 128M increase it takes that
> as PASS or gives up after 5 seconds with FAIL.
> 
> GOOD kernel:
> tst_test.c:701: INFO: Timeout per run is 300s
> madvise06.c:98: INFO: SwapCached (before madvise): 53576
> madvise06.c:113: INFO: SwapCached (after madvise): 568080
> madvise06.c:115: PASS: Regression test pass
> 
> BAD kernel:
> # ./madvise06
> tst_test.c:701: INFO: Timeout per run is 300s
> madvise06.c:98: INFO: SwapCached (before madvise): 43712
> madvise06.c:113: INFO: SwapCached (after madvise): 45636
> madvise06.c:117: FAIL: Bug has been reproduced
> 
> If you still have the setup, can you try how reliable is this approach?

I also had a try on my desktop. I copied the file as a.c and compiled it in ltp.
Result is that if the sys is fresh with low Cache, it can pass rightly. But if 
the Cache is before exhausted, it can hit failure, as the thresh_hold is too
large to get there. Just FYI. 

[root@dhcp-chuhu mem]# uname -r
4.7.0-rc3+
 
[root@dhcp-chuhu mem]# cat /proc/meminfo 
MemTotal:        7796148 kB
MemFree:         6637888 kB
MemAvailable:    6589052 kB
Buffers:               0 kB
Cached:           260124 kB
SwapCached:        38096 kB
Active:           725856 kB
Inactive:         252788 kB
Active(anon):     710456 kB
Inactive(anon):   123060 kB
Active(file):      15400 kB
Inactive(file):   129728 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:       7995388 kB
SwapFree:        6986700 kB
Dirty:                32 kB
Writeback:             0 kB
AnonPages:        716448 kB
Mapped:            72004 kB
Shmem:            114968 kB
Slab:              78160 kB
SReclaimable:      26956 kB
SUnreclaim:        51204 kB
KernelStack:        7856 kB
PageTables:        25568 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    11893460 kB
Committed_AS:    4073896 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     67584 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      139580 kB
DirectMap2M:     8040448 kB
[root@dhcp-chuhu mem]# ./a 
tst_test.c:701: INFO: Timeout per run is 300s
a.c:97: INFO: SwapCached (before madvise): 121032
a.c:112: INFO: SwapCached (after madvise): 123928
a.c:116: FAIL: Bug has been reproduced

Summary:
passed   0
failed   1
skipped  0
warnings 0
[root@dhcp-c

> Regards,
> Jan
> 
> 
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
> 

-- 
Regards,
Chunyu Hu


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-21 10:31         ` Chunyu Hu
@ 2016-07-21 11:02           ` Li Wang
  2016-07-21 14:23             ` Jan Stancek
  0 siblings, 1 reply; 17+ messages in thread
From: Li Wang @ 2016-07-21 11:02 UTC (permalink / raw)
  To: ltp

On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote:
> > 
> > If you still have the setup, can you try how reliable is this approach?
> 
> I also had a try on my desktop. I copied the file as a.c and compiled it in ltp.
> Result is that if the sys is fresh with low Cache, it can pass rightly. But if 
> the Cache is before exhausted, it can hit failure, as the thresh_hold is too
> large to get there. Just FYI. 

Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at first,
after rebooting it could be PASS.

We guess the reason is the current usage of SwapCached are too large and
madvise_willneed() could not shift the memory(swapout) into swapcache.

To verify this image we incresed SwapCached by another program(background)
then it always get failures like that. Probably it's still hitting the
situations we discussed before:

  2. It pending I/O with short dely.
  3. kernel ignoring the request for unknow reasons.

Regards,
Li Wang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-21 11:02           ` Li Wang
@ 2016-07-21 14:23             ` Jan Stancek
  2016-07-22  3:46               ` Li Wang
  2016-07-22 10:49               ` Chunyu Hu
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Stancek @ 2016-07-21 14:23 UTC (permalink / raw)
  To: ltp

On 07/21/2016 01:02 PM, Li Wang wrote:
> On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote:
>>>
>>> If you still have the setup, can you try how reliable is this approach?
>>
>> I also had a try on my desktop. I copied the file as a.c and compiled it in ltp.
>> Result is that if the sys is fresh with low Cache, it can pass rightly. But if 
>> the Cache is before exhausted, it can hit failure, as the thresh_hold is too
>> large to get there. Just FYI. 

I'm not sure I follow here, your /proc/meminfo shows:
Cached:           260124 kB
SwapCached:        38096 kB

That doesn't seem very high to me.

> 
> Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at first,
> after rebooting it could be PASS.

I'm starting to run out of ideas how we can test this somewhat reliably.

Attached is approach v3, which sets up memory cgroup:
- memory.limit_in_bytes is 128M
- we allocate 512M
- as consequence ~384M should be swapped while system should still have
  plenty of free memory, which should be available for cache

Regards,
Jan

-------------- next part --------------
/*
 * Copyright (c) 2016 Red Hat, Inc.
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

/*
 * DESCRIPTION
 *
 *   Page fault occurs in spite that madvise(WILLNEED) system call is called
 *   to prefetch the page. This issue is reproduced by running a program
 *   which sequentially accesses to a shared memory and calls madvise(WILLNEED)
 *   to the next page on a page fault.
 *
 *   This bug is present in all RHEL7 versions. It looks like this was fixed in
 *   mainline kernel > v3.15 by the following patch:
 *
 *   commit 55231e5c898c5c03c14194001e349f40f59bd300
 *   Author: Johannes Weiner <hannes@cmpxchg.org>
 *   Date:   Thu May 22 11:54:17 2014 -0700
 *
 *       mm: madvise: fix MADV_WILLNEED on shmem swapouts
 */

#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/sysinfo.h>
#include "tst_test.h"

#define CHUNK_SZ (512*1024*1024L)
#define CHUNK_PAGES (CHUNK_SZ / pg_sz)
#define PASS_THRESHOLD (CHUNK_SZ / 4)

static const char drop_caches_fname[] = "/proc/sys/vm/drop_caches";
static int pg_sz;

static void drop_caches(void)
{
	int ret;
	FILE *f;

	f = fopen(drop_caches_fname, "w");
	if (f) {
		ret = fprintf(f, "1");
		fclose(f);
		if (ret < 1)
			tst_brk(TBROK, "Failed to drop caches");
	} else {
		tst_brk(TBROK, "Failed to open drop_caches");
	}
}

static void setup(void)
{
	struct sysinfo sys_buf_start;

	pg_sz = getpagesize();

	if (access(drop_caches_fname, R_OK | W_OK))
		tst_brk(TCONF, "needed: %s\n", drop_caches_fname);
	tst_res(TINFO, "dropping caches");
	drop_caches();

	sysinfo(&sys_buf_start);
	if (sys_buf_start.freeram < 2 * CHUNK_SZ)
		tst_brk(TCONF, "System RAM is too small, skip test");
	if (sys_buf_start.freeswap < 2 * CHUNK_SZ)
		tst_brk(TCONF, "System swap is too small");

	SAFE_MKDIR("memory", 0700);
	SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory");
	if (access("memory/memory.limit_in_bytes", R_OK | W_OK))
		tst_brk(TCONF, "cgroup memory.limit_in_bytes needed");

	SAFE_MKDIR("memory/madvise06", 0700);
	SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n",
		PASS_THRESHOLD);
	SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid());
}

static void cleanup(void)
{
	FILE *f = fopen("memory/tasks", "w");

	if (f) {
		fprintf(f, "%d\n", getpid());
		fclose(f);
	}
	rmdir("memory/madvise06");
	umount("memory");
}

static long count_swapped_pages(void *ptr, long pg_count)
{
	int pm;
	long index, ret = 0;
	uint64_t pagemap;
	off_t offset;

	index = ((uintptr_t)ptr / pg_sz) * sizeof(uint64_t);

	pm = open("/proc/self/pagemap", O_RDONLY);
	if (pm == -1) {
		/* In 4.0 and 4.1 opens by unprivileged fail with -EPERM */
		if ((errno == EPERM) && (geteuid() != 0)) {
			tst_brk(TCONF | TERRNO,
				"don't have permission to open dev pagemap");
		} else {
			tst_brk(TFAIL | TERRNO,
				"Open dev pagemap failed");
		}
	}

	offset = lseek(pm, index, SEEK_SET);
	if (offset != index)
		tst_brk(TFAIL | TERRNO, "Reposition offset failed");

	while (pg_count > 0) {
		ret = read(pm, &pagemap, sizeof(uint64_t));
		if (ret < 0)
			tst_brk(TFAIL | TERRNO, "Read pagemap failed");
		if ((pagemap & (1ULL<<62)))
			ret++;
		pg_count--;
	}

	close(pm);
}

static void dirty_pages(char *ptr, long size)
{
	long i;
	long pages = size / pg_sz;

	for (i = 0; i < pages; i++)
		ptr[i * pg_sz] = 'x';
}

static int get_page_fault_num(void)
{
	int pg;

	SAFE_FILE_SCANF("/proc/self/stat",
			"%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %d",
			&pg);

	return pg;
}

static void test_advice_willneed(void)
{
	int loops = 50;
	char *target;
	long swapcached_start, swapcached;

	target = SAFE_MMAP(NULL, CHUNK_SZ, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS,
			-1, 0);
	dirty_pages(target, CHUNK_SZ);

	SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
		&swapcached_start);
	tst_res(TINFO, "SwapCached (before madvise): %ld", swapcached_start);

	TEST(madvise(target, CHUNK_SZ, MADV_WILLNEED));
	if (TEST_RETURN == -1)
		tst_brk(TBROK | TERRNO, "madvise failed");

	do {
		loops--;
		usleep(100000);
		SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld",
			&swapcached);
	} while (loops > 0 && swapcached < swapcached_start + PASS_THRESHOLD / 1024);

	tst_res(TINFO, "SwapCached (after madvise): %ld", swapcached);
	if (swapcached > swapcached_start + PASS_THRESHOLD / 1024)
		tst_res(TPASS, "Regression test pass");
	else {
		/* looks like we may have hit a bug, try accessing page */
		int page_fault_num_1;
		int page_fault_num_2;

		page_fault_num_1 = get_page_fault_num();
		tst_res(TINFO, "PageFault(madvice / no mem access): %d",
				page_fault_num_1);
		target[0] = 'a';
		page_fault_num_2 = get_page_fault_num();
		tst_res(TINFO, "PageFault(madvice / mem access): %d",
				page_fault_num_2);

		if (page_fault_num_1 != page_fault_num_2)
			tst_res(TFAIL, "Bug has been reproduced");
		else
			tst_res(TPASS, "Regression test pass");
	}

	SAFE_MUNMAP(target, CHUNK_SZ);
}

static struct tst_test test = {
	.tid = "madvise06",
	.test_all = test_advice_willneed,
	.setup = setup,
	.cleanup = cleanup,
	.needs_tmpdir = 1,
	.needs_root = 1,
};

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-21 14:23             ` Jan Stancek
@ 2016-07-22  3:46               ` Li Wang
  2016-07-22  6:59                 ` Jan Stancek
  2016-07-22 10:49               ` Chunyu Hu
  1 sibling, 1 reply; 17+ messages in thread
From: Li Wang @ 2016-07-22  3:46 UTC (permalink / raw)
  To: ltp

On Thu, Jul 21, 2016 at 04:23:27PM +0200, Jan Stancek wrote:
> 
> I'm starting to run out of ideas how we can test this somewhat reliably.
> 
> Attached is approach v3, which sets up memory cgroup:
> - memory.limit_in_bytes is 128M
> - we allocate 512M
> - as consequence ~384M should be swapped while system should still have
>   plenty of free memory, which should be available for cache

Drop caches before testing is really good, we think the same!

And setting cgroup shorten the test time, if there could guarantee
cgroup OOM is off and swappiness is a little larger, that would
be more reliable I think.

One question is that you wrote function count_swapped_pages() but
didn't using it, seems forgot to put somewhere?

Anyway, V3 is strong enough expecially it combines the two method
together.


> #include <errno.h>
> #include <stdio.h>
> #include <sys/mount.h>
> #include <sys/sysinfo.h>
> #include "tst_test.h"
> 
> #define CHUNK_SZ (512*1024*1024L)
> #define CHUNK_PAGES (CHUNK_SZ / pg_sz)
> #define PASS_THRESHOLD (CHUNK_SZ / 4)
> 
> static const char drop_caches_fname[] = "/proc/sys/vm/drop_caches";
> static int pg_sz;
> 
> static void drop_caches(void)
> {
> 	int ret;
> 	FILE *f;
> 
> 	f = fopen(drop_caches_fname, "w");
> 	if (f) {
> 		ret = fprintf(f, "1");
> 		fclose(f);
> 		if (ret < 1)
> 			tst_brk(TBROK, "Failed to drop caches");
> 	} else {
> 		tst_brk(TBROK, "Failed to open drop_caches");
> 	}
> }
> 
> static void setup(void)
> {
> 	struct sysinfo sys_buf_start;
> 
> 	pg_sz = getpagesize();
> 
> 	if (access(drop_caches_fname, R_OK | W_OK))
> 		tst_brk(TCONF, "needed: %s\n", drop_caches_fname);
> 	tst_res(TINFO, "dropping caches");
> 	drop_caches();

save this function by oneline:

 /* drop caches*/
 SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "1");

> 
> 	sysinfo(&sys_buf_start);
> 	if (sys_buf_start.freeram < 2 * CHUNK_SZ)
> 		tst_brk(TCONF, "System RAM is too small, skip test");
> 	if (sys_buf_start.freeswap < 2 * CHUNK_SZ)
> 		tst_brk(TCONF, "System swap is too small");
> 
> 	SAFE_MKDIR("memory", 0700);
> 	SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory");
> 	if (access("memory/memory.limit_in_bytes", R_OK | W_OK))
> 		tst_brk(TCONF, "cgroup memory.limit_in_bytes needed");
> 
> 	SAFE_MKDIR("memory/madvise06", 0700);
> 	SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n",
> 		PASS_THRESHOLD);

Turn off oom, enlarge swappiness

 SAFE_FILE_PRINTF("memory/madvise06/memory.oom_control", "0");
 SAFE_FILE_PRINTF("memory/madvise06/memory.swappiness", "60");

> 	SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid());
> }
> 
> static void cleanup(void)
> {
> 	FILE *f = fopen("memory/tasks", "w");
> 
> 	if (f) {
> 		fprintf(f, "%d\n", getpid());
> 		fclose(f);
> 	}
> 	rmdir("memory/madvise06");
> 	umount("memory");
> }
> 
> static long count_swapped_pages(void *ptr, long pg_count)

A compile Warnning:

madvise06.c:94:13: warning: ‘count_swapped_pages’ defined but not used
[-Wunused-function]
static long count_swapped_pages(void *ptr, long pg_count)


Regards,
Li Wang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-22  3:46               ` Li Wang
@ 2016-07-22  6:59                 ` Jan Stancek
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Stancek @ 2016-07-22  6:59 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Li Wang" <liwang@redhat.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: "Chunyu Hu" <chuhu@redhat.com>, ltp@lists.linux.it
> Sent: Friday, 22 July, 2016 5:46:03 AM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> 
> On Thu, Jul 21, 2016 at 04:23:27PM +0200, Jan Stancek wrote:
> > 
> > I'm starting to run out of ideas how we can test this somewhat reliably.
> > 
> > Attached is approach v3, which sets up memory cgroup:
> > - memory.limit_in_bytes is 128M
> > - we allocate 512M
> > - as consequence ~384M should be swapped while system should still have
> >   plenty of free memory, which should be available for cache
> 
> Drop caches before testing is really good, we think the same!
> 
> And setting cgroup shorten the test time if there could guarantee
> cgroup OOM is off and swappiness is a little larger, that would
> be more reliable I think.

Agreed, we can do that.

> 
> One question is that you wrote function count_swapped_pages() but
> didn't using it, seems forgot to put somewhere?

I left it in, just in case I would use it in v4.

> 
> Anyway, V3 is strong enough expecially it combines the two method
> together.

> save this function by oneline:
> 
>  /* drop caches*/
>  SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "1");

Good point.

> 
> > 
> > 	sysinfo(&sys_buf_start);
> > 	if (sys_buf_start.freeram < 2 * CHUNK_SZ)
> > 		tst_brk(TCONF, "System RAM is too small, skip test");
> > 	if (sys_buf_start.freeswap < 2 * CHUNK_SZ)
> > 		tst_brk(TCONF, "System swap is too small");
> > 
> > 	SAFE_MKDIR("memory", 0700);
> > 	SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory");

I'm thinking if I should turn this into normal mount and TCONF if
there is no memory cgroup support in kernel.

> > 	if (access("memory/memory.limit_in_bytes", R_OK | W_OK))
> > 		tst_brk(TCONF, "cgroup memory.limit_in_bytes needed");
> > 
> > 	SAFE_MKDIR("memory/madvise06", 0700);
> > 	SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n",
> > 		PASS_THRESHOLD);
> 
> Turn off oom, enlarge swappiness
> 
>  SAFE_FILE_PRINTF("memory/madvise06/memory.oom_control", "0");
>  SAFE_FILE_PRINTF("memory/madvise06/memory.swappiness", "60");

I'll add that.

> 
> > 	SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid());
> > }
> > 
> > static void cleanup(void)
> > {
> > 	FILE *f = fopen("memory/tasks", "w");
> > 
> > 	if (f) {
> > 		fprintf(f, "%d\n", getpid());
> > 		fclose(f);
> > 	}
> > 	rmdir("memory/madvise06");
> > 	umount("memory");
> > }
> > 
> > static long count_swapped_pages(void *ptr, long pg_count)
> 
> A compile Warnning:
> 
> madvise06.c:94:13: warning: ‘count_swapped_pages’ defined but not used
> [-Wunused-function]
> static long count_swapped_pages(void *ptr, long pg_count)

I'll drop it in a patch.

Regards,
Jan

> 
> 
> Regards,
> Li Wang
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-21 14:23             ` Jan Stancek
  2016-07-22  3:46               ` Li Wang
@ 2016-07-22 10:49               ` Chunyu Hu
  2016-07-22 10:54                 ` Chunyu Hu
  1 sibling, 1 reply; 17+ messages in thread
From: Chunyu Hu @ 2016-07-22 10:49 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Jan Stancek" <jstancek@redhat.com>
> To: "Li Wang" <liwang@redhat.com>, "Chunyu Hu" <chuhu@redhat.com>
> Cc: ltp@lists.linux.it
> Sent: Thursday, July 21, 2016 10:23:27 PM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> 
> On 07/21/2016 01:02 PM, Li Wang wrote:
> > On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote:
> >>>
> >>> If you still have the setup, can you try how reliable is this approach?
> >>
> >> I also had a try on my desktop. I copied the file as a.c and compiled it
> >> in ltp.
> >> Result is that if the sys is fresh with low Cache, it can pass rightly.
> >> But if
> >> the Cache is before exhausted, it can hit failure, as the thresh_hold is
> >> too
> >> large to get there. Just FYI.
> 
> I'm not sure I follow here, your /proc/meminfo shows:
> Cached:           260124 kB
> SwapCached:        38096 kB
> 
> That doesn't seem very high to me.

Sorry. This info is just for showing the system info. I didn't save the info at the beginning,
this is the info after a reboot. 

The other case that reproduced the false positive issue is when another WILL_NEED process swapping
a large mem(4G) at the same.



> > 
> > Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at
> > first,
> > after rebooting it could be PASS.
> 
> I'm starting to run out of ideas how we can test this somewhat reliably.
> 
> Attached is approach v3, which sets up memory cgroup:
> - memory.limit_in_bytes is 128M
> - we allocate 512M
> - as consequence ~384M should be swapped while system should still have
>   plenty of free memory, which should be available for cache
> 
> Regards,
> Jan
> 
> 

-- 
Regards,
Chunyu Hu


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-22 10:49               ` Chunyu Hu
@ 2016-07-22 10:54                 ` Chunyu Hu
  2016-07-22 11:02                   ` Jan Stancek
  0 siblings, 1 reply; 17+ messages in thread
From: Chunyu Hu @ 2016-07-22 10:54 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Chunyu Hu" <chuhu@redhat.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp@lists.linux.it
> Sent: Friday, July 22, 2016 6:49:51 PM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> 
> 
> 
> ----- Original Message -----
> > From: "Jan Stancek" <jstancek@redhat.com>
> > To: "Li Wang" <liwang@redhat.com>, "Chunyu Hu" <chuhu@redhat.com>
> > Cc: ltp@lists.linux.it
> > Sent: Thursday, July 21, 2016 10:23:27 PM
> > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> > 
> > On 07/21/2016 01:02 PM, Li Wang wrote:
> > > On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote:
> > >>>
> > >>> If you still have the setup, can you try how reliable is this approach?
> > >>
> > >> I also had a try on my desktop. I copied the file as a.c and compiled it
> > >> in ltp.
> > >> Result is that if the sys is fresh with low Cache, it can pass rightly.
> > >> But if
> > >> the Cache is before exhausted, it can hit failure, as the thresh_hold is
> > >> too
> > >> large to get there. Just FYI.
> > 
> > I'm not sure I follow here, your /proc/meminfo shows:
> > Cached:           260124 kB
> > SwapCached:        38096 kB
> > 
> > That doesn't seem very high to me.
> 
> Sorry. This info is just for showing the system info. I didn't save the info
> at the beginning,
> this is the info after a reboot.
> 
> The other case that reproduced the false positive issue is when another
> WILL_NEED process swapping
> a large mem(4G) at the same.
> 
> 
> 
> > > 
> > > Yes, Chunyu ran failed the case with his destop(uptime more than 30days)
> > > at
> > > first,
> > > after rebooting it could be PASS.
> > 
> > I'm starting to run out of ideas how we can test this somewhat reliably.
> > 
> > Attached is approach v3, which sets up memory cgroup:
> > - memory.limit_in_bytes is 128M
> > - we allocate 512M
> > - as consequence ~384M should be swapped while system should still have
> >   plenty of free memory, which should be available for cache

I use the same host to test your V3, didn't reproduce the false positive issue.
It skipped the test when the swap space is so large successfully. So great V3! Thanks.
I guess if we got the V4, that's must be tricky, but seems no need now?

[root@dhcp-chuhu mem]# ./b
tst_test.c:701: INFO: Timeout per run is 300s
b.c:73: INFO: dropping caches
b.c:175: INFO: SwapCached (before madvise): 78688
b.c:188: INFO: SwapCached (after madvise): 486628
b.c:190: PASS: Regression test pass

Summary:
passed   1
failed   0
skipped  0
warnings 0
[root@dhcp-chuhu mem]# ./b
tst_test.c:701: INFO: Timeout per run is 300s
b.c:73: INFO: dropping caches
b.c:78: CONF: System RAM is too small, skip test

Summary:
passed   0
failed   0
skipped  0
warnings 0



> > Regards,
> > Jan
> > 
> > 
> 
> --
> Regards,
> Chunyu Hu
> 
> 
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
> 

-- 
Regards,
Chunyu Hu


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [LTP] [PATCH] madvise06: wait a bit after madvise() call
  2016-07-22 10:54                 ` Chunyu Hu
@ 2016-07-22 11:02                   ` Jan Stancek
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Stancek @ 2016-07-22 11:02 UTC (permalink / raw)
  To: ltp



----- Original Message -----
> From: "Chunyu Hu" <chuhu@redhat.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: ltp@lists.linux.it, "Chunyu Hu" <chuhu@redhat.com>
> Sent: Friday, 22 July, 2016 12:54:47 PM
> Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call
> > > 
> > > Attached is approach v3, which sets up memory cgroup:
> > > - memory.limit_in_bytes is 128M
> > > - we allocate 512M
> > > - as consequence ~384M should be swapped while system should still have
> > >   plenty of free memory, which should be available for cache
> 
> I use the same host to test your V3, didn't reproduce the false positive
> issue.
> It skipped the test when the swap space is so large successfully. So great
> V3! Thanks.
> I guess if we got the V4, that's must be tricky, but seems no need now?

I posted v3 just now. I reduced allocated size to 400M, so it should
run also on systems as small as 1GB RAM. Other than that, it contains
feedback from Li and some additional checks so it TCONFs on older kernels
which don't have all /proc and memory cgroup knobs.

Regards,
Jan

> 
> [root@dhcp-chuhu mem]# ./b
> tst_test.c:701: INFO: Timeout per run is 300s
> b.c:73: INFO: dropping caches
> b.c:175: INFO: SwapCached (before madvise): 78688
> b.c:188: INFO: SwapCached (after madvise): 486628
> b.c:190: PASS: Regression test pass
> 
> Summary:
> passed   1
> failed   0
> skipped  0
> warnings 0
> [root@dhcp-chuhu mem]# ./b
> tst_test.c:701: INFO: Timeout per run is 300s
> b.c:73: INFO: dropping caches
> b.c:78: CONF: System RAM is too small, skip test
> 
> Summary:
> passed   0
> failed   0
> skipped  0
> warnings 0
> 
> 
> 
> > > Regards,
> > > Jan
> > > 
> > > 
> > 
> > --
> > Regards,
> > Chunyu Hu
> > 
> > 
> > --
> > Mailing list info: https://lists.linux.it/listinfo/ltp
> > 
> 
> --
> Regards,
> Chunyu Hu
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-07-22 11:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek
2016-07-18 14:03 ` Cyril Hrubis
2016-07-18 14:22   ` Jan Stancek
2016-07-18 14:49     ` Cyril Hrubis
2016-07-19  5:58 ` Li Wang
2016-07-19  6:56   ` Jan Stancek
2016-07-19  8:57     ` Li Wang
2016-07-20 14:37       ` Jan Stancek
2016-07-21  5:33         ` Li Wang
2016-07-21 10:31         ` Chunyu Hu
2016-07-21 11:02           ` Li Wang
2016-07-21 14:23             ` Jan Stancek
2016-07-22  3:46               ` Li Wang
2016-07-22  6:59                 ` Jan Stancek
2016-07-22 10:49               ` Chunyu Hu
2016-07-22 10:54                 ` Chunyu Hu
2016-07-22 11:02                   ` Jan Stancek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.