All of lore.kernel.org
 help / color / mirror / Atom feed
* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
@ 2020-11-25 10:16 Li Wang
  2020-11-25 11:22 ` Richard Palethorpe
  0 siblings, 1 reply; 14+ messages in thread
From: Li Wang @ 2020-11-25 10:16 UTC (permalink / raw)
  To: ltp

It makes no sense to run parallel thread to simulate race conditions on
system with CPU number less than two, especially for kvm guest, it does
not have any chance to get real parallel running and probably encounter
failure as below:

=== 100% reproducible on a 1cpu guest ===

cmdline="af_alg07"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended
../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0
../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 }
../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a  : { avg =  1915ns, avg_dev =   535ns, dev_ratio = 0.28 }
../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b  : { avg =  1885ns, avg_dev =    42ns, dev_ratio = 0.02 }
../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b    : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 }
../../../include/tst_fuzzy_sync.h:318: TINFO: spins            : { avg = 554786  , avg_dev =  7355  , dev_ratio = 0.01 }
../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit
af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable

Signed-off-by: Li Wang <liwang@redhat.com>
CC: Richard Palethorpe <rpalethorpe@suse.de>
---
 include/tst_fuzzy_sync.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
index 4141f5c64..2e864b312 100644
--- a/include/tst_fuzzy_sync.h
+++ b/include/tst_fuzzy_sync.h
@@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s)
 static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair,
 				  void *(*run_b)(void *))
 {
+	if (get_nprocs() < 2)
+		tst_brk(TCONF, "Fuzzy Sync requires@least two CPUs available");
+
 	tst_fzsync_pair_cleanup(pair);
 
 	tst_init_stat(&pair->diff_ss);
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 10:16 [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2 Li Wang
@ 2020-11-25 11:22 ` Richard Palethorpe
  2020-11-25 11:54   ` Cyril Hrubis
  2020-11-25 11:56   ` Martin Doucha
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Palethorpe @ 2020-11-25 11:22 UTC (permalink / raw)
  To: ltp

Hello Li,

Li Wang <liwang@redhat.com> writes:

> It makes no sense to run parallel thread to simulate race conditions on
> system with CPU number less than two, especially for kvm guest, it does
> not have any chance to get real parallel running and probably encounter
> failure as below:

Most of the tests using FuzzySync do not need true parallism. We were
able to reproduce a number of race conditions on a single vCPU. Infact
it may actually benefit some races because one thread has to pause to
allow the other to run, perhaps creating a huge race window.

>
> === 100% reproducible on a 1cpu guest ===
>
> cmdline="af_alg07"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended
> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0
> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 }
> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a  : { avg =  1915ns, avg_dev =   535ns, dev_ratio = 0.28 }
> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b  : { avg =  1885ns, avg_dev =    42ns, dev_ratio = 0.02 }
> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b    : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 }
> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins            : { avg = 554786  , avg_dev =  7355  , dev_ratio = 0.01 }
> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit
> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable
>
> Signed-off-by: Li Wang <liwang@redhat.com>
> CC: Richard Palethorpe <rpalethorpe@suse.de>
> ---
>  include/tst_fuzzy_sync.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
> index 4141f5c64..2e864b312 100644
> --- a/include/tst_fuzzy_sync.h
> +++ b/include/tst_fuzzy_sync.h
> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s)
>  static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair,
>  				  void *(*run_b)(void *))
>  {
> +	if (get_nprocs() < 2)
> +		tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available");
> +
>  	tst_fzsync_pair_cleanup(pair);
>  
>  	tst_init_stat(&pair->diff_ss);

Perhaps this test would pass with more loops and a big enough delay
range, but this is also wasting time on a single vCPU. I'm not sure
whether we should filter this test at the LTP level; it may trigger the
bug on some single CPU configs.

Why not print a warning instead of refusing to run?

-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 11:22 ` Richard Palethorpe
@ 2020-11-25 11:54   ` Cyril Hrubis
  2020-11-25 11:57     ` Martin Doucha
  2020-11-25 11:56   ` Martin Doucha
  1 sibling, 1 reply; 14+ messages in thread
From: Cyril Hrubis @ 2020-11-25 11:54 UTC (permalink / raw)
  To: ltp

Hi!
> Perhaps this test would pass with more loops and a big enough delay
> range, but this is also wasting time on a single vCPU. I'm not sure
> whether we should filter this test at the LTP level; it may trigger the
> bug on some single CPU configs.
> 
> Why not print a warning instead of refusing to run?

That's not a solution either, warning would end up in a test results as
well.

I guess that we can add something as .min_cpus to the tst_test structure
and set it for this test?

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 11:22 ` Richard Palethorpe
  2020-11-25 11:54   ` Cyril Hrubis
@ 2020-11-25 11:56   ` Martin Doucha
  2020-11-25 12:50     ` Li Wang
  2020-11-25 13:23     ` Richard Palethorpe
  1 sibling, 2 replies; 14+ messages in thread
From: Martin Doucha @ 2020-11-25 11:56 UTC (permalink / raw)
  To: ltp

On 25. 11. 20 12:22, Richard Palethorpe wrote:
> Hello Li,
> 
> Li Wang <liwang@redhat.com> writes:
> 
>> It makes no sense to run parallel thread to simulate race conditions on
>> system with CPU number less than two, especially for kvm guest, it does
>> not have any chance to get real parallel running and probably encounter
>> failure as below:
> 
> Most of the tests using FuzzySync do not need true parallism. We were
> able to reproduce a number of race conditions on a single vCPU. Infact
> it may actually benefit some races because one thread has to pause to
> allow the other to run, perhaps creating a huge race window.
> 
>>
>> === 100% reproducible on a 1cpu guest ===
>>
>> cmdline="af_alg07"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
>> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended
>> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0
>> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 }
>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a  : { avg =  1915ns, avg_dev =   535ns, dev_ratio = 0.28 }
>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b  : { avg =  1885ns, avg_dev =    42ns, dev_ratio = 0.02 }
>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b    : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 }
>> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins            : { avg = 554786  , avg_dev =  7355  , dev_ratio = 0.01 }
>> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit
>> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable
>>
>> Signed-off-by: Li Wang <liwang@redhat.com>
>> CC: Richard Palethorpe <rpalethorpe@suse.de>
>> ---
>>  include/tst_fuzzy_sync.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
>> index 4141f5c64..2e864b312 100644
>> --- a/include/tst_fuzzy_sync.h
>> +++ b/include/tst_fuzzy_sync.h
>> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s)
>>  static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair,
>>  				  void *(*run_b)(void *))
>>  {
>> +	if (get_nprocs() < 2)
>> +		tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available");
>> +
>>  	tst_fzsync_pair_cleanup(pair);
>>  
>>  	tst_init_stat(&pair->diff_ss);
> 
> Perhaps this test would pass with more loops and a big enough delay
> range, but this is also wasting time on a single vCPU. I'm not sure
> whether we should filter this test at the LTP level; it may trigger the
> bug on some single CPU configs.

No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
The test will pass only if fchownat() hits a half-closed socket and
returns error. But IIRC the half-closed socket will be destroyed during
reschedule which means there's no race window to hit anymore. But it
would be better to put the TCONF condition into the test itself.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 11:54   ` Cyril Hrubis
@ 2020-11-25 11:57     ` Martin Doucha
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Doucha @ 2020-11-25 11:57 UTC (permalink / raw)
  To: ltp

On 25. 11. 20 12:54, Cyril Hrubis wrote:
> Hi!
>> Perhaps this test would pass with more loops and a big enough delay
>> range, but this is also wasting time on a single vCPU. I'm not sure
>> whether we should filter this test at the LTP level; it may trigger the
>> bug on some single CPU configs.
>>
>> Why not print a warning instead of refusing to run?
> 
> That's not a solution either, warning would end up in a test results as
> well.
> 
> I guess that we can add something as .min_cpus to the tst_test structure
> and set it for this test?

+1 for .min_cpus from me.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 11:56   ` Martin Doucha
@ 2020-11-25 12:50     ` Li Wang
  2020-11-25 13:13       ` Cyril Hrubis
  2020-11-25 13:23     ` Richard Palethorpe
  1 sibling, 1 reply; 14+ messages in thread
From: Li Wang @ 2020-11-25 12:50 UTC (permalink / raw)
  To: ltp

On Wed, Nov 25, 2020 at 7:56 PM Martin Doucha <mdoucha@suse.cz> wrote:

> On 25. 11. 20 12:22, Richard Palethorpe wrote:
> > Hello Li,
> >
> > Li Wang <liwang@redhat.com> writes:
> >
> >> It makes no sense to run parallel thread to simulate race conditions on
> >> system with CPU number less than two, especially for kvm guest, it does
> >> not have any chance to get real parallel running and probably encounter
> >> failure as below:
> >
> > Most of the tests using FuzzySync do not need true parallism. We were
> > able to reproduce a number of race conditions on a single vCPU. Infact
> > it may actually benefit some races because one thread has to pause to
> > allow the other to run, perhaps creating a huge race window.

>
> >>
> >> === 100% reproducible on a 1cpu guest ===
> >>
> >> cmdline="af_alg07"
> >> contacts=""
> >> analysis=exit
> >> <<<test_output>>>
> >> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
> >> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period
> ended
> >> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias =
> 0
> >> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg
> = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 }
> >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a  : { avg
> =  1915ns, avg_dev =   535ns, dev_ratio = 0.28 }
> >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b  : { avg
> =  1885ns, avg_dev =    42ns, dev_ratio = 0.02 }
> >> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b    : { avg
> = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 }
> >> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins            : { avg
> = 554786  , avg_dev =  7355  , dev_ratio = 0.01 }
> >> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time,
> requesting exit
> >> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be
> vulnerable
> >>
> >> Signed-off-by: Li Wang <liwang@redhat.com>
> >> CC: Richard Palethorpe <rpalethorpe@suse.de>
> >> ---
> >>  include/tst_fuzzy_sync.h | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
> >> index 4141f5c64..2e864b312 100644
> >> --- a/include/tst_fuzzy_sync.h
> >> +++ b/include/tst_fuzzy_sync.h
> >> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s)
> >>  static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair,
> >>                                void *(*run_b)(void *))
> >>  {
> >> +    if (get_nprocs() < 2)
> >> +            tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs
> available");
> >> +
> >>      tst_fzsync_pair_cleanup(pair);
> >>
> >>      tst_init_stat(&pair->diff_ss);
> >
> > Perhaps this test would pass with more loops and a big enough delay
> > range, but this is also wasting time on a single vCPU. I'm not sure
> > whether we should filter this test at the LTP level; it may trigger the
> > bug on some single CPU configs.
>
> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
> The test will pass only if fchownat() hits a half-closed socket and
> returns error. But IIRC the half-closed socket will be destroyed during
> reschedule which means there's no race window to hit anymore. But it
> would be better to put the TCONF condition into the test itself.
>

+1
Correct, I stand by Martin's point.

And we can avoid adding this patch to FuzzySync lib, but for af_alg07 2cpus
is required.
(maybe go with Cyril's suggest to add .min_cpus)

-- 
Regards,
Li Wang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20201125/630b5666/attachment.htm>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 12:50     ` Li Wang
@ 2020-11-25 13:13       ` Cyril Hrubis
  0 siblings, 0 replies; 14+ messages in thread
From: Cyril Hrubis @ 2020-11-25 13:13 UTC (permalink / raw)
  To: ltp

Hi!
> And we can avoid adding this patch to FuzzySync lib, but for af_alg07 2cpus
> is required.
> (maybe go with Cyril's suggest to add .min_cpus)

I would go for that and we can make use of it in getcwd04.c as well.

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 11:56   ` Martin Doucha
  2020-11-25 12:50     ` Li Wang
@ 2020-11-25 13:23     ` Richard Palethorpe
  2020-11-30  7:53       ` Joerg Vehlow
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Palethorpe @ 2020-11-25 13:23 UTC (permalink / raw)
  To: ltp

Hello,

Martin Doucha <mdoucha@suse.cz> writes:

> On 25. 11. 20 12:22, Richard Palethorpe wrote:
>> Hello Li,
>> 
>> Li Wang <liwang@redhat.com> writes:
>> 
>>> It makes no sense to run parallel thread to simulate race conditions on
>>> system with CPU number less than two, especially for kvm guest, it does
>>> not have any chance to get real parallel running and probably encounter
>>> failure as below:
>> 
>> Most of the tests using FuzzySync do not need true parallism. We were
>> able to reproduce a number of race conditions on a single vCPU. Infact
>> it may actually benefit some races because one thread has to pause to
>> allow the other to run, perhaps creating a huge race window.
>> 
>>>
>>> === 100% reproducible on a 1cpu guest ===
>>>
>>> cmdline="af_alg07"
>>> contacts=""
>>> analysis=exit
>>> <<<test_output>>>
>>> tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
>>> ../../../include/tst_fuzzy_sync.h:507: TINFO: Minimum sampling period ended
>>> ../../../include/tst_fuzzy_sync.h:330: TINFO: loop = 1024, delay_bias = 0
>>> ../../../include/tst_fuzzy_sync.h:318: TINFO: start_a - start_b: { avg = -137522ns, avg_dev = 854248ns, dev_ratio = 6.21 }
>>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - start_a  : { avg =  1915ns, avg_dev =   535ns, dev_ratio = 0.28 }
>>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_b - start_b  : { avg =  1885ns, avg_dev =    42ns, dev_ratio = 0.02 }
>>> ../../../include/tst_fuzzy_sync.h:318: TINFO: end_a - end_b    : { avg = -137492ns, avg_dev = 854818ns, dev_ratio = 6.22 }
>>> ../../../include/tst_fuzzy_sync.h:318: TINFO: spins            : { avg = 554786  , avg_dev =  7355  , dev_ratio = 0.01 }
>>> ../../../include/tst_fuzzy_sync.h:636: TINFO: Exceeded execution time, requesting exit
>>> af_alg07.c:96: TFAIL: fchownat() failed to fail, kernel may be vulnerable
>>>
>>> Signed-off-by: Li Wang <liwang@redhat.com>
>>> CC: Richard Palethorpe <rpalethorpe@suse.de>
>>> ---
>>>  include/tst_fuzzy_sync.h | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
>>> index 4141f5c64..2e864b312 100644
>>> --- a/include/tst_fuzzy_sync.h
>>> +++ b/include/tst_fuzzy_sync.h
>>> @@ -281,6 +281,9 @@ static void tst_init_stat(struct tst_fzsync_stat *s)
>>>  static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair,
>>>  				  void *(*run_b)(void *))
>>>  {
>>> +	if (get_nprocs() < 2)
>>> +		tst_brk(TCONF, "Fuzzy Sync requires at least two CPUs available");
>>> +
>>>  	tst_fzsync_pair_cleanup(pair);
>>>  
>>>  	tst_init_stat(&pair->diff_ss);
>> 
>> Perhaps this test would pass with more loops and a big enough delay
>> range, but this is also wasting time on a single vCPU. I'm not sure
>> whether we should filter this test at the LTP level; it may trigger the
>> bug on some single CPU configs.
>
> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
> The test will pass only if fchownat() hits a half-closed socket and
> returns error. But IIRC the half-closed socket will be destroyed during
> reschedule which means there's no race window to hit anymore. But it
> would be better to put the TCONF condition into the test itself.

Interesting, I wonder if this is also true for the real-time kernel with
the threads set to RT priority?

-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-25 13:23     ` Richard Palethorpe
@ 2020-11-30  7:53       ` Joerg Vehlow
  2020-11-30  8:14         ` Li Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Vehlow @ 2020-11-30  7:53 UTC (permalink / raw)
  To: ltp

Hi,
>> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
>> The test will pass only if fchownat() hits a half-closed socket and
>> returns error. But IIRC the half-closed socket will be destroyed during
>> reschedule which means there's no race window to hit anymore. But it
>> would be better to put the TCONF condition into the test itself.
> Interesting, I wonder if this is also true for the real-time kernel with
> the threads set to RT priority?
It looks like the test can fail even with more than one cpu. I've seen 
this sporadic failure on different hardware with more than two cores, at 
least on intel denverton (x86_64) and renesas r-car (aarch64) systems. 
Both with kernel 4.19 with the fix included, on the denverton system the 
rt parches were included and on the r-car not. The test passes most of 
the time, but sometimes fails with the message Li posted.

It also seems to fail sporadically on other systems as well: 
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860

Additionally I tested on qemu-x86 with 4.19 with and without rt patches. 
The test succeeds even with only one virtualized cpu. So either Martin's 
assumption is wrong or it holds only for newer kernel versions?

J?rg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-30  7:53       ` Joerg Vehlow
@ 2020-11-30  8:14         ` Li Wang
  2020-11-30  8:39           ` Joerg Vehlow
  2020-11-30  9:01           ` Richard Palethorpe
  0 siblings, 2 replies; 14+ messages in thread
From: Li Wang @ 2020-11-30  8:14 UTC (permalink / raw)
  To: ltp

Hi Joerg,

On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote:

> Hi,
> >> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
> >> The test will pass only if fchownat() hits a half-closed socket and
> >> returns error. But IIRC the half-closed socket will be destroyed during
> >> reschedule which means there's no race window to hit anymore. But it
> >> would be better to put the TCONF condition into the test itself.
> > Interesting, I wonder if this is also true for the real-time kernel with
> > the threads set to RT priority?
> It looks like the test can fail even with more than one cpu. I've seen
> this sporadic failure on different hardware with more than two cores, at
> least on intel denverton (x86_64) and renesas r-car (aarch64) systems.
> Both with kernel 4.19 with the fix included, on the denverton system the
> rt parches were included and on the r-car not. The test passes most of
> the time, but sometimes fails with the message Li posted.
>
> It also seems to fail sporadically on other systems as well:
> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
>
> Additionally I tested on qemu-x86 with 4.19 with and without rt patches.
> The test succeeds even with only one virtualized cpu. So either Martin's
> assumption is wrong or it holds only for newer kernel versions?
>

No, Mertin is not wrong, and you are also right.

They are totally two different issues of af_alg07, the test on 1CPU
should be fixed with TCONF. But the fail with aarch64 is more like a
hardware issue, Chunyu has a drafted patch to add init delay value for
such a system.

Can you try this on your aarm64 platform?
-----------------------------
fzsync can't get a random delay range on hpe-moonshot systems, so run with
delay=0 during all the tests. This is probably the hardware issue such as
cache line design so can't get a stable state during the execution of the
critical
section. Provide an experience delay value on hpe-moonshot to make it hit
the race window immediately without exceeding samples.

---
 testcases/kernel/crypto/af_alg07.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/testcases/kernel/crypto/af_alg07.c
b/testcases/kernel/crypto/af_alg07.c
index 6ad86f4f3..24f5b8088 100644
--- a/testcases/kernel/crypto/af_alg07.c
+++ b/testcases/kernel/crypto/af_alg07.c
@@ -47,6 +47,7 @@ static void setup(void)
  fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644);

  tst_fzsync_pair_init(&fzsync_pair);
+ fzsync_pair.delay_bias = 700;
 }

 static void *thread_run(void *arg)
-- 
2.19.1

-- 
Regards,
Li Wang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20201130/c06d8f1b/attachment-0001.htm>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-30  8:14         ` Li Wang
@ 2020-11-30  8:39           ` Joerg Vehlow
  2020-11-30  9:03             ` Li Wang
  2020-11-30  9:01           ` Richard Palethorpe
  1 sibling, 1 reply; 14+ messages in thread
From: Joerg Vehlow @ 2020-11-30  8:39 UTC (permalink / raw)
  To: ltp

Hi Li,

On 11/30/2020 9:14 AM, Li Wang wrote:
> Hi Joerg,
>
> On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de 
> <mailto:lkml@jv-coder.de>> wrote:
>
>     Hi,
>     >> No, af_alg07 requires 2 CPUs, otherwise it'll report false
>     positives.
>     >> The test will pass only if fchownat() hits a half-closed socket and
>     >> returns error. But IIRC the half-closed socket will be
>     destroyed during
>     >> reschedule which means there's no race window to hit anymore.
>     But it
>     >> would be better to put the TCONF condition into the test itself.
>     > Interesting, I wonder if this is also true for the real-time
>     kernel with
>     > the threads set to RT priority?
>     It looks like the test can fail even with more than one cpu. I've
>     seen
>     this sporadic failure on different hardware with more than two
>     cores, at
>     least on intel denverton (x86_64) and renesas r-car (aarch64)
>     systems.
>     Both with kernel 4.19 with the fix included, on the denverton
>     system the
>     rt parches were included and on the r-car not. The test passes
>     most of
>     the time, but sometimes fails with the message Li posted.
>
>     It also seems to fail sporadically on other systems as well:
>     https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
>     <https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860>
>
>     Additionally I tested on qemu-x86 with 4.19 with and without rt
>     patches.
>     The test succeeds even with only one virtualized cpu. So either
>     Martin's
>     assumption is wrong or it holds only for newer kernel versions?
>
>
> No, Mertin is not wrong, and you are also right.
>
> They are totally two different issues of af_alg07, the test on 1CPU
> should?be fixed with TCONF. But the fail with aarch64 is more like a
> hardware issue, Chunyu has a drafted patch to add init delay value for
> such a system.
I think you misunderstood something. I see random fails with "TFAIL: 
fchownat() failed to fail, kernel may be vulnerable" on both x86_64 and 
aarch64 with more than one cpu core (4 for x86_64 and 2 or 4 for aarch64).

I see no error ("TPASS: fchownat() failed successfully: ENOENT (2)") on 
single core qemu-x86. This is why I think Martin's assumption may be 
wrong. If it was right, it should never succeed on a single core system 
right?

J?rg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-30  8:14         ` Li Wang
  2020-11-30  8:39           ` Joerg Vehlow
@ 2020-11-30  9:01           ` Richard Palethorpe
  2020-11-30 14:16             ` Martin Doucha
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Palethorpe @ 2020-11-30  9:01 UTC (permalink / raw)
  To: ltp

Hello,

Li Wang <liwang@redhat.com> writes:

> Hi Joerg,
>
> On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote:
>
>> Hi,
>> >> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
>> >> The test will pass only if fchownat() hits a half-closed socket and
>> >> returns error. But IIRC the half-closed socket will be destroyed during
>> >> reschedule which means there's no race window to hit anymore. But it
>> >> would be better to put the TCONF condition into the test itself.
>> > Interesting, I wonder if this is also true for the real-time kernel with
>> > the threads set to RT priority?
>> It looks like the test can fail even with more than one cpu. I've seen
>> this sporadic failure on different hardware with more than two cores, at
>> least on intel denverton (x86_64) and renesas r-car (aarch64) systems.
>> Both with kernel 4.19 with the fix included, on the denverton system the
>> rt parches were included and on the r-car not. The test passes most of
>> the time, but sometimes fails with the message Li posted.
>>
>> It also seems to fail sporadically on other systems as well:
>> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
>>
>> Additionally I tested on qemu-x86 with 4.19 with and without rt patches.
>> The test succeeds even with only one virtualized cpu. So either Martin's
>> assumption is wrong or it holds only for newer kernel versions?
>>
>
> No, Mertin is not wrong, and you are also right.
>
> They are totally two different issues of af_alg07, the test on 1CPU
> should be fixed with TCONF. But the fail with aarch64 is more like a
> hardware issue, Chunyu has a drafted patch to add init delay value for
> such a system.
>
> Can you try this on your aarm64 platform?
> -----------------------------
> fzsync can't get a random delay range on hpe-moonshot systems, so run with
> delay=0 during all the tests. This is probably the hardware issue such as
> cache line design so can't get a stable state during the execution of the
> critical
> section. Provide an experience delay value on hpe-moonshot to make it hit
> the race window immediately without exceeding samples.
>
> ---
>  testcases/kernel/crypto/af_alg07.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/testcases/kernel/crypto/af_alg07.c
> b/testcases/kernel/crypto/af_alg07.c
> index 6ad86f4f3..24f5b8088 100644
> --- a/testcases/kernel/crypto/af_alg07.c
> +++ b/testcases/kernel/crypto/af_alg07.c
> @@ -47,6 +47,7 @@ static void setup(void)
>   fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644);
>
>   tst_fzsync_pair_init(&fzsync_pair);
> + fzsync_pair.delay_bias = 700;

I hope there is some way to set this dynamically. Similar to
CVE-2016-7117.

If we know that we should get some particular error we could modify the
bias until the error happens.

>  }
>
>  static void *thread_run(void *arg)
> -- 
> 2.19.1


-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-30  8:39           ` Joerg Vehlow
@ 2020-11-30  9:03             ` Li Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Li Wang @ 2020-11-30  9:03 UTC (permalink / raw)
  To: ltp

Hi Joerg,

On Mon, Nov 30, 2020 at 4:39 PM Joerg Vehlow <lkml@jv-coder.de> wrote:

> Hi Li,
>
> On 11/30/2020 9:14 AM, Li Wang wrote:
> > Hi Joerg,
> >
> > On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de
> > <mailto:lkml@jv-coder.de>> wrote:
> >
> >     Hi,
> >     >> No, af_alg07 requires 2 CPUs, otherwise it'll report false
> >     positives.
> >     >> The test will pass only if fchownat() hits a half-closed socket
> and
> >     >> returns error. But IIRC the half-closed socket will be
> >     destroyed during
> >     >> reschedule which means there's no race window to hit anymore.
> >     But it
> >     >> would be better to put the TCONF condition into the test itself.
> >     > Interesting, I wonder if this is also true for the real-time
> >     kernel with
> >     > the threads set to RT priority?
> >     It looks like the test can fail even with more than one cpu. I've
> >     seen
> >     this sporadic failure on different hardware with more than two
> >     cores, at
> >     least on intel denverton (x86_64) and renesas r-car (aarch64)
> >     systems.
> >     Both with kernel 4.19 with the fix included, on the denverton
> >     system the
> >     rt parches were included and on the r-car not. The test passes
> >     most of
> >     the time, but sometimes fails with the message Li posted.
> >
> >     It also seems to fail sporadically on other systems as well:
> >     https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
> >     <https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860>
> >
> >     Additionally I tested on qemu-x86 with 4.19 with and without rt
> >     patches.
> >     The test succeeds even with only one virtualized cpu. So either
> >     Martin's
> >     assumption is wrong or it holds only for newer kernel versions?
> >
> >
> > No, Mertin is not wrong, and you are also right.
> >
> > They are totally two different issues of af_alg07, the test on 1CPU
> > should be fixed with TCONF. But the fail with aarch64 is more like a
> > hardware issue, Chunyu has a drafted patch to add init delay value for
> > such a system.
> I think you misunderstood something. I see random fails with "TFAIL:
> fchownat() failed to fail, kernel may be vulnerable" on both x86_64 and
> aarch64 with more than one cpu core (4 for x86_64 and 2 or 4 for aarch64).
>

Well, seems I was somewhat arbitrary on this problem a moment ago.

Probably I was missing the 4cores fails on x86_64 you mentioned, we just
observed that FAIL on 1CPU x86_64 and hpe_moonshot(aarch64) so far.
The tentative conclusion of our debugging result:

  1. FAIL with 1CPU KVM x86_64 is false positives
  2. FAIL with hpe_moonshot aarch64 is caused by cache line design


>
> I see no error ("TPASS: fchownat() failed successfully: ENOENT (2)") on
> single core qemu-x86. This is why I think Martin's assumption may be
> wrong. If it was right, it should never succeed on a single core system
> right?
>

Hmm, it's hard to say never, it is also possible to create a race window on
a single-core system.
Anyway, we need to do more investigation.

-- 
Regards,
Li Wang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20201130/14a94947/attachment-0001.htm>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2
  2020-11-30  9:01           ` Richard Palethorpe
@ 2020-11-30 14:16             ` Martin Doucha
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Doucha @ 2020-11-30 14:16 UTC (permalink / raw)
  To: ltp

On 30. 11. 20 10:01, Richard Palethorpe wrote:
> Hello,
> 
> Li Wang <liwang@redhat.com> writes:
> 
>> Hi Joerg,
>>
>> On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>
>>> Hi,
>>>>> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
>>>>> The test will pass only if fchownat() hits a half-closed socket and
>>>>> returns error. But IIRC the half-closed socket will be destroyed during
>>>>> reschedule which means there's no race window to hit anymore. But it
>>>>> would be better to put the TCONF condition into the test itself.
>>>> Interesting, I wonder if this is also true for the real-time kernel with
>>>> the threads set to RT priority?
>>> It looks like the test can fail even with more than one cpu. I've seen
>>> this sporadic failure on different hardware with more than two cores, at
>>> least on intel denverton (x86_64) and renesas r-car (aarch64) systems.
>>> Both with kernel 4.19 with the fix included, on the denverton system the
>>> rt parches were included and on the r-car not. The test passes most of
>>> the time, but sometimes fails with the message Li posted.
>>>
>>> It also seems to fail sporadically on other systems as well:
>>> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
>>>
>>> Additionally I tested on qemu-x86 with 4.19 with and without rt patches.
>>> The test succeeds even with only one virtualized cpu. So either Martin's
>>> assumption is wrong or it holds only for newer kernel versions?
>>>
>>
>> No, Mertin is not wrong, and you are also right.
>>
>> They are totally two different issues of af_alg07, the test on 1CPU
>> should be fixed with TCONF. But the fail with aarch64 is more like a
>> hardware issue, Chunyu has a drafted patch to add init delay value for
>> such a system.
>>
>> Can you try this on your aarm64 platform?
>> -----------------------------
>> fzsync can't get a random delay range on hpe-moonshot systems, so run with
>> delay=0 during all the tests. This is probably the hardware issue such as
>> cache line design so can't get a stable state during the execution of the
>> critical
>> section. Provide an experience delay value on hpe-moonshot to make it hit
>> the race window immediately without exceeding samples.
>>
>> ---
>>  testcases/kernel/crypto/af_alg07.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/testcases/kernel/crypto/af_alg07.c
>> b/testcases/kernel/crypto/af_alg07.c
>> index 6ad86f4f3..24f5b8088 100644
>> --- a/testcases/kernel/crypto/af_alg07.c
>> +++ b/testcases/kernel/crypto/af_alg07.c
>> @@ -47,6 +47,7 @@ static void setup(void)
>>   fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644);
>>
>>   tst_fzsync_pair_init(&fzsync_pair);
>> + fzsync_pair.delay_bias = 700;
> 
> I hope there is some way to set this dynamically. Similar to
> CVE-2016-7117.
> 
> If we know that we should get some particular error we could modify the
> bias until the error happens.

There are three possible outcomes of the race:
1) fchownat() returns 0 => fchownat() was called too early or the kernel
is vulnerable, you can adjust bias here
2) fchwonat() fails with ENOENT => kernel is fixed, print TPASS and exit
3) fchownat() fails with EBADF => fchownat() was called too late, you
can adjust bias here

IIRC I didn't play with bias in this test because on x86_64 it passes
almost instantly on a fixed kernel. Feel free to add dynamic bias
adjustment for ARM.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-30 14:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-25 10:16 [LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2 Li Wang
2020-11-25 11:22 ` Richard Palethorpe
2020-11-25 11:54   ` Cyril Hrubis
2020-11-25 11:57     ` Martin Doucha
2020-11-25 11:56   ` Martin Doucha
2020-11-25 12:50     ` Li Wang
2020-11-25 13:13       ` Cyril Hrubis
2020-11-25 13:23     ` Richard Palethorpe
2020-11-30  7:53       ` Joerg Vehlow
2020-11-30  8:14         ` Li Wang
2020-11-30  8:39           ` Joerg Vehlow
2020-11-30  9:03             ` Li Wang
2020-11-30  9:01           ` Richard Palethorpe
2020-11-30 14:16             ` Martin Doucha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.