linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* rcutorture: meaning of "End of test: RCU_HOTPLUG"
@ 2019-01-22  3:40 Su Yue
  2019-01-22  4:01 ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Su Yue @ 2019-01-22  3:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer

Hi, guys
   While running rcutorture tests with "onoff_interval", some tests
failed and results show like:

=====================================================================
[  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG: nreaders=1 
nfakewriters=4 stat_interval=60 verbose=2 test_no_idle_hz=1 
shuffle_interval=3 stutter=5 irqreader=1 fq\
s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 
test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 
stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
====================================================================

I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
cpu hotplug is enabled in the test? Or just represents another type of
failure?

Thanks,
Su



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
  2019-01-22  3:40 rcutorture: meaning of "End of test: RCU_HOTPLUG" Su Yue
@ 2019-01-22  4:01 ` Paul E. McKenney
  2019-01-22  8:42   ` Su Yue
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2019-01-22  4:01 UTC (permalink / raw)
  To: Su Yue
  Cc: linux-kernel, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer

On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
> Hi, guys
>   While running rcutorture tests with "onoff_interval", some tests
> failed and results show like:
> 
> =====================================================================
> [  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
> nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
> test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
> s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
> test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
> stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
> rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
> ====================================================================
> 
> I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
> cpu hotplug is enabled in the test? Or just represents another type of
> failure?

This says that at least one CPU hotplug operation failed, that is,
the CPU didn't actually come online or go offline as requested.  If you
are introducing CPU hotplug to an architecture, this usually indicates
that you have bugs in your CPU-hotplug code.  Or it nmight be that
RCU grace periods failed to progress -- though this would normally
also result in RCU CPU stall warnings.

There should be lines containing "ver:" in your console output.  What
does one of the later one of these say?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
  2019-01-22  4:01 ` Paul E. McKenney
@ 2019-01-22  8:42   ` Su Yue
  2019-01-23  3:22     ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Su Yue @ 2019-01-22  8:42 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer

Thanks for your quick reply! Paul

On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
>> Hi, guys
>>    While running rcutorture tests with "onoff_interval", some tests
>> failed and results show like:
>>
>> =====================================================================
>> [  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
>> nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
>> test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
>> s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
>> test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
>> stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
>> rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
>> ====================================================================
>>
>> I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
>> cpu hotplug is enabled in the test? Or just represents another type of
>> failure?
> 
> This says that at least one CPU hotplug operation failed, that is,
> the CPU didn't actually come online or go offline as requested.  If you
> are introducing CPU hotplug to an architecture, this usually indicates
> that you have bugs in your CPU-hotplug code.  Or it nmight be that

It should hit the case since there is no RCU CPU stall warnings.

> RCU grace periods failed to progress -- though this would normally
> also result in RCU CPU stall warnings.
> 
> There should be lines containing "ver:" in your console output.  What
> does one of the later one of these say?
> 

The line says:
======================================================================
[  318.850175] busted_srcud-torture: rtc:           (null) ver: 27040 
tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 
rtbf: 0 rtb: 0 \
nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542 (HZ=1000) 
barrier: 0/0:0

=====================================================================

And here are useful errors:
=====================================================================
kern  :info  : [  135.379693] KVM setup async PF for cpu 1
kern  :info  : [  135.381412] kvm-stealtime: cpu 1, msr 23fd16180
kern  :alert : [  135.386897] busted_srcud-torture:torture_onoff task: 
onlined 1
kern  :alert : [  135.408241] busted_srcud-torture:torture_onoff task: 
offlining 1
kern  :info  : [  135.423310] Unregister pv shared memory for cpu 1
kern  :info  : [  135.427940] smpboot: CPU 1 is now offline
kern  :alert : [  135.430106] busted_srcud-torture:torture_onoff task: 
offlined 1
kern  :alert : [  135.436404] busted_srcud-torture:torture_onoff task: 
offlining 0
kern  :alert : [  135.446173] busted_srcud-torture:torture_onoff task: 
offline 0 failed: errno -16
kern  :alert : [  135.453076] busted_srcud-torture:torture_onoff task: 
offlining 0
kern  :alert : [  135.457461] busted_srcud-torture:torture_onoff task: 
offline 0 failed: errno -16


=====================================================================
There are only two CPUs on the VM. Torture try to offline the last one
but -EBUSY occured.

I spent time to understand kernel/torture.c.
There is torture_onoff():

225        while (!torture_must_stop()) {
226                cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
227                if (!torture_offline(cpu,
228                                     &n_offline_attempts, 
&n_offline_successes,
229                                     &sum_offline, &min_offline, 
&max_offline))
230                        torture_online(cpu,
231                                       &n_online_attempts, 
&n_online_successes,
232                                       &sum_online, &min_online, 
&max_online);
233                schedule_timeout_interruptible(onoff_interval);
234        }
235

torture_offline() and torture_offline() don't pre judge if the current
cpu is only one usable.

Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
there are only one oneline and hotplugable cpux, then
n_offline_successes != n_offline_attempts which caused "End of test:
RCU_HOTPLUG".

Does I misunderstand something above? Feel free to correct me.


Thanks,
Su

> 							Thanx, Paul
> 
> 
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
  2019-01-22  8:42   ` Su Yue
@ 2019-01-23  3:22     ` Paul E. McKenney
  2019-01-24  7:00       ` Su Yue
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2019-01-23  3:22 UTC (permalink / raw)
  To: Su Yue
  Cc: linux-kernel, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer

On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote:
> Thanks for your quick reply! Paul
> 
> On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> >On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
> >>Hi, guys
> >>   While running rcutorture tests with "onoff_interval", some tests
> >>failed and results show like:
> >>
> >>=====================================================================
> >>[  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
> >>nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
> >>test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
> >>s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
> >>test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
> >>stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
> >>rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
> >>====================================================================
> >>
> >>I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
> >>cpu hotplug is enabled in the test? Or just represents another type of
> >>failure?
> >
> >This says that at least one CPU hotplug operation failed, that is,
> >the CPU didn't actually come online or go offline as requested.  If you
> >are introducing CPU hotplug to an architecture, this usually indicates
> >that you have bugs in your CPU-hotplug code.  Or it nmight be that
> 
> It should hit the case since there is no RCU CPU stall warnings.
> 
> >RCU grace periods failed to progress -- though this would normally
> >also result in RCU CPU stall warnings.
> >
> >There should be lines containing "ver:" in your console output.  What
> >does one of the later one of these say?
> >
> 
> The line says:
> ======================================================================
> [  318.850175] busted_srcud-torture: rtc:           (null) ver:
> 27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke:
> 0 rtbre: 0 rtbf: 0 rtb: 0 \
> nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542
> (HZ=1000) barrier: 0/0:0

Yes, you have many more offline attempts than successes, which is
why RCU_HOTPLUG was printed.

> =====================================================================
> 
> And here are useful errors:
> =====================================================================
> kern  :info  : [  135.379693] KVM setup async PF for cpu 1
> kern  :info  : [  135.381412] kvm-stealtime: cpu 1, msr 23fd16180
> kern  :alert : [  135.386897] busted_srcud-torture:torture_onoff

Just so your know, busted_srcud can sometimes fail by design.  Hence
the "busted" in the name.  But failure didn't happen this time.

> task: onlined 1
> kern  :alert : [  135.408241] busted_srcud-torture:torture_onoff
> task: offlining 1
> kern  :info  : [  135.423310] Unregister pv shared memory for cpu 1
> kern  :info  : [  135.427940] smpboot: CPU 1 is now offline
> kern  :alert : [  135.430106] busted_srcud-torture:torture_onoff
> task: offlined 1
> kern  :alert : [  135.436404] busted_srcud-torture:torture_onoff
> task: offlining 0
> kern  :alert : [  135.446173] busted_srcud-torture:torture_onoff
> task: offline 0 failed: errno -16
> kern  :alert : [  135.453076] busted_srcud-torture:torture_onoff
> task: offlining 0
> kern  :alert : [  135.457461] busted_srcud-torture:torture_onoff
> task: offline 0 failed: errno -16
> 
> 
> =====================================================================
> There are only two CPUs on the VM. Torture try to offline the last one
> but -EBUSY occured.
> 
> I spent time to understand kernel/torture.c.
> There is torture_onoff():
> 
> 225        while (!torture_must_stop()) {
> 226                cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
> 227                if (!torture_offline(cpu,
> 228                                     &n_offline_attempts,
> &n_offline_successes,
> 229                                     &sum_offline, &min_offline,
> &max_offline))
> 230                        torture_online(cpu,
> 231                                       &n_online_attempts,
> &n_online_successes,
> 232                                       &sum_online, &min_online,
> &max_online);
> 233                schedule_timeout_interruptible(onoff_interval);
> 234        }
> 235
> 
> torture_offline() and torture_offline() don't pre judge if the current
> cpu is only one usable.

That does appear to be the case, and that would be a problem with
the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below.

Good catch!

> Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
> there are only one oneline and hotplugable cpux, then
> n_offline_successes != n_offline_attempts which caused "End of test:
> RCU_HOTPLUG".
> 
> Does I misunderstand something above? Feel free to correct me.

Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/torture.c b/kernel/torture.c
index a03ff722352b..2b6700ca2a43 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -101,6 +101,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
 
 	if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
 		return false;
+	if (num_online_cpus() <= 1)
+		return false;  /* Can't offline the last CPU. */
 
 	if (verbose > 1)
 		pr_alert("%s" TORTURE_FLAG


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
  2019-01-23  3:22     ` Paul E. McKenney
@ 2019-01-24  7:00       ` Su Yue
  2019-01-24 20:54         ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Su Yue @ 2019-01-24  7:00 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer



On 1/23/19 11:22 AM, Paul E. McKenney wrote:
> On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote:
>> Thanks for your quick reply! Paul
>>
>> On 1/22/19 12:01 PM, Paul E. McKenney wrote:
>>> On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
>>>> Hi, guys
>>>>    While running rcutorture tests with "onoff_interval", some tests
>>>> failed and results show like:
>>>>
>>>> =====================================================================
>>>> [  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
>>>> nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
>>>> test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
>>>> s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
>>>> test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
>>>> stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
>>>> rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
>>>> ====================================================================
>>>>
>>>> I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
>>>> cpu hotplug is enabled in the test? Or just represents another type of
>>>> failure?
>>>
>>> This says that at least one CPU hotplug operation failed, that is,
>>> the CPU didn't actually come online or go offline as requested.  If you
>>> are introducing CPU hotplug to an architecture, this usually indicates
>>> that you have bugs in your CPU-hotplug code.  Or it nmight be that
>>
>> It should hit the case since there is no RCU CPU stall warnings.
>>
>>> RCU grace periods failed to progress -- though this would normally
>>> also result in RCU CPU stall warnings.
>>>
>>> There should be lines containing "ver:" in your console output.  What
>>> does one of the later one of these say?
>>>
>>
>> The line says:
>> ======================================================================
>> [  318.850175] busted_srcud-torture: rtc:           (null) ver:
>> 27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke:
>> 0 rtbre: 0 rtbf: 0 rtb: 0 \
>> nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542
>> (HZ=1000) barrier: 0/0:0
> 
> Yes, you have many more offline attempts than successes, which is
> why RCU_HOTPLUG was printed.
> 
>> =====================================================================
>>
>> And here are useful errors:
>> =====================================================================
>> kern  :info  : [  135.379693] KVM setup async PF for cpu 1
>> kern  :info  : [  135.381412] kvm-stealtime: cpu 1, msr 23fd16180
>> kern  :alert : [  135.386897] busted_srcud-torture:torture_onoff
> 
> Just so your know, busted_srcud can sometimes fail by design.  Hence
> the "busted" in the name.  But failure didn't happen this time.
> 

Yes..The corner case I mentioned actually happened in every "onoff"
tests whatever the torture_type is.

>> task: onlined 1
>> kern  :alert : [  135.408241] busted_srcud-torture:torture_onoff
>> task: offlining 1
>> kern  :info  : [  135.423310] Unregister pv shared memory for cpu 1
>> kern  :info  : [  135.427940] smpboot: CPU 1 is now offline
>> kern  :alert : [  135.430106] busted_srcud-torture:torture_onoff
>> task: offlined 1
>> kern  :alert : [  135.436404] busted_srcud-torture:torture_onoff
>> task: offlining 0
>> kern  :alert : [  135.446173] busted_srcud-torture:torture_onoff
>> task: offline 0 failed: errno -16
>> kern  :alert : [  135.453076] busted_srcud-torture:torture_onoff
>> task: offlining 0
>> kern  :alert : [  135.457461] busted_srcud-torture:torture_onoff
>> task: offline 0 failed: errno -16
>>
>>
>> =====================================================================
>> There are only two CPUs on the VM. Torture try to offline the last one
>> but -EBUSY occured.
>>
>> I spent time to understand kernel/torture.c.
>> There is torture_onoff():
>>
>> 225        while (!torture_must_stop()) {
>> 226                cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
>> 227                if (!torture_offline(cpu,
>> 228                                     &n_offline_attempts,
>> &n_offline_successes,
>> 229                                     &sum_offline, &min_offline,
>> &max_offline))
>> 230                        torture_online(cpu,
>> 231                                       &n_online_attempts,
>> &n_online_successes,
>> 232                                       &sum_online, &min_online,
>> &max_online);
>> 233                schedule_timeout_interruptible(onoff_interval);
>> 234        }
>> 235
>>
>> torture_offline() and torture_offline() don't pre judge if the current
>> cpu is only one usable.
> 
> That does appear to be the case, and that would be a problem with
> the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below.
> 
> Good catch!
> 
>> Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
>> there are only one oneline and hotplugable cpux, then
>> n_offline_successes != n_offline_attempts which caused "End of test:
>> RCU_HOTPLUG".
>>
>> Does I misunderstand something above? Feel free to correct me.
> 
> Does the following patch help?
> 

Yes, no more "errnor: -16" in dmesg and "End of test: SUCCESS" is in
the end.

Thanks for your patch.
If the patch is to be sent in format, you can add:

Tested-By: Su Yue <suy.fnst@cn.fujitsu.com>


---
Su
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/torture.c b/kernel/torture.c
> index a03ff722352b..2b6700ca2a43 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -101,6 +101,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
>   
>   	if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
>   		return false;
> +	if (num_online_cpus() <= 1)
> +		return false;  /* Can't offline the last CPU. */
>   
>   	if (verbose > 1)
>   		pr_alert("%s" TORTURE_FLAG
> 
> 
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
  2019-01-24  7:00       ` Su Yue
@ 2019-01-24 20:54         ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2019-01-24 20:54 UTC (permalink / raw)
  To: Su Yue
  Cc: linux-kernel, josh, rostedt, mathieu.desnoyers, jiangshanlai, Li,
	Philip, lkp-developer

On Thu, Jan 24, 2019 at 03:00:37PM +0800, Su Yue wrote:
> On 1/23/19 11:22 AM, Paul E. McKenney wrote:
> >On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote:
> >>Thanks for your quick reply! Paul
> >>
> >>On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> >>>On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
> >>>>Hi, guys
> >>>>   While running rcutorture tests with "onoff_interval", some tests
> >>>>failed and results show like:
> >>>>
> >>>>=====================================================================
> >>>>[  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
> >>>>nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
> >>>>test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
> >>>>s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
> >>>>test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
> >>>>stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
> >>>>rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
> >>>>====================================================================
> >>>>
> >>>>I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
> >>>>cpu hotplug is enabled in the test? Or just represents another type of
> >>>>failure?
> >>>
> >>>This says that at least one CPU hotplug operation failed, that is,
> >>>the CPU didn't actually come online or go offline as requested.  If you
> >>>are introducing CPU hotplug to an architecture, this usually indicates
> >>>that you have bugs in your CPU-hotplug code.  Or it nmight be that
> >>
> >>It should hit the case since there is no RCU CPU stall warnings.
> >>
> >>>RCU grace periods failed to progress -- though this would normally
> >>>also result in RCU CPU stall warnings.
> >>>
> >>>There should be lines containing "ver:" in your console output.  What
> >>>does one of the later one of these say?
> >>>
> >>
> >>The line says:
> >>======================================================================
> >>[  318.850175] busted_srcud-torture: rtc:           (null) ver:
> >>27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke:
> >>0 rtbre: 0 rtbf: 0 rtb: 0 \
> >>nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542
> >>(HZ=1000) barrier: 0/0:0
> >
> >Yes, you have many more offline attempts than successes, which is
> >why RCU_HOTPLUG was printed.
> >
> >>=====================================================================
> >>
> >>And here are useful errors:
> >>=====================================================================
> >>kern  :info  : [  135.379693] KVM setup async PF for cpu 1
> >>kern  :info  : [  135.381412] kvm-stealtime: cpu 1, msr 23fd16180
> >>kern  :alert : [  135.386897] busted_srcud-torture:torture_onoff
> >
> >Just so your know, busted_srcud can sometimes fail by design.  Hence
> >the "busted" in the name.  But failure didn't happen this time.
> >
> 
> Yes..The corner case I mentioned actually happened in every "onoff"
> tests whatever the torture_type is.
> 
> >>task: onlined 1
> >>kern  :alert : [  135.408241] busted_srcud-torture:torture_onoff
> >>task: offlining 1
> >>kern  :info  : [  135.423310] Unregister pv shared memory for cpu 1
> >>kern  :info  : [  135.427940] smpboot: CPU 1 is now offline
> >>kern  :alert : [  135.430106] busted_srcud-torture:torture_onoff
> >>task: offlined 1
> >>kern  :alert : [  135.436404] busted_srcud-torture:torture_onoff
> >>task: offlining 0
> >>kern  :alert : [  135.446173] busted_srcud-torture:torture_onoff
> >>task: offline 0 failed: errno -16
> >>kern  :alert : [  135.453076] busted_srcud-torture:torture_onoff
> >>task: offlining 0
> >>kern  :alert : [  135.457461] busted_srcud-torture:torture_onoff
> >>task: offline 0 failed: errno -16
> >>
> >>
> >>=====================================================================
> >>There are only two CPUs on the VM. Torture try to offline the last one
> >>but -EBUSY occured.
> >>
> >>I spent time to understand kernel/torture.c.
> >>There is torture_onoff():
> >>
> >>225        while (!torture_must_stop()) {
> >>226                cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
> >>227                if (!torture_offline(cpu,
> >>228                                     &n_offline_attempts,
> >>&n_offline_successes,
> >>229                                     &sum_offline, &min_offline,
> >>&max_offline))
> >>230                        torture_online(cpu,
> >>231                                       &n_online_attempts,
> >>&n_online_successes,
> >>232                                       &sum_online, &min_online,
> >>&max_online);
> >>233                schedule_timeout_interruptible(onoff_interval);
> >>234        }
> >>235
> >>
> >>torture_offline() and torture_offline() don't pre judge if the current
> >>cpu is only one usable.
> >
> >That does appear to be the case, and that would be a problem with
> >the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below.
> >
> >Good catch!
> >
> >>Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
> >>there are only one oneline and hotplugable cpux, then
> >>n_offline_successes != n_offline_attempts which caused "End of test:
> >>RCU_HOTPLUG".
> >>
> >>Does I misunderstand something above? Feel free to correct me.
> >
> >Does the following patch help?
> >
> 
> Yes, no more "errnor: -16" in dmesg and "End of test: SUCCESS" is in
> the end.
> 
> Thanks for your patch.
> If the patch is to be sent in format, you can add:
> 
> Tested-By: Su Yue <suy.fnst@cn.fujitsu.com>

Very good, applied!  And especially thank you for finding this and
bringing it to my attention!!!

							Thanx, Paul

> ---
> Su
> >							Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/kernel/torture.c b/kernel/torture.c
> >index a03ff722352b..2b6700ca2a43 100644
> >--- a/kernel/torture.c
> >+++ b/kernel/torture.c
> >@@ -101,6 +101,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
> >  	if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
> >  		return false;
> >+	if (num_online_cpus() <= 1)
> >+		return false;  /* Can't offline the last CPU. */
> >  	if (verbose > 1)
> >  		pr_alert("%s" TORTURE_FLAG
> >
> >
> >
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-01-24 20:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-22  3:40 rcutorture: meaning of "End of test: RCU_HOTPLUG" Su Yue
2019-01-22  4:01 ` Paul E. McKenney
2019-01-22  8:42   ` Su Yue
2019-01-23  3:22     ` Paul E. McKenney
2019-01-24  7:00       ` Su Yue
2019-01-24 20:54         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).