xenomai.lists.linux.dev archive mirror
From: Jan Kiszka <jan.kiszka@siemens.com>
To: linz <powertree@163.com>, xenomai@xenomai.org
Subject: Re: I found the jitter was greater, when using the patch "cobalt/init: Fail if CPU 0 is missing from real-time CPU mask"
Date: Fri, 17 Feb 2023 09:21:26 +0100	[thread overview]
Message-ID: <e98f8ab5-0242-46e9-ba7a-e3235dd00292@siemens.com> (raw)
In-Reply-To: <b5410402-4c54-ac6e-63fb-3db8075b8589@163.com>

On 17.02.23 03:09, linz wrote:
> 
> On 2023/2/16 17:53, Jan Kiszka wrote:
>> On 16.02.23 10:25, linz wrote:
>>> Hi, I met a problem when using xenomai v3.2.2. The CPU on my
>>> development board has four cores: CPU0, CPU1, CPU2 and CPU3. I used
>>> CPU0 and CPU3 for xenomai, and CPU1 and CPU2 for linux. The bootargs
>>> were as follows:
>>>
>>>   setenv bootargs isolcpus=0,3 xenomai.supported_cpus=0x9 nohz_full=0,3 rcu_nocbs=0,3 irqaffinity=1,2 nosoftlockup nmi_watchdog=0
>>>
>>> Then I ran the latency test suite and found that the jitter was
>>> greater than before. So I used ftrace to look for the reason, and
>>> found that the threads running on CPU0 and CPU3 compete for the
>>> nklock. The ftrace output is as follows:
>>>
>>>     sshd-2187  [000] *.~2  6695.901950: ___xnlock_get <-___xnsched_run  /// CPU0 got xnlock
>>>   <idle>-0     [003] *.~1  6695.901950: rcu_oob_prepare_lock <-irq_find_mapping
>>>   <idle>-0     [003] *.~1  6695.901951: __rcu_read_lock <-irq_find_mapping
>>>   <idle>-0     [003] *.~1  6695.901951: __rcu_read_unlock <-irq_find_mapping
>>>     sshd-2187  [000] *.~2  6695.901951: xnsched_pick_next <-___xnsched_run
>>>   <idle>-0     [003] *.~1  6695.901952: rcu_oob_finish_lock <-irq_find_mapping
>>>   <idle>-0     [003] *.~1  6695.901952: generic_pipeline_irq <-gic_handle_irq
>>>   <idle>-0     [003] *.~1  6695.901952: generic_pipeline_irq_desc <-generic_pipeline_irq
>>>     sshd-2187  [000] *.~2  6695.901953: ktime_get_mono_fast_ns <-___xnsched_run
>>>   <idle>-0     [003] *.~1  6695.901953: handle_percpu_devid_irq <-generic_pipeline_irq_desc
>>>     sshd-2187  [000] *.~2  6695.901953: arch_counter_read <-ktime_get_mono_fast_ns
>>>   <idle>-0     [003] *.~1  6695.901953: handle_oob_irq <-handle_percpu_devid_irq
>>>   <idle>-0     [003] *.~1  6695.901954: do_oob_irq <-handle_oob_irq
>>>   <idle>-0     [003] *.~1  6695.901954: arch_timer_handler_phys <-do_oob_irq
>>>     sshd-2187  [000] *.~2  6695.901954: pipeline_switch_to <-___xnsched_run
>>>   <idle>-0     [003] *.~1  6695.901955: xnintr_core_clock_handler <-arch_timer_handler_phys
>>>   <idle>-0     [003] *.~1  6695.901955: ___xnlock_get <-xnintr_core_clock_handler  /// CPU3 wanted to get xnlock
>>>   <idle>-0     [003] *.~1  6695.901955: queued_spin_lock_slowpath <-___xnlock_get  /// CPU3 failed and waited
>>>     sshd-2187  [000] *.~2  6695.901956: dovetail_context_switch <-pipeline_switch_to
>>>     sshd-2187  [000] *.~2  6695.901956: check_and_switch_context <-dovetail_context_switch
>>>     sshd-2187  [000] *.~2  6695.901957: cpu_do_switch_mm <-check_and_switch_context
>>>     sshd-2187  [000] *.~2  6695.901958: post_ttbr_update_workaround <-cpu_do_switch_mm
>>>     sshd-2187  [000] *.~2  6695.901958: fpsimd_thread_switch <-__switch_to
>>>     sshd-2187  [000] *.~2  6695.901959: __get_cpu_fpsimd_context <-fpsimd_thread_switch
>>>     sshd-2187  [000] *.~2  6695.901960: __fpsimd_save <-fpsimd_thread_switch
>>>     sshd-2187  [000] *.~2  6695.901960: __put_cpu_fpsimd_context <-fpsimd_thread_switch
>>>     sshd-2187  [000] *.~2  6695.901961: hw_breakpoint_thread_switch <-__switch_to
>>>     sshd-2187  [000] *.~2  6695.901962: uao_thread_switch <-__switch_to
>>>     sshd-2187  [000] *.~2  6695.901962: spectre_v4_enable_task_mitigation <-__switch_to
>>>     sshd-2187  [000] *.~2  6695.901963: spectre_v4_mitigations_off <-spectre_v4_enable_task_mitigation
>>>     sshd-2187  [000] *.~2  6695.901963: cpu_mitigations_off <-spectre_v4_mitigations_off
>>>     sshd-2187  [000] *.~2  6695.901964: spectre_v4_mitigations_off <-spectre_v4_enable_task_mitigation
>>>     sshd-2187  [000] *.~2  6695.901965: cpu_mitigations_off <-spectre_v4_mitigations_off
>>>     sshd-2187  [000] *.~2  6695.901965: erratum_1418040_thread_switch <-__switch_to
>>>     sshd-2187  [000] *.~2  6695.901966: this_cpu_has_cap <-erratum_1418040_thread_switch
>>>     sshd-2187  [000] *.~2  6695.901967: is_affected_midr_range_list <-this_cpu_has_cap
>>>     sshd-2187  [000] *.~2  6695.901967: mte_thread_switch <-__switch_to
>>>    <...>-2294  [000] *..2  6695.901968: inband_switch_tail <-__schedule  /// CPU0 switched thread sshd-2187 -> stress-2294
>>>    <...>-2294  [000] *..2  6695.901969: preempt_count_add <-inband_switch_tail
>>>    <...>-2294  [000] *.~2  6695.901969: fpsimd_restore_current_oob <-dovetail_leave_inband
>>>    <...>-2294  [000] *.~2  6695.901970: fpsimd_restore_current_state <-fpsimd_restore_current_oob
>>>    <...>-2294  [000] *.~2  6695.901970: hard_preempt_disable <-fpsimd_restore_current_state
>>>    <...>-2294  [000] *.~2  6695.901971: __get_cpu_fpsimd_context <-fpsimd_restore_current_state
>>>    <...>-2294  [000] *.~2  6695.901972: __put_cpu_fpsimd_context <-fpsimd_restore_current_state
>>>    <...>-2294  [000] *.~2  6695.901973: hard_preempt_enable <-fpsimd_restore_current_state
>>>    <...>-2294  [000] *.~2  6695.901973: ___xnlock_put <-xnthread_harden  /// CPU0 released xnlock
>>>   <idle>-0     [003] *.~1  6695.901974: xnclock_tick <-xnintr_core_clock_handler  /// CPU3 finally got xnlock, but lost 901974 - 901955 == 19us
>>>
>>> I tried to revert the patch "cobalt/init: Fail if CPU 0 is missing
>>> from real-time CPU mask"
>>> (https://source.denx.de/Xenomai/xenomai/-/commit/5ac4984a6d50a2538139193350eef82b60a42001)
>>> and then used the following bootargs:
>>>
>>>   setenv bootargs isolcpus=3 xenomai.supported_cpus=0x9 nohz_full=3 rcu_nocbs=3 irqaffinity=0,1,2 nosoftlockup nmi_watchdog=0
>>>
>>> Finally, the problem was resolved.
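Side note for anyone replaying this trace: the cost of the contention can be read directly off the two CPU3 timestamps. A small illustrative calculation (plain Python, not Xenomai code):

```python
# CPU3 requested the nklock at 6695.901955 (___xnlock_get) and only
# proceeded at 6695.901974 (xnclock_tick), after CPU0 released the lock
# in xnthread_harden. The difference is the time spent spinning.
t_requested = 6695.901955  # ___xnlock_get on CPU3
t_acquired = 6695.901974   # xnclock_tick on CPU3
wait_us = round((t_acquired - t_requested) * 1e6)
print(wait_us)  # 19 microseconds lost in queued_spin_lock_slowpath
```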
>> Why do you have to revert this commit? Your supported_cpus here still
>> contains CPU 0, thus should not trigger that check.
> 
> Sorry, I wrote it wrongly. The bootargs should be:
> 
>   setenv bootargs isolcpus=3 xenomai.supported_cpus=0x8 nohz_full=3 rcu_nocbs=3 irqaffinity=0,1,2 nosoftlockup nmi_watchdog=0
> 
> So supported_cpus didn't contain CPU 0. After reverting the patch, with
> the above bootargs, the jitter is less than 7us in the latency test
> suite. But the jitter is about 15us with the following bootargs:
> 
>   setenv bootargs isolcpus=0,3 xenomai.supported_cpus=0x9 nohz_full=0,3 rcu_nocbs=0,3 irqaffinity=1,2 nosoftlockup nmi_watchdog=0
> 
> The reason is that CPU0 and CPU3 compete for the xnlock, as shown by
> the ftrace results above.
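For clarity, xenomai.supported_cpus is a plain bitmask over CPU numbers, so the two configurations differ only in bit 0. An illustrative decode (plain Python, not part of Xenomai):

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU numbers whose bits are set in a supported_cpus mask."""
    return [cpu for cpu in range(32) if mask & (1 << cpu)]

print(cpus_in_mask(0x9))  # [0, 3]: CPU 0 and CPU 3 are real-time
print(cpus_in_mask(0x8))  # [3]: CPU 0 excluded, which the commit rejects
```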
>>> My question is: If I revert the patch, what is the impact on the
>>> system? Can you specify where CPU 0 is supposed to be real-time?
>> You can currently only specify setups where CPU 0 is included, because
>> of the mentioned restrictions in the cobalt core. I do not recall all
>> the places where this assumption would be violated; from quickly
>> re-reading the patch context, one is pipeline_timer_name() in
>> kernel/cobalt/dovetail/tick.c. Can't you move all your RT workload to
>> CPU 0 and all non-RT work to the others?
> 
> In the customer's actual environment, moving all RT workload to CPU 0
> and all non-RT work to the others would be troublesome for them,
> because it is incompatible with xenomai 3.1.x, where the ipipe core
> had no restrictions on CPU0.
> 

I wouldn't be surprised if 3.1 was simply wrong to permit excluding
CPU 0, because the same internal assumptions applied. If you were
lucky, you just didn't trigger them.

If we want to relax this restriction, we need to systematically look for
all CPU0-assumptions and somehow resolve them.
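For reference, the check that commit adds amounts to refusing any real-time CPU mask that does not contain CPU 0. Sketched here in Python for illustration only (the actual implementation is kernel C in the cobalt init code, and the names below are made up):

```python
def validate_supported_cpus(mask: int) -> None:
    """Fail, as the commit does, if CPU 0 is missing from the RT CPU mask."""
    if not (mask & 0x1):  # bit 0 corresponds to CPU 0
        raise ValueError("CPU 0 missing from real-time CPU mask")

validate_supported_cpus(0x9)    # CPUs 0 and 3: accepted
# validate_supported_cpus(0x8)  # CPU 3 only: rejected, as reported above
```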

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



Thread overview: 5+ messages
2023-02-16  9:25 I found the jitter was greater, when using the patch "cobalt/init: Fail if CPU 0 is missing from real-time CPU mask" linz
2023-02-16  9:53 ` Jan Kiszka
2023-02-17  2:09   ` linz
2023-02-17  8:21     ` Jan Kiszka [this message]
2023-02-17  2:26   ` linz
