linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: tick/sched: iowait_sleeptime resides in tick_cpu_sched structure is a negative value
       [not found] <074a28e8-3794-5fee-f2ab-f08787149481@linux.alibaba.com>
@ 2022-01-06 13:40 ` Frederic Weisbecker
  2022-01-07  1:15   ` luanshi
  0 siblings, 1 reply; 3+ messages in thread
From: Frederic Weisbecker @ 2022-01-06 13:40 UTC (permalink / raw)
  To: luanshi; +Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar, linux-kernel

On Thu, Jan 06, 2022 at 05:14:02PM +0800, luanshi wrote:
> Hi All,
> 
> We encounted a hardlockup problem on ARM server, after some debug found
> that:
> 
> PID: 0      TASK: ffff0400064de300  CPU: 126  COMMAND: "swapper/126"
>  #0 [ffff8000250f3c90] __crash_kexec at ffff80001013a064
>  #1 [ffff8000250f3e30] panic at ffff800010afd028
>  #2 [ffff8000250f3f10] nmi_panic at ffff80001004a9e0
>  #3 [ffff8000250f3f20] watchdog_hardlockup_check at ffff80001017b7f0
>  #4 [ffff8000250f3f40] sdei_watchdog_callback at ffff80001003c9a4
>  #5 [ffff8000250f3f50] sdei_event_handler at ffff8000108816b8
>  #6 [ffff8000250f3f80] _sdei_handler at ffff800010b1c2e4
>  #7 [ffff8000250f3fd0] __sdei_handler at ffff800010b139b0
>  #8 [ffff8000250f3ff0] __sdei_asm_handler at ffff800010014c18
> --- <IRQ stack> ---
>  #9 [ffff800013973f10] __cpu_do_idle at ffff800010b13764
> #10 [ffff800013973f20] arch_cpu_idle at ffff800010b137ec
> #11 [ffff800013973f30] default_idle_call at ffff800010b1b5cc
> #12 [ffff800013973f50] cpuidle_idle_call at ffff80001009648c
> #13 [ffff800013973f90] do_idle at ffff8000100965b0
> #14 [ffff800013973fc0] cpu_startup_entry at ffff8000100967d4
> #15 [ffff800013973fe0] secondary_start_kernel at ffff800010026bb0
> 
> 
> per_cpu(tick_cpu_sched, 126) = $1 = {
>   sched_timer = {
>     node = {
>       node = {
>         __rb_parent_color = 18446603337117384112,
>         rb_right = 0x0,
>         rb_left = 0x0
>       },
>       expires = 6108564000000
>     },
>     _softexpires = 6108564000000,
>     function = 0xffff800010122ec0 <tick_sched_timer>,
>     base = 0xffff04473bbcc780,
>     state = 1 '\001',
>     is_rel = 0 '\000',
>     is_soft = 0 '\000',
>     is_hard = 1 '\001'
>   },
>   check_clocks = 0,
>   nohz_mode = NOHZ_MODE_INACTIVE,
>   inidle = 1,
>   tick_stopped = 0,
>   idle_active = 1,
>   do_timer_last = 0,
>   got_idle_tick = 1,
>   last_tick = 0,
>   next_tick = 0,
>   idle_jiffies = 0,
>   idle_calls = 0,
>   idle_sleeps = 0,
>   idle_entrytime = 5012087709249,
>   idle_waketime = 0,
>   idle_exittime = 0,
>   idle_sleeptime = 4936136669951,
>   iowait_sleeptime = -1942739704,
>   last_jiffies = 0,
>   timer_expires = 0,
>   timer_expires_base = 0,
>   next_timer = 0,
>   idle_expires = 0,
>   tick_dep_mask = {
>     counter = 0
>   }
> }
> 
> iowait_sleeptime = -1942739704,
> ----------
> iowait_sleeptime is monotonically increasing, under what circumstances iowait_sleeptime
> can be a negative value?
> 
> For detailed information:
> https://bugzilla.kernel.org/show_bug.cgi?id=215458
> 
> 
> Can you give me some suggestions for debugging.

Some racy updates can happen if cpufreq or "cat /proc/stat" do concurrent
updates. But for that nohz needs to be running and I see your CPU clock has
NOHZ_MODE_INACTIVE. Perhaps it's only for that CPU though.

What is the value of tick_nohz_active in your dump?


> 
> 
> 
> Thanks,
> Liguang
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: tick/sched: iowait_sleeptime resides in tick_cpu_sched structure is a negative value
  2022-01-06 13:40 ` tick/sched: iowait_sleeptime resides in tick_cpu_sched structure is a negative value Frederic Weisbecker
@ 2022-01-07  1:15   ` luanshi
  0 siblings, 0 replies; 3+ messages in thread
From: luanshi @ 2022-01-07  1:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar, linux-kernel

Hi Frederi,

在 2022/1/6 21:40, Frederic Weisbecker 写道:
> On Thu, Jan 06, 2022 at 05:14:02PM +0800, luanshi wrote:
>> Hi All,
>>
>> We encounted a hardlockup problem on ARM server, after some debug found
>> that:
>>
>> PID: 0      TASK: ffff0400064de300  CPU: 126  COMMAND: "swapper/126"
>>   #0 [ffff8000250f3c90] __crash_kexec at ffff80001013a064
>>   #1 [ffff8000250f3e30] panic at ffff800010afd028
>>   #2 [ffff8000250f3f10] nmi_panic at ffff80001004a9e0
>>   #3 [ffff8000250f3f20] watchdog_hardlockup_check at ffff80001017b7f0
>>   #4 [ffff8000250f3f40] sdei_watchdog_callback at ffff80001003c9a4
>>   #5 [ffff8000250f3f50] sdei_event_handler at ffff8000108816b8
>>   #6 [ffff8000250f3f80] _sdei_handler at ffff800010b1c2e4
>>   #7 [ffff8000250f3fd0] __sdei_handler at ffff800010b139b0
>>   #8 [ffff8000250f3ff0] __sdei_asm_handler at ffff800010014c18
>> --- <IRQ stack> ---
>>   #9 [ffff800013973f10] __cpu_do_idle at ffff800010b13764
>> #10 [ffff800013973f20] arch_cpu_idle at ffff800010b137ec
>> #11 [ffff800013973f30] default_idle_call at ffff800010b1b5cc
>> #12 [ffff800013973f50] cpuidle_idle_call at ffff80001009648c
>> #13 [ffff800013973f90] do_idle at ffff8000100965b0
>> #14 [ffff800013973fc0] cpu_startup_entry at ffff8000100967d4
>> #15 [ffff800013973fe0] secondary_start_kernel at ffff800010026bb0
>>
>>
>> per_cpu(tick_cpu_sched, 126) = $1 = {
>>    sched_timer = {
>>      node = {
>>        node = {
>>          __rb_parent_color = 18446603337117384112,
>>          rb_right = 0x0,
>>          rb_left = 0x0
>>        },
>>        expires = 6108564000000
>>      },
>>      _softexpires = 6108564000000,
>>      function = 0xffff800010122ec0 <tick_sched_timer>,
>>      base = 0xffff04473bbcc780,
>>      state = 1 '\001',
>>      is_rel = 0 '\000',
>>      is_soft = 0 '\000',
>>      is_hard = 1 '\001'
>>    },
>>    check_clocks = 0,
>>    nohz_mode = NOHZ_MODE_INACTIVE,
>>    inidle = 1,
>>    tick_stopped = 0,
>>    idle_active = 1,
>>    do_timer_last = 0,
>>    got_idle_tick = 1,
>>    last_tick = 0,
>>    next_tick = 0,
>>    idle_jiffies = 0,
>>    idle_calls = 0,
>>    idle_sleeps = 0,
>>    idle_entrytime = 5012087709249,
>>    idle_waketime = 0,
>>    idle_exittime = 0,
>>    idle_sleeptime = 4936136669951,
>>    iowait_sleeptime = -1942739704,
>>    last_jiffies = 0,
>>    timer_expires = 0,
>>    timer_expires_base = 0,
>>    next_timer = 0,
>>    idle_expires = 0,
>>    tick_dep_mask = {
>>      counter = 0
>>    }
>> }
>>
>> iowait_sleeptime = -1942739704,
>> ----------
>> iowait_sleeptime is monotonically increasing, under what circumstances iowait_sleeptime
>> can be a negative value?
>>
>> For detailed information:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215458
>>
>>
>> Can you give me some suggestions for debugging.
> Some racy updates can happen if cpufreq or "cat /proc/stat" do concurrent
> updates. But for that nohz needs to be running and I see your CPU clock has
> NOHZ_MODE_INACTIVE. Perhaps it's only for that CPU though.
>
> What is the value of tick_nohz_active in your dump?

nohz=off was configured in the startup parameters

crash> tick_nohz_enabled
tick_nohz_enabled = $1 = false
crash> tick_nohz_active
tick_nohz_active = $2 = 0


and current no cpufreq driver was used:

crash> cpufreq_driver
cpufreq_driver = $4 = (struct cpufreq_driver *) 0x0


Thanks,

Liguang

>
>
>>
>>
>> Thanks,
>> Liguang
>>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* tick/sched: iowait_sleeptime resides in tick_cpu_sched structure is a negative value
@ 2022-01-06  9:49 luanshi
  0 siblings, 0 replies; 3+ messages in thread
From: luanshi @ 2022-01-06  9:49 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar; +Cc: linux-kernel

Hi All,

We encounted a hardlockup problem on ARM server, after some debug found 
that:

PID: 0      TASK: ffff0400064de300  CPU: 126  COMMAND: "swapper/126"
  #0 [ffff8000250f3c90] __crash_kexec at ffff80001013a064
  #1 [ffff8000250f3e30] panic at ffff800010afd028
  #2 [ffff8000250f3f10] nmi_panic at ffff80001004a9e0
  #3 [ffff8000250f3f20] watchdog_hardlockup_check at ffff80001017b7f0
  #4 [ffff8000250f3f40] sdei_watchdog_callback at ffff80001003c9a4
  #5 [ffff8000250f3f50] sdei_event_handler at ffff8000108816b8
  #6 [ffff8000250f3f80] _sdei_handler at ffff800010b1c2e4
  #7 [ffff8000250f3fd0] __sdei_handler at ffff800010b139b0
  #8 [ffff8000250f3ff0] __sdei_asm_handler at ffff800010014c18
--- <IRQ stack> ---
  #9 [ffff800013973f10] __cpu_do_idle at ffff800010b13764
#10 [ffff800013973f20] arch_cpu_idle at ffff800010b137ec
#11 [ffff800013973f30] default_idle_call at ffff800010b1b5cc
#12 [ffff800013973f50] cpuidle_idle_call at ffff80001009648c
#13 [ffff800013973f90] do_idle at ffff8000100965b0
#14 [ffff800013973fc0] cpu_startup_entry at ffff8000100967d4
#15 [ffff800013973fe0] secondary_start_kernel at ffff800010026bb0


per_cpu(tick_cpu_sched, 126) = $1 = {
   sched_timer = {
     node = {
       node = {
         __rb_parent_color = 18446603337117384112,
         rb_right = 0x0,
         rb_left = 0x0
       },
       expires = 6108564000000
     },
     _softexpires = 6108564000000,
     function = 0xffff800010122ec0 <tick_sched_timer>,
     base = 0xffff04473bbcc780,
     state = 1 '\001',
     is_rel = 0 '\000',
     is_soft = 0 '\000',
     is_hard = 1 '\001'
   },
   check_clocks = 0,
   nohz_mode = NOHZ_MODE_INACTIVE,
   inidle = 1,
   tick_stopped = 0,
   idle_active = 1,
   do_timer_last = 0,
   got_idle_tick = 1,
   last_tick = 0,
   next_tick = 0,
   idle_jiffies = 0,
   idle_calls = 0,
   idle_sleeps = 0,
   idle_entrytime = 5012087709249,
   idle_waketime = 0,
   idle_exittime = 0,
   idle_sleeptime = 4936136669951,
   iowait_sleeptime = -1942739704,
   last_jiffies = 0,
   timer_expires = 0,
   timer_expires_base = 0,
   next_timer = 0,
   idle_expires = 0,
   tick_dep_mask = {
     counter = 0
   }
}

iowait_sleeptime = -1942739704,
----------
iowait_sleeptime is monotonically increasing, under what circumstances 
iowait_sleeptime
can be a negative value?

For detailed information:
https://bugzilla.kernel.org/show_bug.cgi?id=215458


Can you give me some suggestions for debugging.



Thanks,
Liguang


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-07  1:15 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <074a28e8-3794-5fee-f2ab-f08787149481@linux.alibaba.com>
2022-01-06 13:40 ` tick/sched: iowait_sleeptime resides in tick_cpu_sched structure is a negative value Frederic Weisbecker
2022-01-07  1:15   ` luanshi
2022-01-06  9:49 luanshi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).