linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
@ 2023-06-13 18:49 Bhatnagar, Rishabh
  2023-06-13 18:58 ` Bhatnagar, Rishabh
  0 siblings, 1 reply; 17+ messages in thread
From: Bhatnagar, Rishabh @ 2023-06-13 18:49 UTC (permalink / raw)
  To: bigeasy, gregkh; +Cc: linux-kernel, tglx, sashal, luizcap, abuehaze

Hi Sebastian/Greg

We are seeing RCU stall warnings from recent stable tree updates:
5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
This is seen in the upstream stable trees without any downstream patches.

The issue is seen few minutes after booting without any workload.
We launch hundred's of virtual instances and this shows up in 1-2 
instances, so its hard to reproduce.
Attaching a few stack traces below.

The issue can be seen on virtual and baremetal instances.
Another interesting point is we only see this on x86 based instances.
We also did test this on linux-mainline but were not able to reproduce 
the issue.
So maybe there's a fixup or related commit that has gone in?

We tried bisecting the stable trees and found that after reverting the
below commit we couldn't reproduce this in any of the kernels consistently.

tick/common: Align tick period with the HZ tick. [ Upstream commit
e9523a0d81899361214d118ad60ef76f0e92f71d ]


Not exactly sure how this commit is affecting all stable kernels.
Can you take a look at this issue and share your insight?

A few example stack traces of the observed RCU stall.

Seen in kernel-5.10.181:

rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 37-....: (1818 ticks this GP) idle=cf2/1/0x4000000000000000
softirq=151/151 fqs=7245
   (t=14809 jiffies g=-759 q=3290)
NMI backtrace for cpu 37
Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
Call Trace:
   <IRQ>
   dump_stack+0x57/0x70
   ? lapic_can_unplug_cpu.cold+0x3a/0x3a
   nmi_cpu_backtrace.cold+0x32/0x68
   nmi_trigger_cpumask_backtrace+0xdf/0xe6
   rcu_dump_cpu_stacks+0xa5/0xd7
   print_cpu_stall.cold+0xa4/0x149
   check_cpu_stall+0xee/0x210
   rcu_pending+0x26/0xc0
   rcu_sched_clock_irq+0x43/0x110
   update_process_times+0x8c/0xc0
   tick_periodic+0x27/0x80
   tick_handle_periodic+0x20/0x70
   xen_timer_interrupt+0x1e/0x30
   __handle_irq_event_percpu+0x3d/0x160
   handle_irq_event_percpu+0x31/0x80
   handle_percpu_irq+0x37/0x60
   generic_handle_irq+0x4b/0x60
   evtchn_2l_handle_events+0x26d/0x280
   __xen_evtchn_do_upcall+0x66/0xb0
   __sysvec_xen_hvm_callback+0x22/0x30
   asm_call_irq_on_stack+0x12/0x20
   </IRQ>
   sysvec_xen_hvm_callback+0x72/0x80
   asm_sysvec_xen_hvm_callback+0x12/0x20
RIP: 0010:_raw_spin_unlock_irqrestore+0xe/0x20
RSP: 0000:ffffc900188ffa28 EFLAGS: 00000246
   pci_conf1_read+0xa4/0x100
   pci_bus_read_config_byte+0x3f/0x70
   __pci_find_next_cap_ttl+0x3b/0xd0
   pci_find_capability+0x71/0xa0
   pci_pm_init+0x6c/0x1c0
   ? pci_allocate_vc_save_buffers+0x6d/0x8b
   pci_device_add+0xb0/0x1b0
   pci_scan_single_device+0xbd/0xf0
   pci_scan_slot+0x53/0x120
   pci_scan_child_bus_extend+0x3a/0x2a0
   acpi_pci_root_create+0x20c/0x26a
   pci_acpi_scan_root+0x19a/0x1c0
   ? negotiate_os_control+0xdd/0x2b9
   acpi_pci_root_add.cold+0x59/0x1b0
   acpi_bus_attach+0xee/0x210
   acpi_bus_attach+0x63/0x210
   acpi_bus_attach+0x63/0x210
   acpi_bus_scan+0x43/0x90
   ? acpi_bus_init+0x15f/0x15f
   acpi_scan_init+0x10e/0x1b4
   acpi_init+0xba/0x11d
   ? intel_idle_init+0x2a2/0x2a2
   do_one_initcall+0x44/0x1c4
   do_initcalls+0xc6/0xdf
   kernel_init_freeable+0x14d/0x198
   ? rest_init+0xb4/0xb4
   kernel_init+0xa/0x11c
   ret_from_fork+0x22/0x30

Seen in kernel-5.4.246:

NMI backtrace for cpu 7
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.4.246 #1
Hardware name: Amazon EC2 m5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:lapic_next_deadline+0x26/0x40
RSP: 0000:ffffc900001ecf98 EFLAGS: 00000002
Call Trace:
  <NMI>
  ? nmi_cpu_backtrace+0x4c/0x90
  ? nmi_cpu_backtrace_handler+0xd/0x20
  ? nmi_handle+0x71/0x160
  ? default_do_nmi+0x4e/0x100
  ? do_nmi+0x15b/0x190
  ? end_repeat_nmi+0x16/0x62
  ? lapic_next_deadline+0x26/0x40
  ? lapic_next_deadline+0x26/0x40
  ? lapic_next_deadline+0x26/0x40
  </NMI>
  <IRQ>
  clockevents_program_event+0xcf/0x100
  tick_handle_periodic+0x48/0x60
  smp_apic_timer_interrupt+0x6a/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
  ? vprintk_emit+0x19b/0x280
  ? printk+0x52/0x6e
  ? pci_hp_add+0x189/0x300
  ? acpiphp_register_hotplug_slot+0xe0/0xf0
  ? acpiphp_add_context+0x3b1/0x460
  ? acpi_ns_walk_namespace+0x108/0x220
  ? acpiphp_put_context.part.12+0x30/0x30
  ? acpi_walk_namespace+0xc8/0xf0
  ? kmem_cache_alloc_trace+0x1f8/0x210
  ? acpiphp_enumerate_slots+0x172/0x250
  ? acpi_pci_add_bus+0x50/0xd0
  ? pci_register_host_bridge+0x22b/0x460
  ? pci_create_root_bus+0x87/0xb0
  ? acpi_pci_root_create+0x128/0x290
  ? pci_acpi_scan_root+0x14a/0x1b0
  ? acpi_pci_root_add+0x1ff/0x580
  ? acpi_evaluate_integer+0x52/0x90
  ? set_debug_rodata+0x2f/0x2f
  ? acpi_bus_attach+0x144/0x1d0
  ? acpi_bus_attach+0x80/0x1d0
  ? acpi_bus_attach+0x80/0x1d0
  ? acpi_sleep_proc_init+0x24/0x24
  ? acpi_sleep_proc_init+0x24/0x24
  ? acpi_bus_scan+0x43/0x90
  ? acpi_scan_init+0xf1/0x235
  ? acpi_sleep_proc_init+0x24/0x24
  ? acpi_init+0x2e9/0x349
  ? do_one_initcall+0x46/0x200
  ? kernel_init_freeable+0x1db/0x286
  ? rest_init+0xb0/0xb0
  ? kernel_init+0xa/0x110
  ? ret_from_fork+0x35/0x40


Seen in 5.10.183:

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
acpiphp: Slot [19] registered
rcu:     3-....: (349 ticks this GP) idle=936/1/0x4000000000000000 
softirq=94/94 fqs=151
  (detected by 0, t=14752 jiffies, g=-1063, q=591)
Sending NMI from CPU 0 to CPUs 3:
NMI backtrace for cpu 3
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.183 #1
Hardware name: Amazon EC2 m5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:io_serial_in+0x14/0x20
RSP: 0000:ffffbc9640017720 EFLAGS: 00000006
PKRU: 55555554
Call Trace:
  <NMI>
  ? nmi_cpu_backtrace+0x57/0xb0
  ? nmi_cpu_backtrace_handler+0xd/0x20
  ? nmi_handle+0x62/0x130
  ? default_do_nmi+0x49/0x100
  ? exc_nmi+0x152/0x170
  ? end_repeat_nmi+0x16/0x67
  ? mem16_serial_in+0x10/0x20
  ? io_serial_in+0x14/0x20
  ? io_serial_in+0x14/0x20
  ? io_serial_in+0x14/0x20
  </NMI>
  serial8250_console_write+0x8e/0x310
  ? record_print_text+0xbf/0x150
  console_unlock+0x485/0x4c0
  vprintk_emit+0xe4/0x240
  printk+0x52/0x72
  ? pci_hp_add+0x18d/0x300
  acpiphp_register_hotplug_slot+0xe4/0x100
  acpiphp_add_context+0x3b5/0x460
  acpi_ns_walk_namespace+0x10c/0x220
  ? acpiphp_put_context.part.13+0x30/0x30
  ? acpiphp_put_context.part.13+0x30/0x30
  acpi_walk_namespace+0xd4/0xf0
  ? kmem_cache_alloc_trace+0x3be/0x400
  acpiphp_enumerate_slots+0x175/0x260
  acpi_pci_add_bus+0x50/0xe0
  pci_register_host_bridge+0x233/0x4d0
  ? complete_all+0x2a/0x40
  pci_create_root_bus+0x87/0xc0
  acpi_pci_root_create+0x128/0x2a0
  pci_acpi_scan_root+0x14e/0x1b0
  acpi_pci_root_add+0x203/0x580
  ? acpi_evaluate_integer+0x52/0x90
  ? rdinit_setup+0x2f/0x2f
  acpi_bus_attach+0x148/0x1d0
  acpi_bus_attach+0x84/0x1d0
  acpi_bus_attach+0x84/0x1d0
  ? acpi_sleep_proc_init+0x24/0x24
  ? acpi_sleep_proc_init+0x24/0x24
  acpi_bus_scan+0x43/0x90
  acpi_scan_init+0xff/0x247
  ? acpi_sleep_proc_init+0x24/0x24
  acpi_init+0x2e9/0x34d
  do_one_initcall+0x44/0x1e0
  kernel_init_freeable+0x229/0x286
  ? rest_init+0xc0/0xc0
  kernel_init+0xa/0x120
  ret_from_fork+0x22/0x30

Thanks
Rishabh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-13 18:49 Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees Bhatnagar, Rishabh
@ 2023-06-13 18:58 ` Bhatnagar, Rishabh
  2023-06-14  9:14   ` gregkh
  2023-06-14  9:14   ` gregkh
  0 siblings, 2 replies; 17+ messages in thread
From: Bhatnagar, Rishabh @ 2023-06-13 18:58 UTC (permalink / raw)
  To: bigeasy, gregkh; +Cc: linux-kernel, tglx, sashal, luizcap, abuehaze, stable


On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
> Hi Sebastian/Greg
>
> We are seeing RCU stall warnings from recent stable tree updates:
> 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
> This is seen in the upstream stable trees without any downstream patches.
>
> The issue is seen few minutes after booting without any workload.
> We launch hundred's of virtual instances and this shows up in 1-2 
> instances, so its hard to reproduce.
> Attaching a few stack traces below.
>
> The issue can be seen on virtual and baremetal instances.
> Another interesting point is we only see this on x86 based instances.
> We also did test this on linux-mainline but were not able to reproduce 
> the issue.
> So maybe there's a fixup or related commit that has gone in?
>
> We tried bisecting the stable trees and found that after reverting the
> below commit we couldn't reproduce this in any of the kernels 
> consistently.
>
> tick/common: Align tick period with the HZ tick. [ Upstream commit
> e9523a0d81899361214d118ad60ef76f0e92f71d ]
>
>
> Not exactly sure how this commit is affecting all stable kernels.
> Can you take a look at this issue and share your insight?
>
> A few example stack traces of the observed RCU stall.
>
> Seen in kernel-5.10.181:
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 37-....: (1818 ticks this GP) idle=cf2/1/0x4000000000000000
> softirq=151/151 fqs=7245
>   (t=14809 jiffies g=-759 q=3290)
> NMI backtrace for cpu 37
> Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
> Call Trace:
>   <IRQ>
>   dump_stack+0x57/0x70
>   ? lapic_can_unplug_cpu.cold+0x3a/0x3a
>   nmi_cpu_backtrace.cold+0x32/0x68
>   nmi_trigger_cpumask_backtrace+0xdf/0xe6
>   rcu_dump_cpu_stacks+0xa5/0xd7
>   print_cpu_stall.cold+0xa4/0x149
>   check_cpu_stall+0xee/0x210
>   rcu_pending+0x26/0xc0
>   rcu_sched_clock_irq+0x43/0x110
>   update_process_times+0x8c/0xc0
>   tick_periodic+0x27/0x80
>   tick_handle_periodic+0x20/0x70
>   xen_timer_interrupt+0x1e/0x30
>   __handle_irq_event_percpu+0x3d/0x160
>   handle_irq_event_percpu+0x31/0x80
>   handle_percpu_irq+0x37/0x60
>   generic_handle_irq+0x4b/0x60
>   evtchn_2l_handle_events+0x26d/0x280
>   __xen_evtchn_do_upcall+0x66/0xb0
>   __sysvec_xen_hvm_callback+0x22/0x30
>   asm_call_irq_on_stack+0x12/0x20
>   </IRQ>
>   sysvec_xen_hvm_callback+0x72/0x80
>   asm_sysvec_xen_hvm_callback+0x12/0x20
> RIP: 0010:_raw_spin_unlock_irqrestore+0xe/0x20
> RSP: 0000:ffffc900188ffa28 EFLAGS: 00000246
>   pci_conf1_read+0xa4/0x100
>   pci_bus_read_config_byte+0x3f/0x70
>   __pci_find_next_cap_ttl+0x3b/0xd0
>   pci_find_capability+0x71/0xa0
>   pci_pm_init+0x6c/0x1c0
>   ? pci_allocate_vc_save_buffers+0x6d/0x8b
>   pci_device_add+0xb0/0x1b0
>   pci_scan_single_device+0xbd/0xf0
>   pci_scan_slot+0x53/0x120
>   pci_scan_child_bus_extend+0x3a/0x2a0
>   acpi_pci_root_create+0x20c/0x26a
>   pci_acpi_scan_root+0x19a/0x1c0
>   ? negotiate_os_control+0xdd/0x2b9
>   acpi_pci_root_add.cold+0x59/0x1b0
>   acpi_bus_attach+0xee/0x210
>   acpi_bus_attach+0x63/0x210
>   acpi_bus_attach+0x63/0x210
>   acpi_bus_scan+0x43/0x90
>   ? acpi_bus_init+0x15f/0x15f
>   acpi_scan_init+0x10e/0x1b4
>   acpi_init+0xba/0x11d
>   ? intel_idle_init+0x2a2/0x2a2
>   do_one_initcall+0x44/0x1c4
>   do_initcalls+0xc6/0xdf
>   kernel_init_freeable+0x14d/0x198
>   ? rest_init+0xb4/0xb4
>   kernel_init+0xa/0x11c
>   ret_from_fork+0x22/0x30
>
> Seen in kernel-5.4.246:
>
> NMI backtrace for cpu 7
> CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.4.246 #1
> Hardware name: Amazon EC2 m5.2xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:lapic_next_deadline+0x26/0x40
> RSP: 0000:ffffc900001ecf98 EFLAGS: 00000002
> Call Trace:
>  <NMI>
>  ? nmi_cpu_backtrace+0x4c/0x90
>  ? nmi_cpu_backtrace_handler+0xd/0x20
>  ? nmi_handle+0x71/0x160
>  ? default_do_nmi+0x4e/0x100
>  ? do_nmi+0x15b/0x190
>  ? end_repeat_nmi+0x16/0x62
>  ? lapic_next_deadline+0x26/0x40
>  ? lapic_next_deadline+0x26/0x40
>  ? lapic_next_deadline+0x26/0x40
>  </NMI>
>  <IRQ>
>  clockevents_program_event+0xcf/0x100
>  tick_handle_periodic+0x48/0x60
>  smp_apic_timer_interrupt+0x6a/0x130
>  apic_timer_interrupt+0xf/0x20
>  </IRQ>
>  ? vprintk_emit+0x19b/0x280
>  ? printk+0x52/0x6e
>  ? pci_hp_add+0x189/0x300
>  ? acpiphp_register_hotplug_slot+0xe0/0xf0
>  ? acpiphp_add_context+0x3b1/0x460
>  ? acpi_ns_walk_namespace+0x108/0x220
>  ? acpiphp_put_context.part.12+0x30/0x30
>  ? acpi_walk_namespace+0xc8/0xf0
>  ? kmem_cache_alloc_trace+0x1f8/0x210
>  ? acpiphp_enumerate_slots+0x172/0x250
>  ? acpi_pci_add_bus+0x50/0xd0
>  ? pci_register_host_bridge+0x22b/0x460
>  ? pci_create_root_bus+0x87/0xb0
>  ? acpi_pci_root_create+0x128/0x290
>  ? pci_acpi_scan_root+0x14a/0x1b0
>  ? acpi_pci_root_add+0x1ff/0x580
>  ? acpi_evaluate_integer+0x52/0x90
>  ? set_debug_rodata+0x2f/0x2f
>  ? acpi_bus_attach+0x144/0x1d0
>  ? acpi_bus_attach+0x80/0x1d0
>  ? acpi_bus_attach+0x80/0x1d0
>  ? acpi_sleep_proc_init+0x24/0x24
>  ? acpi_sleep_proc_init+0x24/0x24
>  ? acpi_bus_scan+0x43/0x90
>  ? acpi_scan_init+0xf1/0x235
>  ? acpi_sleep_proc_init+0x24/0x24
>  ? acpi_init+0x2e9/0x349
>  ? do_one_initcall+0x46/0x200
>  ? kernel_init_freeable+0x1db/0x286
>  ? rest_init+0xb0/0xb0
>  ? kernel_init+0xa/0x110
>  ? ret_from_fork+0x35/0x40
>
>
> Seen in 5.10.183:
>
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> acpiphp: Slot [19] registered
> rcu:     3-....: (349 ticks this GP) idle=936/1/0x4000000000000000 
> softirq=94/94 fqs=151
>  (detected by 0, t=14752 jiffies, g=-1063, q=591)
> Sending NMI from CPU 0 to CPUs 3:
> NMI backtrace for cpu 3
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.183 #1
> Hardware name: Amazon EC2 m5.2xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:io_serial_in+0x14/0x20
> RSP: 0000:ffffbc9640017720 EFLAGS: 00000006
> PKRU: 55555554
> Call Trace:
>  <NMI>
>  ? nmi_cpu_backtrace+0x57/0xb0
>  ? nmi_cpu_backtrace_handler+0xd/0x20
>  ? nmi_handle+0x62/0x130
>  ? default_do_nmi+0x49/0x100
>  ? exc_nmi+0x152/0x170
>  ? end_repeat_nmi+0x16/0x67
>  ? mem16_serial_in+0x10/0x20
>  ? io_serial_in+0x14/0x20
>  ? io_serial_in+0x14/0x20
>  ? io_serial_in+0x14/0x20
>  </NMI>
>  serial8250_console_write+0x8e/0x310
>  ? record_print_text+0xbf/0x150
>  console_unlock+0x485/0x4c0
>  vprintk_emit+0xe4/0x240
>  printk+0x52/0x72
>  ? pci_hp_add+0x18d/0x300
>  acpiphp_register_hotplug_slot+0xe4/0x100
>  acpiphp_add_context+0x3b5/0x460
>  acpi_ns_walk_namespace+0x10c/0x220
>  ? acpiphp_put_context.part.13+0x30/0x30
>  ? acpiphp_put_context.part.13+0x30/0x30
>  acpi_walk_namespace+0xd4/0xf0
>  ? kmem_cache_alloc_trace+0x3be/0x400
>  acpiphp_enumerate_slots+0x175/0x260
>  acpi_pci_add_bus+0x50/0xe0
>  pci_register_host_bridge+0x233/0x4d0
>  ? complete_all+0x2a/0x40
>  pci_create_root_bus+0x87/0xc0
>  acpi_pci_root_create+0x128/0x2a0
>  pci_acpi_scan_root+0x14e/0x1b0
>  acpi_pci_root_add+0x203/0x580
>  ? acpi_evaluate_integer+0x52/0x90
>  ? rdinit_setup+0x2f/0x2f
>  acpi_bus_attach+0x148/0x1d0
>  acpi_bus_attach+0x84/0x1d0
>  acpi_bus_attach+0x84/0x1d0
>  ? acpi_sleep_proc_init+0x24/0x24
>  ? acpi_sleep_proc_init+0x24/0x24
>  acpi_bus_scan+0x43/0x90
>  acpi_scan_init+0xff/0x247
>  ? acpi_sleep_proc_init+0x24/0x24
>  acpi_init+0x2e9/0x34d
>  do_one_initcall+0x44/0x1e0
>  kernel_init_freeable+0x229/0x286
>  ? rest_init+0xc0/0xc0
>  kernel_init+0xa/0x120
>  ret_from_fork+0x22/0x30
>
> Thanks
> Rishabh
CCing stable kernel list.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-13 18:58 ` Bhatnagar, Rishabh
@ 2023-06-14  9:14   ` gregkh
  2023-06-14 13:08     ` Luiz Capitulino
  2023-06-14  9:14   ` gregkh
  1 sibling, 1 reply; 17+ messages in thread
From: gregkh @ 2023-06-14  9:14 UTC (permalink / raw)
  To: Bhatnagar, Rishabh
  Cc: bigeasy, linux-kernel, tglx, sashal, luizcap, abuehaze, stable

On Tue, Jun 13, 2023 at 11:58:05AM -0700, Bhatnagar, Rishabh wrote:
> 
> On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
> > Hi Sebastian/Greg
> > 
> > We are seeing RCU stall warnings from recent stable tree updates:
> > 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
> > This is seen in the upstream stable trees without any downstream patches.
> > 
> > The issue is seen few minutes after booting without any workload.
> > We launch hundred's of virtual instances and this shows up in 1-2
> > instances, so its hard to reproduce.
> > Attaching a few stack traces below.
> > 
> > The issue can be seen on virtual and baremetal instances.
> > Another interesting point is we only see this on x86 based instances.
> > We also did test this on linux-mainline but were not able to reproduce
> > the issue.
> > So maybe there's a fixup or related commit that has gone in?
> > 
> > We tried bisecting the stable trees and found that after reverting the
> > below commit we couldn't reproduce this in any of the kernels
> > consistently.
> > 
> > tick/common: Align tick period with the HZ tick. [ Upstream commit
> > e9523a0d81899361214d118ad60ef76f0e92f71d ]
> > 
> > 
> > Not exactly sure how this commit is affecting all stable kernels.
> > Can you take a look at this issue and share your insight?

Does this issue also show up in 6.3.y and in 6.4-rc5?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-13 18:58 ` Bhatnagar, Rishabh
  2023-06-14  9:14   ` gregkh
@ 2023-06-14  9:14   ` gregkh
  2023-06-14  9:20     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 17+ messages in thread
From: gregkh @ 2023-06-14  9:14 UTC (permalink / raw)
  To: Bhatnagar, Rishabh
  Cc: bigeasy, linux-kernel, tglx, sashal, luizcap, abuehaze, stable

On Tue, Jun 13, 2023 at 11:58:05AM -0700, Bhatnagar, Rishabh wrote:
> 
> On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
> > Hi Sebastian/Greg
> > 
> > We are seeing RCU stall warnings from recent stable tree updates:
> > 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
> > This is seen in the upstream stable trees without any downstream patches.
> > 
> > The issue is seen few minutes after booting without any workload.
> > We launch hundred's of virtual instances and this shows up in 1-2
> > instances, so its hard to reproduce.
> > Attaching a few stack traces below.
> > 
> > The issue can be seen on virtual and baremetal instances.
> > Another interesting point is we only see this on x86 based instances.
> > We also did test this on linux-mainline but were not able to reproduce
> > the issue.
> > So maybe there's a fixup or related commit that has gone in?

Oops, missed this.

Yes, there might be, can you do 'git bisect' and track down the patch
that fixed this?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14  9:14   ` gregkh
@ 2023-06-14  9:20     ` Sebastian Andrzej Siewior
  2023-06-14 13:10       ` Luiz Capitulino
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-06-14  9:20 UTC (permalink / raw)
  To: gregkh
  Cc: Bhatnagar, Rishabh, linux-kernel, tglx, sashal, luizcap,
	abuehaze, stable

On 2023-06-14 11:14:49 [+0200], gregkh@linuxfoundation.org wrote:
> Oops, missed this.
> 
> Yes, there might be, can you do 'git bisect' and track down the patch
> that fixed this?

There was a report of a lockup during boot in VMs yesterday. If I
remember correctly this still exists and might be related to this
report. I'm going to have a look.

> thanks,
> 
> greg k-h

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14  9:14   ` gregkh
@ 2023-06-14 13:08     ` Luiz Capitulino
  0 siblings, 0 replies; 17+ messages in thread
From: Luiz Capitulino @ 2023-06-14 13:08 UTC (permalink / raw)
  To: gregkh, Bhatnagar, Rishabh
  Cc: bigeasy, linux-kernel, tglx, sashal, abuehaze, stable



On 2023-06-14 05:14, gregkh@linuxfoundation.org wrote:

> 
> 
> 
> On Tue, Jun 13, 2023 at 11:58:05AM -0700, Bhatnagar, Rishabh wrote:
>>
>> On 6/13/23 11:49 AM, Bhatnagar, Rishabh wrote:
>>> Hi Sebastian/Greg
>>>
>>> We are seeing RCU stall warnings from recent stable tree updates:
>>> 5.4.243, 5.10.180, 5.15.113, 6.1.31 onwards.
>>> This is seen in the upstream stable trees without any downstream patches.
>>>
>>> The issue is seen few minutes after booting without any workload.
>>> We launch hundred's of virtual instances and this shows up in 1-2
>>> instances, so its hard to reproduce.
>>> Attaching a few stack traces below.
>>>
>>> The issue can be seen on virtual and baremetal instances.
>>> Another interesting point is we only see this on x86 based instances.
>>> We also did test this on linux-mainline but were not able to reproduce
>>> the issue.
>>> So maybe there's a fixup or related commit that has gone in?
>>>
>>> We tried bisecting the stable trees and found that after reverting the
>>> below commit we couldn't reproduce this in any of the kernels
>>> consistently.
>>>
>>> tick/common: Align tick period with the HZ tick. [ Upstream commit
>>> e9523a0d81899361214d118ad60ef76f0e92f71d ]
>>>
>>>
>>> Not exactly sure how this commit is affecting all stable kernels.
>>> Can you take a look at this issue and share your insight?
> 
> Does this issue also show up in 6.3.y and in 6.4-rc5?

We haven't tried those yet, will try it today.

Just to give you a bit of context: we have a quick and a long duration
reproducer for this (which is our internal testing infrastructure).
In the quick reproducer we can more or less reliably reproduce with
5.4.246 and 5.10.183 but not with 5.15.113, 6.1.33 and latest Linus
tree (64569520920a3ca5d456ddd9f4f95fc6ea9b8b45). However, we did
reproduce something similar in the long reproducer with our downstream
versions of 5.15.113 and 6.1.33 (starting with 6.1.28).

- Luiz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14  9:20     ` Sebastian Andrzej Siewior
@ 2023-06-14 13:10       ` Luiz Capitulino
  2023-06-14 13:45         ` Sven-Haegar Koch
  0 siblings, 1 reply; 17+ messages in thread
From: Luiz Capitulino @ 2023-06-14 13:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, gregkh
  Cc: Bhatnagar, Rishabh, linux-kernel, tglx, sashal, abuehaze, stable



On 2023-06-14 05:20, Sebastian Andrzej Siewior wrote:
  
> 
> 
> On 2023-06-14 11:14:49 [+0200], gregkh@linuxfoundation.org wrote:
>> Oops, missed this.
>>
>> Yes, there might be, can you do 'git bisect' and track down the patch
>> that fixed this?
> 
> There was a report of a lockup during boot in VMs yesterday. If I
> remember correctly this still exists and might be related to this
> report. I'm going to have a look.

Thanks, Sebastian. Do you have a link for the discussion?

- Luiz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14 13:10       ` Luiz Capitulino
@ 2023-06-14 13:45         ` Sven-Haegar Koch
  2023-06-14 13:57           ` Luiz Capitulino
  0 siblings, 1 reply; 17+ messages in thread
From: Sven-Haegar Koch @ 2023-06-14 13:45 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Sebastian Andrzej Siewior, gregkh, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable

On Wed, 14 Jun 2023, Luiz Capitulino wrote:

> On 2023-06-14 05:20, Sebastian Andrzej Siewior wrote:
>  
> > On 2023-06-14 11:14:49 [+0200], gregkh@linuxfoundation.org wrote:
> > > Oops, missed this.
> > > 
> > > Yes, there might be, can you do 'git bisect' and track down the patch
> > > that fixed this?
> > 
> > There was a report of a lockup during boot in VMs yesterday. If I
> > remember correctly this still exists and might be related to this
> > report. I'm going to have a look.
> 
> Thanks, Sebastian. Do you have a link for the discussion?

May be this, talking about the same commit as cause as this thread:

Subject: Re: [PATCH] timekeeping: Align tick_sched_timer() with the HZ 
tick. -- regression report
https://lore.kernel.org/lkml/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net/

May not have been the best idea to respond with such big analysis to a 3 
months old dead thread, gets lost extremely easy.

c'ya
sven-haegar

-- 
Three may keep a secret, if two of them are dead.
- Ben F.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14 13:45         ` Sven-Haegar Koch
@ 2023-06-14 13:57           ` Luiz Capitulino
  2023-06-14 14:30             ` Mathias Krause
  0 siblings, 1 reply; 17+ messages in thread
From: Luiz Capitulino @ 2023-06-14 13:57 UTC (permalink / raw)
  To: Sven-Haegar Koch
  Cc: Sebastian Andrzej Siewior, gregkh, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable, minipli



On 2023-06-14 09:45, Sven-Haegar Koch wrote:

> 
> 
> 
> On Wed, 14 Jun 2023, Luiz Capitulino wrote:
> 
>> On 2023-06-14 05:20, Sebastian Andrzej Siewior wrote:
>>
>>> On 2023-06-14 11:14:49 [+0200], gregkh@linuxfoundation.org wrote:
>>>> Oops, missed this.
>>>>
>>>> Yes, there might be, can you do 'git bisect' and track down the patch
>>>> that fixed this?
>>>
>>> There was a report of a lockup during boot in VMs yesterday. If I
>>> remember correctly this still exists and might be related to this
>>> report. I'm going to have a look.
>>
>> Thanks, Sebastian. Do you have a link for the discussion?
> 
> May be this, talking about the same commit as cause as this thread:
> 
> Subject: Re: [PATCH] timekeeping: Align tick_sched_timer() with the HZ
> tick. -- regression report
> https://lore.kernel.org/lkml/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net/

Thank you, Sven.

Sebastian, except for the detailed analysis which we haven't done yet, the
issue described by Mathias matches 100% what we're observing. Also, we do
observe this on bare-metal instances which could mean that the initial
reports are against VMs because those are rebooted more often (our quick
reproducer boots hundreds of instances in AWS and only 1 or 2 reproduces this).

IMHO, I'd suggest we revert this for now from Linus tree and stable trees.
We can help testing for the fix maybe for the next merge window.

- Luiz

> 
> May not have been the best idea to respond with such big analysis to a 3
> months old dead thread, gets lost extremely easy.
> 
> c'ya
> sven-haegar
> 
> --
> Three may keep a secret, if two of them are dead.
> - Ben F.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees
  2023-06-14 13:57           ` Luiz Capitulino
@ 2023-06-14 14:30             ` Mathias Krause
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Mathias Krause @ 2023-06-14 14:30 UTC (permalink / raw)
  To: Luiz Capitulino, Sven-Haegar Koch
  Cc: Sebastian Andrzej Siewior, gregkh, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable

On 14.06.23 15:57, Luiz Capitulino wrote:
> On 2023-06-14 09:45, Sven-Haegar Koch wrote:
>> May be this, talking about the same commit as cause as this thread:
>>
>> Subject: Re: [PATCH] timekeeping: Align tick_sched_timer() with the HZ
>> tick. -- regression report
>> https://lore.kernel.org/lkml/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net/
> 
> Thank you, Sven.
> 
> Sebastian, except for the detailed analysis which we haven't done yet, the
> issue described by Mathias matches 100% what we're observing. Also, we do
> observe this on bare-metal instances which could mean that the initial
> reports are against VMs because those are rebooted more often (our quick
> reproducer boots hundreds of instances in AWS and only 1 or 2 reproduces
> this).

Yeah, we're doing VM based testing more often than bare metal -- less so
on a AWS scale. That's why we observed it first in VMs. But that wasn't
meant to exclude bare metal, not at all. It's just, that we haven't
tried hard enough yet and testing VMs is so much more pleasant when it
comes to debugging boot issues ;)

Thanks,
Mathias

> IMHO, I'd suggest we revert this for now from Linus tree and stable trees.
> We can help testing for the fix maybe for the next merge window.
> 
> - Luiz
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-14 14:30             ` Mathias Krause
@ 2023-06-15  9:18               ` Sebastian Andrzej Siewior
  2023-06-15 11:13                 ` Mathias Krause
                                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-06-15  9:18 UTC (permalink / raw)
  To: Mathias Krause
  Cc: Luiz Capitulino, Sven-Haegar Koch, gregkh, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar

From: Thomas Gleixner <tglx@linutronix.de>

The tick period is aligned very early while the first clock_event_device
is registered. The system runs in periodic mode and switches later to
one-shot mode if possible.

The next wake-up event is programmed based on aligned value
(tick_next_period) but the delta value, that is used to program the
clock_event_device, is computed based on ktime_get().

With the subtracted offset, the devices fires in less than the exacted
time frame. With a large enough offset the system programs the timer for
the next wake-up and the remaining time left is too little to make any
boot progress. The system hangs.

Move the alignment later to the setup of tick_sched timer. At this point
the system switches to oneshot mode and a highres clocksource is
available. It safe to update tick_next_period ktime_get() will now
return accurate (not jiffies based) time.

[bigeasy: Patch description + testing].

Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
---
 kernel/time/tick-common.c | 11 +----------
 kernel/time/tick-sched.c  | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 65b8658da829e..b85f2f9c32426 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
 		 * this cpu:
 		 */
 		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
-			ktime_t next_p;
-			u32 rem;
 
 			tick_do_timer_cpu = cpu;
 
-			next_p = ktime_get();
-			div_u64_rem(next_p, TICK_NSEC, &rem);
-			if (rem) {
-				next_p -= rem;
-				next_p += TICK_NSEC;
-			}
-
-			tick_next_period = next_p;
+			tick_next_period = ktime_get();
 #ifdef CONFIG_NO_HZ_FULL
 			/*
 			 * The boot CPU may be nohz_full, in which case set
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52254679ec489..42c0be3080bde 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
 	raw_spin_lock(&jiffies_lock);
 	write_seqcount_begin(&jiffies_seq);
 	/* Did we start the jiffies update yet ? */
-	if (last_jiffies_update == 0)
+	if (last_jiffies_update == 0) {
+		u32 rem;
+
+		/*
+		 * Ensure that the tick is aligned to a multiple of
+		 * TICK_NSEC.
+		 */
+		div_u64_rem(tick_next_period, TICK_NSEC, &rem);
+		if (rem)
+			tick_next_period += TICK_NSEC - rem;
+
 		last_jiffies_update = tick_next_period;
+	}
 	period = last_jiffies_update;
 	write_seqcount_end(&jiffies_seq);
 	raw_spin_unlock(&jiffies_lock);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
@ 2023-06-15 11:13                 ` Mathias Krause
  2023-06-15 12:20                 ` Richard W.M. Jones
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Mathias Krause @ 2023-06-15 11:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Luiz Capitulino, Sven-Haegar Koch, gregkh, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar


On 15.06.23 11:18, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tick period is aligned very early while the first clock_event_device
> is registered. The system runs in periodic mode and switches later to
> one-shot mode if possible.
> 
> The next wake-up event is programmed based on aligned value
> (tick_next_period) but the delta value, that is used to program the
> clock_event_device, is computed based on ktime_get().
> 
> With the subtracted offset, the devices fires in less than the exacted
> time frame. With a large enough offset the system programs the timer for
> the next wake-up and the remaining time left is too little to make any
> boot progress. The system hangs.
> 
> Move the alignment later to the setup of tick_sched timer. At this point
> the system switches to oneshot mode and a highres clocksource is
> available. It safe to update tick_next_period ktime_get() will now
> return accurate (not jiffies based) time.
> 
> [bigeasy: Patch description + testing].
> 
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")

Cc: stable, maybe? This commit already ended up in quite a few "stable"
kernels (v6.3.2, v6.2.15, v6.1.28, v5.15.111, v5.10.180 and v5.4.243)
and it might be better to list them explicitly to avoid one of them
getting missed.

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
> ---
>  kernel/time/tick-common.c | 11 +----------
>  kernel/time/tick-sched.c  | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 65b8658da829e..b85f2f9c32426 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
>  		 * this cpu:
>  		 */
>  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -			ktime_t next_p;
> -			u32 rem;
>  
>  			tick_do_timer_cpu = cpu;
>  
> -			next_p = ktime_get();
> -			div_u64_rem(next_p, TICK_NSEC, &rem);
> -			if (rem) {
> -				next_p -= rem;
> -				next_p += TICK_NSEC;
> -			}
> -
> -			tick_next_period = next_p;
> +			tick_next_period = ktime_get();
>  #ifdef CONFIG_NO_HZ_FULL
>  			/*
>  			 * The boot CPU may be nohz_full, in which case set
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec489..42c0be3080bde 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
>  	raw_spin_lock(&jiffies_lock);
>  	write_seqcount_begin(&jiffies_seq);
>  	/* Did we start the jiffies update yet ? */
> -	if (last_jiffies_update == 0)
> +	if (last_jiffies_update == 0) {
> +		u32 rem;
> +
> +		/*
> +		 * Ensure that the tick is aligned to a multiple of
> +		 * TICK_NSEC.
> +		 */
> +		div_u64_rem(tick_next_period, TICK_NSEC, &rem);
> +		if (rem)
> +			tick_next_period += TICK_NSEC - rem;
> +
>  		last_jiffies_update = tick_next_period;
> +	}
>  	period = last_jiffies_update;
>  	write_seqcount_end(&jiffies_seq);
>  	raw_spin_unlock(&jiffies_lock);

Hah, nice spot. So you implemented what I suggested and it, indeed,
works as expected, thereby:

Tested-by: Mathias Krause <minipli@grsecurity.net>

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
  2023-06-15 11:13                 ` Mathias Krause
@ 2023-06-15 12:20                 ` Richard W.M. Jones
  2023-06-15 14:15                 ` Luiz Capitulino
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2023-06-15 12:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mathias Krause, Luiz Capitulino, Sven-Haegar Koch, gregkh,
	Bhatnagar, Rishabh, linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar

On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tick period is aligned very early while the first clock_event_device
> is registered. The system runs in periodic mode and switches later to
> one-shot mode if possible.
> 
> The next wake-up event is programmed based on aligned value
> (tick_next_period) but the delta value, that is used to program the
> clock_event_device, is computed based on ktime_get().
> 
> With the subtracted offset, the devices fires in less than the exacted
> time frame. With a large enough offset the system programs the timer for
> the next wake-up and the remaining time left is too little to make any
> boot progress. The system hangs.
> 
> Move the alignment later to the setup of tick_sched timer. At this point
> the system switches to oneshot mode and a highres clocksource is
> available. It safe to update tick_next_period ktime_get() will now
> return accurate (not jiffies based) time.
> 
> [bigeasy: Patch description + testing].
> 
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com

Tested-by: Richard W.M. Jones <rjones@redhat.com>

... fixing this bug which we thought originally was in qemu, then in
an unrelated kernel commit:

https://gitlab.com/qemu-project/qemu/-/issues/1696
https://lore.kernel.org/all/20230613134105.GA10301@redhat.com/

Rich.

>  kernel/time/tick-common.c | 11 +----------
>  kernel/time/tick-sched.c  | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 65b8658da829e..b85f2f9c32426 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
>  		 * this cpu:
>  		 */
>  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -			ktime_t next_p;
> -			u32 rem;
>  
>  			tick_do_timer_cpu = cpu;
>  
> -			next_p = ktime_get();
> -			div_u64_rem(next_p, TICK_NSEC, &rem);
> -			if (rem) {
> -				next_p -= rem;
> -				next_p += TICK_NSEC;
> -			}
> -
> -			tick_next_period = next_p;
> +			tick_next_period = ktime_get();
>  #ifdef CONFIG_NO_HZ_FULL
>  			/*
>  			 * The boot CPU may be nohz_full, in which case set
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec489..42c0be3080bde 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
>  	raw_spin_lock(&jiffies_lock);
>  	write_seqcount_begin(&jiffies_seq);
>  	/* Did we start the jiffies update yet ? */
> -	if (last_jiffies_update == 0)
> +	if (last_jiffies_update == 0) {
> +		u32 rem;
> +
> +		/*
> +		 * Ensure that the tick is aligned to a multiple of
> +		 * TICK_NSEC.
> +		 */
> +		div_u64_rem(tick_next_period, TICK_NSEC, &rem);
> +		if (rem)
> +			tick_next_period += TICK_NSEC - rem;
> +
>  		last_jiffies_update = tick_next_period;
> +	}
>  	period = last_jiffies_update;
>  	write_seqcount_end(&jiffies_seq);
>  	raw_spin_unlock(&jiffies_lock);
> -- 
> 2.40.1

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
  2023-06-15 11:13                 ` Mathias Krause
  2023-06-15 12:20                 ` Richard W.M. Jones
@ 2023-06-15 14:15                 ` Luiz Capitulino
  2023-06-15 16:00                 ` SeongJae Park
  2023-06-19  6:18                 ` Greg KH
  4 siblings, 0 replies; 17+ messages in thread
From: Luiz Capitulino @ 2023-06-15 14:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Mathias Krause
  Cc: Sven-Haegar Koch, gregkh, Bhatnagar, Rishabh, linux-kernel, tglx,
	sashal, abuehaze, stable, Frederic Weisbecker, Ingo Molnar



On 2023-06-15 05:18, Sebastian Andrzej Siewior wrote:

> 
> 
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tick period is aligned very early while the first clock_event_device
> is registered. The system runs in periodic mode and switches later to
> one-shot mode if possible.
> 
> The next wake-up event is programmed based on aligned value
> (tick_next_period) but the delta value, that is used to program the
> clock_event_device, is computed based on ktime_get().
> 
> With the subtracted offset, the devices fires in less than the exacted
> time frame. With a large enough offset the system programs the timer for
> the next wake-up and the remaining time left is too little to make any
> boot progress. The system hangs.
> 
> Move the alignment later to the setup of tick_sched timer. At this point
> the system switches to oneshot mode and a highres clocksource is
> available. It safe to update tick_next_period ktime_get() will now
> return accurate (not jiffies based) time.
> 
> [bigeasy: Patch description + testing].
> 
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com


I've tested this against 5.10.184 (which is where it reproduces quick
for me):

Tested-by: Luiz Capitulino <luizcap@amazon.com>

> ---
>   kernel/time/tick-common.c | 11 +----------
>   kernel/time/tick-sched.c  | 13 ++++++++++++-
>   2 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 65b8658da829e..b85f2f9c32426 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
>                   * this cpu:
>                   */
>                  if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -                       ktime_t next_p;
> -                       u32 rem;
> 
>                          tick_do_timer_cpu = cpu;
> 
> -                       next_p = ktime_get();
> -                       div_u64_rem(next_p, TICK_NSEC, &rem);
> -                       if (rem) {
> -                               next_p -= rem;
> -                               next_p += TICK_NSEC;
> -                       }
> -
> -                       tick_next_period = next_p;
> +                       tick_next_period = ktime_get();
>   #ifdef CONFIG_NO_HZ_FULL
>                          /*
>                           * The boot CPU may be nohz_full, in which case set
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec489..42c0be3080bde 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
>          raw_spin_lock(&jiffies_lock);
>          write_seqcount_begin(&jiffies_seq);
>          /* Did we start the jiffies update yet ? */
> -       if (last_jiffies_update == 0)
> +       if (last_jiffies_update == 0) {
> +               u32 rem;
> +
> +               /*
> +                * Ensure that the tick is aligned to a multiple of
> +                * TICK_NSEC.
> +                */
> +               div_u64_rem(tick_next_period, TICK_NSEC, &rem);
> +               if (rem)
> +                       tick_next_period += TICK_NSEC - rem;
> +
>                  last_jiffies_update = tick_next_period;
> +       }
>          period = last_jiffies_update;
>          write_seqcount_end(&jiffies_seq);
>          raw_spin_unlock(&jiffies_lock);
> --
> 2.40.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
                                   ` (2 preceding siblings ...)
  2023-06-15 14:15                 ` Luiz Capitulino
@ 2023-06-15 16:00                 ` SeongJae Park
  2023-06-19  6:18                 ` Greg KH
  4 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2023-06-15 16:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mathias Krause, Luiz Capitulino, Sven-Haegar Koch, gregkh,
	Bhatnagar, Rishabh, linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar, SeongJae Park

On Thu, 15 Jun 2023 11:18:30 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tick period is aligned very early while the first clock_event_device
> is registered. The system runs in periodic mode and switches later to
> one-shot mode if possible.
> 
> The next wake-up event is programmed based on aligned value
> (tick_next_period) but the delta value, that is used to program the
> clock_event_device, is computed based on ktime_get().
> 
> With the subtracted offset, the devices fires in less than the exacted
> time frame. With a large enough offset the system programs the timer for
> the next wake-up and the remaining time left is too little to make any
> boot progress. The system hangs.
> 
> Move the alignment later to the setup of tick_sched timer. At this point
> the system switches to oneshot mode and a highres clocksource is
> available. It safe to update tick_next_period ktime_get() will now
> return accurate (not jiffies based) time.
> 
> [bigeasy: Patch description + testing].
> 
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com

I guess adding 'Cc: stable@' might further help stable maintainers?

I also left one very tirival cosmetic comment below, but I dont think those
could be blockers.

Acked-by: SeongJae Park <sj@kernel.org>


Thanks,
SJ

> ---
>  kernel/time/tick-common.c | 11 +----------
>  kernel/time/tick-sched.c  | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 65b8658da829e..b85f2f9c32426 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -218,19 +218,10 @@ static void tick_setup_device(struct tick_device *td,
>  		 * this cpu:
>  		 */
>  		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
> -			ktime_t next_p;
> -			u32 rem;
>  

Nit: I guess we'd like to remove above one blank line together?

>  			tick_do_timer_cpu = cpu;
>  
> -			next_p = ktime_get();
> -			div_u64_rem(next_p, TICK_NSEC, &rem);
> -			if (rem) {
> -				next_p -= rem;
> -				next_p += TICK_NSEC;
> -			}
> -
> -			tick_next_period = next_p;
> +			tick_next_period = ktime_get();
>  #ifdef CONFIG_NO_HZ_FULL
>  			/*
>  			 * The boot CPU may be nohz_full, in which case set
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 52254679ec489..42c0be3080bde 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -161,8 +161,19 @@ static ktime_t tick_init_jiffy_update(void)
>  	raw_spin_lock(&jiffies_lock);
>  	write_seqcount_begin(&jiffies_seq);
>  	/* Did we start the jiffies update yet ? */
> -	if (last_jiffies_update == 0)
> +	if (last_jiffies_update == 0) {
> +		u32 rem;
> +
> +		/*
> +		 * Ensure that the tick is aligned to a multiple of
> +		 * TICK_NSEC.
> +		 */
> +		div_u64_rem(tick_next_period, TICK_NSEC, &rem);
> +		if (rem)
> +			tick_next_period += TICK_NSEC - rem;
> +
>  		last_jiffies_update = tick_next_period;
> +	}
>  	period = last_jiffies_update;
>  	write_seqcount_end(&jiffies_seq);
>  	raw_spin_unlock(&jiffies_lock);
> -- 
> 2.40.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
                                   ` (3 preceding siblings ...)
  2023-06-15 16:00                 ` SeongJae Park
@ 2023-06-19  6:18                 ` Greg KH
  2023-06-19  7:03                   ` Mathias Krause
  4 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2023-06-19  6:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mathias Krause, Luiz Capitulino, Sven-Haegar Koch, Bhatnagar,
	Rishabh, linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar

On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The tick period is aligned very early while the first clock_event_device
> is registered. The system runs in periodic mode and switches later to
> one-shot mode if possible.
> 
> The next wake-up event is programmed based on aligned value
> (tick_next_period) but the delta value, that is used to program the
> clock_event_device, is computed based on ktime_get().
> 
> With the subtracted offset, the devices fires in less than the exacted
> time frame. With a large enough offset the system programs the timer for
> the next wake-up and the remaining time left is too little to make any
> boot progress. The system hangs.
> 
> Move the alignment later to the setup of tick_sched timer. At this point
> the system switches to oneshot mode and a highres clocksource is
> available. It safe to update tick_next_period ktime_get() will now
> return accurate (not jiffies based) time.
> 
> [bigeasy: Patch description + testing].
> 
> Reported-by: Mathias Krause <minipli@grsecurity.net>
> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
> ---
>  kernel/time/tick-common.c | 11 +----------
>  kernel/time/tick-sched.c  | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 11 deletions(-)

What's the status of this fix, I didn't see it in -rc7, am I looking in
the wrong place?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] tick/common: Align tick period during sched_timer setup.
  2023-06-19  6:18                 ` Greg KH
@ 2023-06-19  7:03                   ` Mathias Krause
  0 siblings, 0 replies; 17+ messages in thread
From: Mathias Krause @ 2023-06-19  7:03 UTC (permalink / raw)
  To: Greg KH, Sebastian Andrzej Siewior
  Cc: Luiz Capitulino, Sven-Haegar Koch, Bhatnagar, Rishabh,
	linux-kernel, tglx, sashal, abuehaze, stable,
	Frederic Weisbecker, Ingo Molnar

On 19.06.23 08:18, Greg KH wrote:
> On Thu, Jun 15, 2023 at 11:18:30AM +0200, Sebastian Andrzej Siewior wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>>
>> The tick period is aligned very early while the first clock_event_device
>> is registered. The system runs in periodic mode and switches later to
>> one-shot mode if possible.
>>
>> The next wake-up event is programmed based on aligned value
>> (tick_next_period) but the delta value, that is used to program the
>> clock_event_device, is computed based on ktime_get().
>>
>> With the subtracted offset, the devices fires in less than the exacted
>> time frame. With a large enough offset the system programs the timer for
>> the next wake-up and the remaining time left is too little to make any
>> boot progress. The system hangs.
>>
>> Move the alignment later to the setup of tick_sched timer. At this point
>> the system switches to oneshot mode and a highres clocksource is
>> available. It safe to update tick_next_period ktime_get() will now
>> return accurate (not jiffies based) time.
>>
>> [bigeasy: Patch description + testing].
>>
>> Reported-by: Mathias Krause <minipli@grsecurity.net>
>> Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
>> Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
>> Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
>> ---
>>  kernel/time/tick-common.c | 11 +----------
>>  kernel/time/tick-sched.c  | 13 ++++++++++++-
>>  2 files changed, 13 insertions(+), 11 deletions(-)
> 
> What's the status of this fix, I didn't see it in -rc7, am I looking in
> the wrong place?

It's in the tip tree since Friday, but yeah, no pull request yet:

https://git.kernel.org/tip/13bb06f8dd42

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-19  7:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-13 18:49 Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees Bhatnagar, Rishabh
2023-06-13 18:58 ` Bhatnagar, Rishabh
2023-06-14  9:14   ` gregkh
2023-06-14 13:08     ` Luiz Capitulino
2023-06-14  9:14   ` gregkh
2023-06-14  9:20     ` Sebastian Andrzej Siewior
2023-06-14 13:10       ` Luiz Capitulino
2023-06-14 13:45         ` Sven-Haegar Koch
2023-06-14 13:57           ` Luiz Capitulino
2023-06-14 14:30             ` Mathias Krause
2023-06-15  9:18               ` [PATCH] tick/common: Align tick period during sched_timer setup Sebastian Andrzej Siewior
2023-06-15 11:13                 ` Mathias Krause
2023-06-15 12:20                 ` Richard W.M. Jones
2023-06-15 14:15                 ` Luiz Capitulino
2023-06-15 16:00                 ` SeongJae Park
2023-06-19  6:18                 ` Greg KH
2023-06-19  7:03                   ` Mathias Krause

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).