* x86 on dovetail: Stress-ng gets my system down
@ 2021-05-03  9:45 Bezdeka, Florian
  2021-05-03 10:01 ` Henning Schild
  2021-05-03 13:09 ` Philippe Gerum
  0 siblings, 2 replies; 6+ messages in thread
From: Bezdeka, Florian @ 2021-05-03  9:45 UTC (permalink / raw)
  To: xenomai

Hi,

while trying to debug one of the Xenomai 3.2 issues listed at [1] I ran
into the situation described below on my x86 system. The problem (or at
least the "system hang") is reproducible on real hardware and on
qemu/kvm.

Once the system is frozen, attaching GDB to the qemu process shows me:

(gdb) info threads
  Id   Target Id                  Frame 
* 1    Thread 1 (CPU#0 [running]) csd_lock_wait (csd=0xffff88803e92ea00) at kernel/smp.c:228
  2    Thread 2 (CPU#1 [running]) 0x0000564f0b05d36d in ?? ()
  3    Thread 3 (CPU#2 [running]) csd_lock_wait (csd=0xffff88803e82f1e0) at kernel/smp.c:228
  4    Thread 4 (CPU#3 [running]) csd_lock_wait (csd=0xffff88803e82f200) at kernel/smp.c:228

So three of my CPUs are waiting for other CPUs to complete a function
call IPI. It looks like CPU1 is not responding anymore. The system is
completely unusable at this point.

(gdb) bt
#0  csd_lock_wait (csd=0xffff88803e92ea00) at kernel/smp.c:228
#1  smp_call_function_many_cond (mask=mask@entry=0xffff88800448c340, func=func@entry=0xffffffff81055bb0 <flush_tlb_func_remote>, info=info@entry=0xffffffff8200acc0 <full_flush_tlb_info>, wait=wait@entry=true, 
    cond_func=cond_func@entry=0xffffffff810550b0 <tlb_is_not_lazy>) at kernel/smp.c:693
#2  0xffffffff810f56f5 in on_each_cpu_cond_mask (cond_func=cond_func@entry=0xffffffff810550b0 <tlb_is_not_lazy>, func=func@entry=0xffffffff81055bb0 <flush_tlb_func_remote>, info=info@entry=0xffffffff8200acc0 <full_flush_tlb_info>, wait=wait@entry=true, 
    mask=mask@entry=0xffff88800448c340) at kernel/smp.c:904
#3  0xffffffff81055538 in native_flush_tlb_others (cpumask=cpumask@entry=0xffff88800448c340, info=info@entry=0xffffffff8200acc0 <full_flush_tlb_info>) at arch/x86/mm/tlb.c:840
#4  0xffffffff81055fac in flush_tlb_others (info=0xffffffff8200acc0 <full_flush_tlb_info>, cpumask=0xffff88800448c340) at arch/x86/mm/tlb.c:1170
#5  arch_tlbbatch_flush (batch=batch@entry=0xffff88800448c340) at arch/x86/mm/tlb.c:1170
#6  0xffffffff811ae3e1 in try_to_unmap_flush () at mm/rmap.c:602
#7  0xffffffff8117d9d3 in shrink_page_list (page_list=page_list@entry=0xffffc9000306f910, pgdat=pgdat@entry=0xffff88803ffdb000, sc=sc@entry=0xffffc9000306fb18, stat=stat@entry=0xffffc9000306f924, ignore_references=ignore_references@entry=false) at mm/vmscan.c:1487
#8  0xffffffff8117f79c in shrink_inactive_list (nr_to_scan=<optimized out>, lruvec=lruvec@entry=0xffff88803ffde508, sc=sc@entry=0xffffc9000306fb18, lru=lru@entry=LRU_INACTIVE_FILE) at mm/vmscan.c:1962
#9  0xffffffff811800dc in shrink_list (sc=0xffffc9000306fb18, lruvec=0xffff88803ffde508, nr_to_scan=<optimized out>, lru=<optimized out>) at mm/vmscan.c:2169
#10 shrink_lruvec (lruvec=lruvec@entry=0xffff88803ffde508, sc=sc@entry=0xffffc9000306fb18) at mm/vmscan.c:2464
#11 0xffffffff81180374 in shrink_node_memcgs (sc=0xffffc9000306fb18, pgdat=0xffff88803ffdb000) at mm/vmscan.c:2652
#12 shrink_node (pgdat=pgdat@entry=0xffff88803ffdb000, sc=sc@entry=0xffffc9000306fb18) at mm/vmscan.c:2769
#13 0xffffffff811806c8 in shrink_zones (sc=0xffffc9000306fb18, zonelist=0xffff88803ffdc400) at mm/vmscan.c:2972
#14 do_try_to_free_pages (zonelist=zonelist@entry=0xffff88803ffdc400, sc=sc@entry=0xffffc9000306fb18) at mm/vmscan.c:3027
#15 0xffffffff811817f6 in try_to_free_pages (zonelist=0xffff88803ffdc400, order=order@entry=1, gfp_mask=gfp_mask@entry=4197824, nodemask=<optimized out>) at mm/vmscan.c:3266
#16 0xffffffff811ba411 in __perform_reclaim (ac=0xffffc9000306fc90, ac=0xffffc9000306fc90, order=1, gfp_mask=4197824) at mm/page_alloc.c:4335
#17 __alloc_pages_direct_reclaim (did_some_progress=<synthetic pointer>, ac=0xffffc9000306fc90, alloc_flags=2112, order=1, gfp_mask=4197824) at mm/page_alloc.c:4356
#18 __alloc_pages_slowpath (gfp_mask=<optimized out>, gfp_mask@entry=4197824, order=order@entry=1, ac=ac@entry=0xffffc9000306fc90) at mm/page_alloc.c:4760
#19 0xffffffff811baf44 in __alloc_pages_nodemask (gfp_mask=<optimized out>, gfp_mask@entry=4197824, order=order@entry=1, preferred_nid=<optimized out>, nodemask=0x0 <fixed_percpu_data>) at mm/page_alloc.c:4970
#20 0xffffffff811ce039 in alloc_pages_current (gfp=gfp@entry=4197824, order=order@entry=1) at ./include/linux/topology.h:88
#21 0xffffffff811b6248 in alloc_pages (order=order@entry=1, gfp_mask=4197824) at ./include/linux/gfp.h:547
#22 __get_free_pages (gfp_mask=gfp_mask@entry=4197824, order=order@entry=1) at mm/page_alloc.c:4994
#23 0xffffffff8105482c in _pgd_alloc () at arch/x86/mm/pgtable.c:430
#24 pgd_alloc (mm=mm@entry=0xffff88800315e400) at arch/x86/mm/pgtable.c:430
#25 0xffffffff8105efae in mm_alloc_pgd (mm=0xffff88800315e400) at kernel/fork.c:1054
#26 mm_init (mm=mm@entry=0xffff88800315e400, user_ns=<optimized out>, p=0xffff888002bbc880) at kernel/fork.c:1054
#27 0xffffffff8105f624 in dup_mm (oldmm=0xffff888004efa800, tsk=0xffff888002bbc880) at kernel/fork.c:1369
#28 0xffffffff810616a5 in copy_mm (tsk=0xffff888002bbc880, clone_flags=0) at ./arch/x86/include/asm/current.h:15
#29 copy_process (pid=pid@entry=0x0 <fixed_percpu_data>, trace=trace@entry=0, node=node@entry=-1, args=args@entry=0xffffc9000306fed0) at kernel/fork.c:2110
#30 0xffffffff81061934 in kernel_clone (args=args@entry=0xffffc9000306fed0) at kernel/fork.c:2471
#31 0xffffffff81061c8f in __do_sys_fork (__unused=<optimized out>) at kernel/fork.c:2534
#32 0xffffffff81b41693 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000306ff58) at arch/x86/entry/common.c:55
#33 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
#34 0x00000000000526aa in ?? ()
#35 0x0000564f0b0bc2b0 in ?? ()
#36 0x0000000000000001 in fixed_percpu_data ()
#37 0x00007ffe0ee549f0 in ?? ()
#38 0x0000564f0b657260 in ?? ()
#39 0x0000000000000000 in ?? ()


Kernel config: attached. It's an x86_64 defconfig with the following
modifications:
 - CONFIG_XENOMAI disabled
 - CONFIG_DOVETAIL disabled
 - CONFIG_MIGRATION disabled
 - CONFIG_DEBUG_INFO enabled (to be able to debug)
 - CONFIG_GDB_SCRIPTS enabled (debugging...)

I disabled Xenomai and Dovetail to limit the search scope. The problem
remains reproducible without them.
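
For reproduction, the same .config can be rebuilt roughly like this
(untested sketch, relying on scripts/config from the kernel tree; the
Xenomai/Dovetail symbols of course only exist on the dovetail tree):

make x86_64_defconfig
scripts/config -d XENOMAI -d DOVETAIL -d MIGRATION \
	-e DEBUG_INFO -e GDB_SCRIPTS
make olddefconfig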


Workload:
Stressing the system with stress-ng. After 45 to 60 minutes the system
freezes.
cmdline: stress-ng --cpu 4 --io 2 --vm 2 --vm-bytes 128M --fork 4 --timeout 0


IRQ flag:
All CPUs (or gdb threads) waiting at kernel/smp.c:228 have the IF flag
(part of the eflags register) unset, while the other CPUs have it set:

(gdb) info register
eflags         0x2                 [ ]

vs

eflags         0x202               [ IF ]
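
(Assuming gdb is attached to qemu's gdbstub as above, the per-CPU state
can be dumped in one go with

(gdb) thread apply all info registers eflags

instead of switching threads manually.)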


smp_call_function_many_cond() has some notes about deadlocks that can
occur when it is called with IRQs disabled, but I never saw any of the
warnings that should come up. As the IF flag is unset, interrupts must
have been disabled later (while waiting), and that might be the reason
for the deadlock.
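
For reference, the warning I would have expected to fire is the check at
the top of smp_call_function_many_cond(); from memory it looks roughly
like this in mainline kernel/smp.c (so treat the exact condition as an
approximation):

	/*
	 * Can deadlock when called with interrupts disabled.
	 * (condition roughly as in mainline, quoted from memory)
	 */
	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
		     && !oops_in_progress && !early_boot_irqs_disabled);

That check only happens on entry, though, so it cannot catch interrupts
getting disabled later, while we are already spinning in csd_lock_wait().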

Ideas / feedback / advice welcome. Thanks!

Best regards,
Florian

[1] https://gitlab.com/Xenomai/xenomai-hacker-space/-/issues/16
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ATT23394.config_pipe_nodovetail_noxeno
URL: <http://xenomai.org/pipermail/xenomai/attachments/20210503/73c26aac/attachment.ksh>


* Re: x86 on dovetail: Stress-ng gets my system down
  2021-05-03  9:45 x86 on dovetail: Stress-ng gets my system down Bezdeka, Florian
@ 2021-05-03 10:01 ` Henning Schild
  2021-05-03 10:06   ` Bezdeka, Florian
  2021-05-03 13:09 ` Philippe Gerum
  1 sibling, 1 reply; 6+ messages in thread
From: Henning Schild @ 2021-05-03 10:01 UTC (permalink / raw)
  To: Bezdeka, Florian (T RDA IOT SES-DE); +Cc: xenomai, rpm, Kiszka, Jan (T RDA IOT)

On Mon, 3 May 2021 11:45:52 +0200,
"Bezdeka, Florian (T RDA IOT SES-DE)"
<florian.bezdeka@siemens.com> wrote:

> [...]
>
> Ideas / feedback / advice welcome. Thanks!

Anything in the kernel log? We have this tsc=reliable thing that causes
all sorts of funny issues it seems.
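
A quick

  dmesg | grep -i -e tsc -e clocksource

(or journalctl -k in case dmesg already wrapped around) should be enough
to spot the usual suspects.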

Henning





* Re: x86 on dovetail: Stress-ng gets my system down
  2021-05-03 10:01 ` Henning Schild
@ 2021-05-03 10:06   ` Bezdeka, Florian
  0 siblings, 0 replies; 6+ messages in thread
From: Bezdeka, Florian @ 2021-05-03 10:06 UTC (permalink / raw)
  To: henning.schild; +Cc: rpm, xenomai, jan.kiszka

On Mon, 2021-05-03 at 12:01 +0200, Henning Schild wrote:
> > Ideas / feedback / advice welcome. Thanks!
> 
> Anything in the kernel log? We have this tsc=reliable thing that causes
> all sorts of funny issues it seems.

No. At least nothing that seems directly related.

I'm actually searching for the root cause of why tsc=reliable is
necessary at all. As documented in [2] I'm seeing some RCU stalls which
might or might not have the same root cause.

[2] https://gitlab.com/Xenomai/xenomai-hacker-space/-/issues/16
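
(For the record, checking which clocksource the kernel actually selects
without that parameter boils down to:

  cat /sys/devices/system/clocksource/clocksource0/current_clocksource
  cat /sys/devices/system/clocksource/clocksource0/available_clocksource

nothing Dovetail-specific involved.)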


> 
> Henning



* Re: x86 on dovetail: Stress-ng gets my system down
  2021-05-03  9:45 x86 on dovetail: Stress-ng gets my system down Bezdeka, Florian
  2021-05-03 10:01 ` Henning Schild
@ 2021-05-03 13:09 ` Philippe Gerum
  2021-05-03 13:28   ` Bezdeka, Florian
  1 sibling, 1 reply; 6+ messages in thread
From: Philippe Gerum @ 2021-05-03 13:09 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: xenomai, jan.kiszka, henning.schild


Bezdeka, Florian <florian.bezdeka@siemens.com> writes:

> [...]
>
> smp_call_function_many_cond() has some notes about deadlocks that can
> occur when it is called with IRQs disabled, but I never saw any of the
> warnings that should come up. As the IF flag is unset, interrupts must
> have been disabled later (while waiting), and that might be the reason
> for the deadlock.

You should be able to prove it that way:

diff --git a/kernel/smp.c b/kernel/smp.c
index 25240fb2df94922..35c439d1011b8fc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -211,6 +211,8 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
 
 	ts1 = ts0 = sched_clock();
 	for (;;) {
+		WARN_ON_ONCE(hard_irqs_disabled());
+		WARN_ON_ONCE(irqs_disabled());
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
 			break;
 		cpu_relax();

It did happen that an issue in trap/irq mgmt caused the (virtual)
interrupt state to become suddenly broken inside a loop. That snippet
would detect it, without pinpointing the offender precisely though.

A couple of questions:

- how long does it take to lock up a KVM instance with stress-ng that
  way?

- are you running this kernel?
https://git.evlproject.org/linux-evl.git/log/?h=dovetail/v5.10

Thanks,

-- 
Philippe.



* Re: x86 on dovetail: Stress-ng gets my system down
  2021-05-03 13:09 ` Philippe Gerum
@ 2021-05-03 13:28   ` Bezdeka, Florian
  2021-05-03 14:17     ` Bezdeka, Florian
  0 siblings, 1 reply; 6+ messages in thread
From: Bezdeka, Florian @ 2021-05-03 13:28 UTC (permalink / raw)
  To: rpm; +Cc: xenomai, jan.kiszka, henning.schild

On Mon, 2021-05-03 at 15:09 +0200, Philippe Gerum wrote:
> Bezdeka, Florian <florian.bezdeka@siemens.com> writes:
> 
> > [...]
> >
> 
> You should be able to prove it that way:
> 
> [...]
> 
> It did happen that an issue in trap/irq mgmt caused the (virtual)
> interrupt state to become suddenly broken inside a loop. That snippet
> would detect it, without pinpointing the offender precisely though.

I will give it a try and report back.

> 
> A couple of questions:
> 
> > - how long does it take to lock up a KVM instance with stress-ng that
>   way?

In my case about 45 minutes.

> 
> - are you running this kernel?
> https://git.evlproject.org/linux-evl.git/log/?h=dovetail/v5.10

Yes. Sorry for not mentioning that.

> 
> Thanks,
> 



* Re: x86 on dovetail: Stress-ng gets my system down
  2021-05-03 13:28   ` Bezdeka, Florian
@ 2021-05-03 14:17     ` Bezdeka, Florian
  0 siblings, 0 replies; 6+ messages in thread
From: Bezdeka, Florian @ 2021-05-03 14:17 UTC (permalink / raw)
  To: rpm; +Cc: xenomai

On Mon, 2021-05-03 at 13:28 +0000, Bezdeka, Florian via Xenomai wrote:
> On Mon, 2021-05-03 at 15:09 +0200, Philippe Gerum wrote:
> > Bezdeka, Florian <florian.bezdeka@siemens.com> writes:
> > 
> > > [...]
> > >
> > 
> > You should be able to prove it that way:
> > 
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 25240fb2df94922..35c439d1011b8fc 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -211,6 +211,8 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
> >   	ts1 = ts0 = sched_clock();
> > 
> >  	for (;;) {
> > +		WARN_ON_ONCE(hard_irqs_disabled());
> > +		WARN_ON_ONCE(irqs_disabled());
> >  		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
> >  			break;
> >  		cpu_relax();
> > 
> > It did happen that an issue in trap/irq mgmt caused the (virtual)
> > interrupt state to become suddenly broken inside a loop. That snippet
> > would detect it, without pinpointing the offender precisely though.
> 
> I will give it a try and report back.

I enabled CONFIG_CSD_LOCK_WAIT_DEBUG (so that the snippet above could
work) and got the following:

[  826.155857] smp: csd: Detected non-responsive CSD lock (#1) on CPU#2, waiting 5000000002 ns for CPU#01 do_flush_tlb_all+0x0/0x50(0x0).
[  826.159581] smp: 	csd: CSD lock (#1) unresponsive.
[  826.161228] Sending NMI from CPU 2 to CPUs 1:
[  826.163775] NMI backtrace for cpu 1
[  826.163776] CPU: 1 PID: 257 Comm: stress-ng Not tainted 5.10.32+ #5
[  826.163778] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[  826.163778] IRQ stage: Linux
[  826.163779] RIP: 0033:0x564a6ca0b8a2
[  826.163781] Code: 00 f2 0f 10 25 7f 94 02 00 31 db 66 0f 2f c4 0f 83 75 02 00 00 f2 48 0f 2c c8 49 89 cc 89 c8 49 c1 ec 20 4c 89 e5 4d 0f af e6 <48> 0f af ee 48 0f af f0 49 0f af c6 41 89 ed 48 c1 ed 20 49 89 f7
[  826.163782] RSP: 002b:00007ffc28f70be8 EFLAGS: 00000206
[  826.163784] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[  826.163785] RDX: 0000a8b638ad2d09 RSI: 00000000681be04e RDI: 00007ffc28f70ca0
[  826.163786] RBP: 0000000000000000 R08: ddae387c681be04e R09: 00007fc3286f426a
[  826.163787] R10: 00007ffc28f70cb0 R11: d5f691bd23e1e33f R12: 0000000000000000
[  826.163788] R13: 0000000000000000 R14: 00000000ddae387c R15: 00007ffc28f70ca0
[  826.163788] FS:  00007f2dfcc68c00 GS:  0000000000000000
[  826.163792] smp: csd: Re-sending CSD lock (#1) IPI from CPU#02 to CPU#01
[  826.171423] systemd-journald[173]: Sent WATCHDOG=1 notification.
[  826.172400] CPU: 2 PID: 43 Comm: kworker/2:1 Not tainted 5.10.32+ #5
[  826.202894] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[  826.205640] IRQ stage: Linux
[  826.206758] Workqueue: events free_work
[  826.208155] Call Trace:
[  826.209287]  dump_stack+0x7b/0x93
[  826.210940]  smp_call_function_many_cond.cold+0x204/0x255
[  826.212982]  ? generic_pipeline_irq+0xfd/0x1c0
[  826.214793]  ? __flush_tlb_all+0x50/0x50
[  826.216592]  on_each_cpu+0x26/0x50
[  826.218534]  __purge_vmap_area_lazy+0x58/0x650
[  826.220411]  free_vmap_area_noflush+0xce/0xe0
[  826.222339]  remove_vm_area+0x8a/0xa0
[  826.224076]  __vunmap+0x95/0x290
[  826.225475]  free_work+0x1c/0x30
[  826.226584]  process_one_work+0x1d3/0x370
[  826.227906]  worker_thread+0x48/0x3d0
[  826.229092]  ? rescuer_thread+0x3b0/0x3b0
[  826.230605]  kthread+0x116/0x130
[  826.231867]  ? kthread_create_worker_on_cpu+0x60/0x60
[  826.233723]  ret_from_fork+0x1f/0x30
[  826.235264] smp: csd: CSD lock (#1) got unstuck on CPU#02, CPU#01 released the lock.




