* [LTP] LTP: futex_wake04 hangs forever on i386 @ 2018-10-08 13:21 Naresh Kamboju 2018-10-08 13:30 ` Cyril Hrubis 0 siblings, 1 reply; 20+ messages in thread From: Naresh Kamboju @ 2018-10-08 13:21 UTC (permalink / raw) To: ltp LTP syscall test case futex_wake04 hangs forever on i386 kernel running on x86_64 server machine. Test PASS on qemu_i386 and other devices. Test HANGs on i386 (i386 kernel booting on x86_64 machine). output, incrementing stop futex_wake04 0 TINFO : Hugepagesize 4194304 LAVA job id, https://lkft.validation.linaro.org/scheduler/job/399913 strace output of this test case, https://lkft.validation.linaro.org/scheduler/job/405706 Best regards Naresh Kamboju ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-08 13:21 [LTP] LTP: futex_wake04 hangs forever on i386 Naresh Kamboju @ 2018-10-08 13:30 ` Cyril Hrubis 2018-10-09 2:05 ` Rafael David Tinoco 0 siblings, 1 reply; 20+ messages in thread From: Cyril Hrubis @ 2018-10-08 13:30 UTC (permalink / raw) To: ltp

Hi!
> LTP syscall test case futex_wake04 hangs forever on i386 kernel
> running on x86_64 server machine.
>
> Test PASS on qemu_i386 and other devices.
> Test HANGs on i386 (i386 kernel booting on x86_64 machine).
>
> output,
> incrementing stop
> futex_wake04 0 TINFO : Hugepagesize 4194304

Looks like we are trying to fault a 4GB hugepage, which may be quite
slow, but then, the test is a kernel reproducer as well.

Can you connect to the process with gdb and check whether we happened to
time out before or after the two threads that are supposed to trigger
the kernel bug were started?

--
Cyril Hrubis
chrubis@suse.cz
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-08 13:30 ` Cyril Hrubis @ 2018-10-09 2:05 ` Rafael David Tinoco 2018-10-09 9:28 ` Cyril Hrubis 0 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-09 2:05 UTC (permalink / raw) To: ltp

On 10/8/18 10:30 AM, Cyril Hrubis wrote:
> Hi!
>> LTP syscall test case futex_wake04 hangs forever on i386 kernel
>> running on x86_64 server machine.
>>
>> Test PASS on qemu_i386 and other devices. Test HANGs on i386 (i386
>> kernel booting on x86_64 machine).
>>
>> output, incrementing stop futex_wake04 0 TINFO : Hugepagesize
>> 4194304
>
> Looks like we are trying to fault 4GB hugepage, that may be quite
> slow, but then, the test is a kernel reproducer as well.

This is i386 without PAE, so the hugepage here is 4 MB (with PAE it
would be 2 MB). I have configured the same environment on my side:

(k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128
(k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
(k)inaddy@bug3984:~$ sudo hugeadm --pool-list
Size Minimum Current Maximum Default
4194304 128 128 128 *

And ran the test with 500 iterations, multiple times, without a single
issue.

(k)root@bug3984:~$ while true; do cat /proc/meminfo | grep -i hugepages_free; done

Shows me the hugepages are being used:

HugePages_Free: 93
HugePages_Free: 4
HugePages_Free: 128
HugePages_Free: 0
HugePages_Free: 128
HugePages_Free: 1
HugePages_Free: 8
HugePages_Free: 128
HugePages_Free: 27
HugePages_Free: 0
HugePages_Free: 102

> Can you connect to the process with gdb and check if we happened to
> timeout before or after the two threads that are supposed to
> trigger kernel bug are started?

Reading the test and commit 13d60f4b6a, it looks like this test is
testing futex indexes stored in hugepages (one in the head and the other
outside the hugepage head) and an indexing problem in which futex() was
signaled as non-contended.
Kernel code was using page->index (of the page hosting the futex) for
the futex offset; after this commit, it started taking into
consideration whether the page was a hugepage (head or tail) or a
compound page, in which case it would use compound_idx for the futex
offset, fixing the issue.

In order for the issue to happen we would have to face:

test thread #1                      test thread #2
--------------                      --------------
[hugepage #1 head] *futex1 offset   [hugepage #1 head]
[page]                              [page] *futex2 offset
[page]                              [page]
[page]                              [page]
...                                 ...

where the futex2 key points to futex1.

And the test case does NOT look like something intermittent (since it is
a lock ordering issue - depending on pthreads sharing futexes - with
wrong futex indexes, as described in the kernel commit).

* Test case problem would either happen or not *

I tried to extend the number of iterations a bit and something caught my
attention... when running the same test, multiple times, after some time
I would face something like:

futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet
futex_wake04 0 TINFO : Thread 7495 not sleeping yet

And there wasn't a second thread task id inside:

/proc/<pid>/task/{<pid>, <tid1>, <tid2>}

like there should be.

Likely the task creation logic (because of an error? check the last
part) was faster than the wait_for_threads() logic could expect...
causing a race in the test's logic.

* This would explain the timeout error we had, right? *

Since stdout/stderr are being redirected to a file: @Naresh, is it
possible to get the stdout/err file contents from the same LKFT URL you
showed?

* Wait, there is more ... *
While running the test, and monitoring meminfo, I could see
HugePages_Total oscillating:

HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 54
HugePages_Free: 53
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 101
HugePages_Free: 101
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 128
HugePages_Free: 128
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 1
HugePages_Free: 1
HugePages_Rsvd: 0
HugePages_Surp: 0

HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0

despite the fact that I had configured:

(k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
(k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128

and had "128" for HugePages_Total at the beginning of the test. The
number of total hugepages oscillates during a successful test and, at
the very end, HugePages_Total goes back to 128. I disabled THP
completely after this test and got the same result.

Reading hugetlb_report_meminfo() in hugetlb.c, I am not sure (struct
hstate *h)->nr_huge_pages should change that much (through new hugepages
being asked for, sysctl, or NUMA balancing, among other options). I'll
try to trace this tomorrow to see who is touching HugePages_Total during
the test.

* Theory *

HugePages_Total fluctuates and causes 1 pthread to have errors before
the test's readdir() logic on /proc/<pid>/task/X catches whether the
thread went to sleep, causing a race in the readdir() logic.

* Obs *

I still need to finish this last part of the investigation. I haven't
tried other architectures yet, only x86 with a non-PAE kernel. I have
tested a 4.18 kernel (Debian) and latest mainline and both can reproduce
this. Will get back to this soon...
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-09 2:05 ` Rafael David Tinoco @ 2018-10-09 9:28 ` Cyril Hrubis 2018-10-09 11:45 ` Rafael David Tinoco 2018-10-09 18:49 ` Rafael David Tinoco 0 siblings, 2 replies; 20+ messages in thread From: Cyril Hrubis @ 2018-10-09 9:28 UTC (permalink / raw) To: ltp Hi! > >> LTP syscall test case futex_wake04 hangs forever on i386 kernel > >> running on x86_64 server machine. > >> > >> Test PASS on qemu_i386 and other devices. Test HANGs on i386 (i386 > >> kernel booting on x86_64 machine). > >> > >> output, incrementing stop futex_wake04 0 TINFO : Hugepagesize > >> 4194304 > > > > Looks like we are trying to fault 4GB hugepage, that may be quite > > slow, but then, the test is a kernel reproducer as well. > > This is i386 without PAE so the hugepage here is 4 MB (or else it would > be a 2 MB with PAE). I have configured the same environment on my side: Ah, my bad, the test prints the size in bytes and not in kbytes as it's in /proc/meminfo. So indeed this is 4MB. > (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128 > (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128 > (k)inaddy@bug3984:~$ sudo hugeadm --pool-list > Size Minimum Current Maximum Default > 4194304 128 128 128 * > > And ran the test with 500 iterations, multiple times, without a single > issue. > > (k)root@bug3984:~$ while true; do cat /proc/meminfo | grep -i > hugepages_free; done > > Shoes me the hugepages are being used: > > HugePages_Free: 93 > HugePages_Free: 4 > HugePages_Free: 128 > HugePages_Free: 0 > HugePages_Free: 128 > HugePages_Free: 1 > HugePages_Free: 8 > HugePages_Free: 128 > HugePages_Free: 27 > HugePages_Free: 0 > HugePages_Free: 102 > > > Can you connect to the process with gdb and check if we happened to > > timeout before or after the two threads that are are supposed to > > trigger kernel bug are started? 
> > > > Reading the test and commit 13d60f4b6a it looks like this test is > testing futexes indexes stored in hugepages (one in head and the other > outside the hugepage head) and a indexing problem on which futex() was > signalized as non-contended. > > Kernel code was using page->index (of page hosting the futex) for futex > offset and, after this commit, it started taking in consideration if the > page was a hugepage (head or tail) or a compound page, when then it > would use the compound_idx for the futex offset, fixing the issue. > > In order for the issue to happen we would have to face: > > test thread #1 test thread #2 > -------------- -------------- > [hugepage #1 head] *futex1 offset [hugepage #1 head] > [page] [page] *futex2 offset > [page] [page] > [page] [page] > ... ... > > where futex2 key points to futex1. > > And test case does NOT look like something intermittent (since it is a > lock ordering issue - depending on pthreads sharing futexes - with wrong > futexes indexes, like described in kernel commit). > > * Test case problem would either happen or not * > > I tried to extend a bit number of iterations and something caught my > attention... when running the same test, multiple times, after some time > I would face something like: > > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > futex_wake04 0 TINFO : Thread 7495 not sleeping yet > > And there wasn't a second thread task id inside: > > /proc/<pid>/task/{<pid>, <tid1>, <tid2>} > > like it should. > > Likely the task creation (because of error ? 
check last part) logic here > was faster than wait_for_threads() logic could expect... causing a race > for the test's logic. That shouldn't really happen, the wait_for_threads(2) function is called in a loop until there are two sleeping threads under the /proc/$pid/task/ directory (minus the main thread). So unless there is a problem with starting a thread or locking on a futex the loop will end eventually and the futexes are unlocked only after this loop finishes. > * This would explain the timeout error we had, right ? * > > Since stdout/stderr are being directed to a file: @Naresh, is it > possible to get output from stdout/err files from the same LKFT URL you > showed ? > > * Wait, there is more ... * > > While running the test, and monitoring meminfo, I could see the > HugePages_Total oscillating: > > HugePages_Total: 1 > HugePages_Free: 1 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 1 > HugePages_Free: 1 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 54 > HugePages_Free: 53 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 1 > HugePages_Free: 1 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 1 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 101 > HugePages_Free: 101 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 128 > HugePages_Free: 128 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 1 > HugePages_Free: 1 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > HugePages_Total: 1 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > > despite the fact that I had configured: > > (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128 > (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128 > > And had "128" for HugePages_Total in the beginning of the test. Number > of hugepages total oscillates during a successful test and, at the very > end, the number for HugePages_Total goes back to 128. 
> I have disabled THP completely after this test and there was the same
> result.
>
> Reading hugetlb_report_meminfo() in hugetlb.c, I am not sure (struct
> hstate *h)->nr_huge_pages should change that much (through new
> hugepages being asked for, sysctl, or numa balance, among other
> options). I'll try to trace this tomorrow to see who is touching the
> HugePages_Total during the test.

That is because the test modifies the size of the huge page pool itself
in the test setup so that it can run on systems where the default is set
to 0. I guess that we should change it so that it checks whether
orig_hugepages is non-zero and only changes the value if it is zero.

--
Cyril Hrubis
chrubis@suse.cz
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-09 9:28 ` Cyril Hrubis @ 2018-10-09 11:45 ` Rafael David Tinoco 2018-10-09 18:49 ` Rafael David Tinoco 1 sibling, 0 replies; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-09 11:45 UTC (permalink / raw) To: ltp >> And there wasn't a second thread task id inside: >> >> /proc/<pid>/task/{<pid>, <tid1>, <tid2>} >> >> like it should. >> >> Likely the task creation (because of error ? check last part) logic here >> was faster than wait_for_threads() logic could expect... causing a race >> for the test's logic. > > That shouldn't really happen, the wait_for_threads(2) function is called > in a loop until there are two sleeping threads under the > /proc/$pid/task/ directory (minus the main thread). So unless there is a > problem with starting a thread or locking on a futex the loop will end > eventually and the futexes are unlocked only after this loop finishes. Yes, I'll check why. >> Reading hugetlb_report_meminfo() in hugetlb.c, I am not sure (struct >> hstat *h)->nr_huge_pages should change that much (through new hugepages >> being asked for, sysctl, or numa balance, among other options). I'll try >> to trace this tomorrow to see who is touching the HugePages_Total during >> the test. > > That is because the test modifies the size of the huge pages pool itself > in the test setup so that it can run on a systems where the default is > set to 0. I guess that we should change it so that it check if the > orig_hugepages is non-zero or not and only change the value if it's > zero. Hum, I missed that =), tks. Will get back to you soon on this. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-09 9:28 ` Cyril Hrubis 2018-10-09 11:45 ` Rafael David Tinoco @ 2018-10-09 18:49 ` Rafael David Tinoco 2018-10-09 21:06 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco 2018-10-10 10:41 ` [LTP] LTP: futex_wake04 hangs forever on i386 Cyril Hrubis 1 sibling, 2 replies; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-09 18:49 UTC (permalink / raw) To: ltp >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> futex_wake04 0 TINFO : Thread 7495 not sleeping yet >> >> And there wasn't a second thread task id inside: >> >> /proc/<pid>/task/{<pid>, <tid1>, <tid2>} >> >> like it should. >> >> Likely the task creation (because of error ? check last part) logic here >> was faster than wait_for_threads() logic could expect... causing a race >> for the test's logic. > > That shouldn't really happen, the wait_for_threads(2) function is called > in a loop until there are two sleeping threads under the > /proc/$pid/task/ directory (minus the main thread). So unless there is a > problem with starting a thread or locking on a futex the loop will end > eventually and the futexes are unlocked only after this loop finishes. Okay, so this is what is happening... Like we predicted, one of the threads had an issue and the logic was stuck with 2 tasks only: parent and 1 thread. The parent in (R) mode and the thread in (S) mode. 
Quoting as an example this situation: futex_wake04 0 TINFO : Thread 11405 not sleeping yet futex_wake04 0 TINFO : Thread 11405 not sleeping yet futex_wake04 0 TINFO : Thread 11405 not sleeping yet futex_wake04 0 TINFO : Thread 11405 not sleeping yet ... Despite, in the beginning, we had a situation that would allow the "while(wait_for_threads(2))" to break, like showed here: (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; sudo cat $PID/task/*/stack ==== PID=11405 ==== 11405/task/11403 11405/task/11404 11405/task/11405 COMM ==== futex_wake04 futex_wake04 futex_wake04 STAT ==== 11403 (futex_wake04) S 11404 (futex_wake04) S 11405 (futex_wake04) R STACK ==== [<0>] hrtimer_nanosleep+0xe6/0x250 [<0>] sys_nanosleep+0x64/0xa0 [<0>] do_fast_syscall_32+0x7f/0x1b0 [<0>] entry_SYSENTER_32+0x4e/0x7c [<0>] 0xffffffff [<0>] futex_wait_queue_me+0xb8/0x120 [<0>] futex_wait+0xda/0x210 [<0>] do_futex+0x433/0x9b0 [<0>] sys_futex+0xe9/0x120 [<0>] do_fast_syscall_32+0x7f/0x1b0 [<0>] entry_SYSENTER_32+0x4e/0x7c [<0>] 0xffffffff [<0>] apic_timer_interrupt+0x3c/0x44 [<0>] 0xffffffff Something else happened, interfering in the expected behavior: (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; sudo cat $PID/task/*/{comm,stack} ==== PID=11405 ==== 11405/task/11403 11405/task/11405 COMM ==== futex_wake04 futex_wake04 STAT ==== 11403 (futex_wake04) S 11405 (futex_wake04) R STACK ==== futex_wake04 futex_wake04 [<0>] wait_woken+0x37/0x70 [<0>] n_tty_write+0x393/0x400 [<0>] tty_write+0x17d/0x310 [<0>] __vfs_write+0x1c/0x50 [<0>] vfs_write+0x94/0x1a0 [<0>] ksys_write+0x49/0xb0 [<0>] sys_write+0x16/0x20 [<0>] do_fast_syscall_32+0x7f/0x1b0 [<0>] entry_SYSENTER_32+0x4e/0x7c [<0>] 
0xffffffff [<0>] apic_timer_interrupt+0x3c/0x44 [<0>] 0xffffffff 11404 thread is missing in this last sample, meaning it could already be gone, but the parent, 11405, running the wait_for_threads() function, did not catch that (yet ?) due to a CPU stall related to the terminal controller interrupts: ============================================================== [ 147.582036] INFO: rcu_sched detected stalls on CPUs/tasks: [ 147.583782] 1-....: (2 GPs behind) idle=602/1/1073741828 softirq=20147/20148 fqs=2239 [ 147.585973] (detected by 2, t=5252 jiffies, g=7585, c=7584, q=62923) [ 147.587806] Sending NMI from CPU 2 to CPUs 1: [ 147.590144] NMI backtrace for cpu 1 [ 147.590145] CPU: 1 PID: 11405 Comm: futex_wake04 Not tainted 4.18.0-1-686 #1 Debian 4.18.6-1 [ 147.590146] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 [ 147.590146] EIP: io_serial_out+0x17/0x20 [ 147.590147] Code: ... [ 147.590165] EAX: 00000079 EBX: 00000079 ECX: 00000000 EDX: 000003f8 [ 147.590165] ESI: f3be0000 EDI: d1446d60 EBP: f393bf2c ESP: f393bf28 [ 147.590166] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000002 [ 147.590166] CR0: 80050033 CR2: 0192144c CR3: 31f3b000 CR4: 000406d0 [ 147.590167] Call Trace: [ 147.590167] <IRQ> [ 147.590168] serial8250_tx_chars+0xf5/0x1f0 [ 147.590168] serial8250_handle_irq.part.21+0xa7/0xd0 [ 147.590168] serial8250_default_handle_irq+0x49/0x60 [ 147.590169] serial8250_interrupt+0x61/0x130 [ 147.590169] __handle_irq_event_percpu+0x42/0x1a0 [ 147.590170] ? handle_level_irq+0x100/0x100 [ 147.590170] handle_irq_event_percpu+0x2b/0x70 [ 147.590171] handle_irq_event+0x2f/0x50 [ 147.590171] ? handle_level_irq+0x100/0x100 [ 147.590172] handle_edge_irq+0x71/0x160 [ 147.590172] handle_irq+0x92/0xa0 [ 147.590173] </IRQ> [ 147.590173] <SOFTIRQ> [ 147.590173] do_IRQ+0x44/0xc0 [ 147.590174] ? __softirqentry_text_start+0x8/0x8 [ 147.590174] common_interrupt+0x3b/0x40 [ 147.590174] EIP: __do_softirq+0x5b/0x255 [ 147.590175] Code: ... 
[ 147.590192] EAX: 00000282 EBX: f273bf84 ECX: 0000000f EDX: f393e000
[ 147.590193] ESI: 00400040 EDI: d1647580 EBP: f393dff8 ESP: f393dfd4
[ 147.590194] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000202
[ 147.590194] ? __softirqentry_text_start+0x8/0x8
[ 147.590194] ? __softirqentry_text_start+0x8/0x8
[ 147.590195] call_on_stack+0x45/0x50
[ 147.590196] </SOFTIRQ>
[ 147.590196] ? irq_exit+0xb5/0xc0
[ 147.590196] ? smp_apic_timer_interrupt+0x6c/0x120
[ 147.590197] ? apic_timer_interrupt+0x3c/0x44
==============================================================

Possibly the terminal I/O rate (stdout) has interfered with the test's
behavior. Together with the adding/removing of hugepages, I was able to
"lock" this "sleep task detector" behavior for quite some time. Testing
all this in an ssh session I had no issues. Adding a 1 second delay
inside the while() loop also mitigated the issue =).

LKFT depends on the serial console for test results. Unfortunately we
don't have a stack trace to analyze, but I would bet the console rate
limit could play tricks on us, especially if we have already overloaded
the serial console lines with buffered output from other runs.

So, with all that, if you agree, I'll send a patch fixing both issues:

1) delay in between the wait_for_threads() logic
2) keep existing hugepages configured

Open to any comments ...

Best,
-Rafael
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-09 18:49 ` Rafael David Tinoco @ 2018-10-09 21:06 ` Rafael David Tinoco 2018-10-10 10:43 ` Cyril Hrubis 2018-10-10 10:41 ` [LTP] LTP: futex_wake04 hangs forever on i386 Cyril Hrubis 1 sibling, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-09 21:06 UTC (permalink / raw) To: ltp This commit fixes 2 observed issues: 1) usleep() time is too small if test is being executed in slower terminal devices (specially embedded systems). Raising it to 0.001 seconds was enough to finish 10240 iterations in around 90 seconds in a 4 vcpu kvm guest (fully emulated serial being used as console). 2) Test was changing number of hugepages during setup()/cleanup() phase despite the system had (or not) available hugepages. This was causing overhead of destroying (or creating) hugepages during the test execution. Now, if system has > 0 hugepages available, the test doesn't touch it. Link: https://bugs.linaro.org/show_bug.cgi?id=3984 Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> --- testcases/kernel/syscalls/futex/futex_wake04.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/testcases/kernel/syscalls/futex/futex_wake04.c b/testcases/kernel/syscalls/futex/futex_wake04.c index 3c7038317..93d36ac33 100644 --- a/testcases/kernel/syscalls/futex/futex_wake04.c +++ b/testcases/kernel/syscalls/futex/futex_wake04.c @@ -76,14 +76,17 @@ static void setup(void) tst_tmpdir(); SAFE_FILE_SCANF(NULL, PATH_NR_HUGEPAGES, "%ld", &orig_hugepages); - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); + + if (!orig_hugepages) + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); TEST_PAUSE; } static void cleanup(void) { - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%ld", orig_hugepages); + if (!orig_hugepages) + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 0); tst_rmdir(); } @@ -172,8 +175,9 @@ static void wakeup_thread2(void) tst_strerrno(res)); } + /* 0.001 
seconds: less might cause lockups for slower terminals */ while (wait_for_threads(2)) - usleep(100); + usleep(1000); futex_wake(futex2, 1, 0); -- 2.19.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-09 21:06 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco @ 2018-10-10 10:43 ` Cyril Hrubis 2018-10-10 11:14 ` Rafael David Tinoco ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Cyril Hrubis @ 2018-10-10 10:43 UTC (permalink / raw) To: ltp Hi! > This commit fixes 2 observed issues: > > 1) usleep() time is too small if test is being executed in slower > terminal devices (specially embedded systems). Raising it to 0.001 > seconds was enough to finish 10240 iterations in around 90 seconds in a > 4 vcpu kvm guest (fully emulated serial being used as console). > > 2) Test was changing number of hugepages during setup()/cleanup() phase > despite the system had (or not) available hugepages. This was causing > overhead of destroying (or creating) hugepages during the test > execution. Now, if system has > 0 hugepages available, the test doesn't > touch it. Can you please split this into separate patches? Generally it's considered a good practice to split unrelated changes like these. > Link: https://bugs.linaro.org/show_bug.cgi?id=3984 > Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> > --- > testcases/kernel/syscalls/futex/futex_wake04.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/testcases/kernel/syscalls/futex/futex_wake04.c b/testcases/kernel/syscalls/futex/futex_wake04.c > index 3c7038317..93d36ac33 100644 > --- a/testcases/kernel/syscalls/futex/futex_wake04.c > +++ b/testcases/kernel/syscalls/futex/futex_wake04.c > @@ -76,14 +76,17 @@ static void setup(void) > tst_tmpdir(); > > SAFE_FILE_SCANF(NULL, PATH_NR_HUGEPAGES, "%ld", &orig_hugepages); > - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); > + > + if (!orig_hugepages) Maybe <= 0 would be a bit more robust. 
> + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); > > TEST_PAUSE; > } > > static void cleanup(void) > { > - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%ld", orig_hugepages); > + if (!orig_hugepages) > + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 0); > > tst_rmdir(); > } > @@ -172,8 +175,9 @@ static void wakeup_thread2(void) > tst_strerrno(res)); > } > > + /* 0.001 seconds: less might cause lockups for slower terminals */ I would avoid comments like this, you can check the git changelog to figure out why the value was changed. > while (wait_for_threads(2)) > - usleep(100); > + usleep(1000); > > futex_wake(futex2, 1, 0); > > -- > 2.19.1 > -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-10 10:43 ` Cyril Hrubis @ 2018-10-10 11:14 ` Rafael David Tinoco 2018-10-10 12:06 ` Cyril Hrubis 2018-10-10 11:41 ` [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test Rafael David Tinoco 2018-10-10 14:20 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco 2 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 11:14 UTC (permalink / raw) To: ltp On 10/10/18 7:43 AM, Cyril Hrubis wrote: > Hi! >> This commit fixes 2 observed issues: >> >> 1) usleep() time is too small if test is being executed in slower >> terminal devices (specially embedded systems). Raising it to 0.001 >> seconds was enough to finish 10240 iterations in around 90 seconds in a >> 4 vcpu kvm guest (fully emulated serial being used as console). >> >> 2) Test was changing number of hugepages during setup()/cleanup() phase >> despite the system had (or not) available hugepages. This was causing >> overhead of destroying (or creating) hugepages during the test >> execution. Now, if system has > 0 hugepages available, the test doesn't >> touch it. > > Can you please split this into separate patches? Generally it's > considered a good practice to split unrelated changes like these. Yes, will do. Ignore my last sentence in the other e-mail on the same thread (about this patch). Will post 2 patches "v2". Thanks a lot Cyril! ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-10 11:14 ` Rafael David Tinoco @ 2018-10-10 12:06 ` Cyril Hrubis 0 siblings, 0 replies; 20+ messages in thread From: Cyril Hrubis @ 2018-10-10 12:06 UTC (permalink / raw) To: ltp Hi! > > Can you please split this into separate patches? Generally it's > > considered a good practice to split unrelated changes like these. > > Yes, will do. Ignore my last sentence in the other e-mail on the same > thread (about this patch). Will post 2 patches "v2". There were some minor comments, if you agree with them I can amend the patches you send before pushing them... -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test 2018-10-10 10:43 ` Cyril Hrubis 2018-10-10 11:14 ` Rafael David Tinoco @ 2018-10-10 11:41 ` Rafael David Tinoco 2018-10-10 11:41 ` [LTP] [PATCH v2 2/2] futex/futex_wake04.c: raise delay waiting for threads Rafael David Tinoco 2018-10-10 14:20 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco 2 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 11:41 UTC (permalink / raw) To: ltp Test was changing number of hugepages during setup()/cleanup() phase, despite system having (or not) already configured hugepages. This was causing some overhead - of destroying (or creating) hugepages - during the test execution, and also changing amount of available hugepages to the rest of the system. Now, if system has > 0 hugepages available, the test doesn't touch it. Link: https://bugs.linaro.org/show_bug.cgi?id=3984 Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> --- testcases/kernel/syscalls/futex/futex_wake04.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/testcases/kernel/syscalls/futex/futex_wake04.c b/testcases/kernel/syscalls/futex/futex_wake04.c index 3c7038317..1ae9599ab 100644 --- a/testcases/kernel/syscalls/futex/futex_wake04.c +++ b/testcases/kernel/syscalls/futex/futex_wake04.c @@ -76,14 +76,17 @@ static void setup(void) tst_tmpdir(); SAFE_FILE_SCANF(NULL, PATH_NR_HUGEPAGES, "%ld", &orig_hugepages); - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); + + if (!orig_hugepages) + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); TEST_PAUSE; } static void cleanup(void) { - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%ld", orig_hugepages); + if (!orig_hugepages) + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 0); tst_rmdir(); } -- 2.19.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [LTP] [PATCH v2 2/2] futex/futex_wake04.c: raise delay waiting for threads 2018-10-10 11:41 ` [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test Rafael David Tinoco @ 2018-10-10 11:41 ` Rafael David Tinoco 0 siblings, 0 replies; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 11:41 UTC (permalink / raw) To: ltp usleep() time was too small if test is being executed in slower terminal devices (specially embedded systems). Raising it to 0.001 seconds was enough to finish 10240 iterations in around 90 seconds in a 4 vcpu kvm guest (fully emulated serial being used as console). Link: https://bugs.linaro.org/show_bug.cgi?id=3984 Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> --- testcases/kernel/syscalls/futex/futex_wake04.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/testcases/kernel/syscalls/futex/futex_wake04.c b/testcases/kernel/syscalls/futex/futex_wake04.c index 1ae9599ab..bd10e56af 100644 --- a/testcases/kernel/syscalls/futex/futex_wake04.c +++ b/testcases/kernel/syscalls/futex/futex_wake04.c @@ -175,8 +175,9 @@ static void wakeup_thread2(void) tst_strerrno(res)); } + /* 0.001 seconds: less might cause lockups for slower terminals */ while (wait_for_threads(2)) - usleep(100); + usleep(1000); futex_wake(futex2, 1, 0); -- 2.19.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-10 10:43 ` Cyril Hrubis 2018-10-10 11:14 ` Rafael David Tinoco 2018-10-10 11:41 ` [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test Rafael David Tinoco @ 2018-10-10 14:20 ` Rafael David Tinoco 2018-10-11 12:23 ` Cyril Hrubis 2 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 14:20 UTC (permalink / raw) To: ltp On 10/10/18 7:43 AM, Cyril Hrubis wrote: > Hi! >> This commit fixes 2 observed issues: >> >> 1) usleep() time is too small if test is being executed in slower >> terminal devices (specially embedded systems). Raising it to 0.001 >> seconds was enough to finish 10240 iterations in around 90 seconds in a >> 4 vcpu kvm guest (fully emulated serial being used as console). >> >> 2) Test was changing number of hugepages during setup()/cleanup() phase >> despite the system had (or not) available hugepages. This was causing >> overhead of destroying (or creating) hugepages during the test >> execution. Now, if system has > 0 hugepages available, the test doesn't >> touch it. > > Can you please split this into separate patches? Generally it's > considered a good practice to split unrelated changes like these. 
> >> Link: https://bugs.linaro.org/show_bug.cgi?id=3984 >> Signed-off-by: Rafael David Tinoco <rafael.tinoco@linaro.org> >> --- >> testcases/kernel/syscalls/futex/futex_wake04.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/testcases/kernel/syscalls/futex/futex_wake04.c b/testcases/kernel/syscalls/futex/futex_wake04.c >> index 3c7038317..93d36ac33 100644 >> --- a/testcases/kernel/syscalls/futex/futex_wake04.c >> +++ b/testcases/kernel/syscalls/futex/futex_wake04.c >> @@ -76,14 +76,17 @@ static void setup(void) >> tst_tmpdir(); >> >> SAFE_FILE_SCANF(NULL, PATH_NR_HUGEPAGES, "%ld", &orig_hugepages); >> - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); >> + >> + if (!orig_hugepages) > > Maybe <= 0 would be a bit more robust. > >> + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 1); >> >> TEST_PAUSE; >> } >> >> static void cleanup(void) >> { >> - SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%ld", orig_hugepages); >> + if (!orig_hugepages) >> + SAFE_FILE_PRINTF(NULL, PATH_NR_HUGEPAGES, "%d", 0); >> >> tst_rmdir(); >> } >> @@ -172,8 +175,9 @@ static void wakeup_thread2(void) >> tst_strerrno(res)); >> } >> >> + /* 0.001 seconds: less might cause lockups for slower terminals */ > > I would avoid comments like this, you can check the git changelog to > figure out why the value was changed. > >> while (wait_for_threads(2)) >> - usleep(100); >> + usleep(1000); >> >> futex_wake(futex2, 1, 0); >> >> -- >> 2.19.1 >> > I missed these comments in the 2 v2 I sent. Please feel free to change or let me know if you prefer a v3. Tks a lot. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep 2018-10-10 14:20 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco @ 2018-10-11 12:23 ` Cyril Hrubis 0 siblings, 0 replies; 20+ messages in thread From: Cyril Hrubis @ 2018-10-11 12:23 UTC (permalink / raw) To: ltp Hi! > I missed these comments in the 2 v2 I sent. Please feel free to change > or let me know if you prefer a v3. Tks a lot. Amended and pushed, thanks. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-09 18:49 ` Rafael David Tinoco 2018-10-09 21:06 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco @ 2018-10-10 10:41 ` Cyril Hrubis 2018-10-10 11:13 ` Rafael David Tinoco 1 sibling, 1 reply; 20+ messages in thread From: Cyril Hrubis @ 2018-10-10 10:41 UTC (permalink / raw) To: ltp Jo! > > That shouldn't really happen, the wait_for_threads(2) function is called > > in a loop until there are two sleeping threads under the > > /proc/$pid/task/ directory (minus the main thread). So unless there is a > > problem with starting a thread or locking on a futex the loop will end > > eventually and the futexes are unlocked only after this loop finishes. > > Okay, so this is what is happening... > > Like we predicted, one of the threads had an issue and the logic was > stuck with 2 tasks only: parent and 1 thread. The parent in (R) mode and > the thread in (S) mode. Actually the parent is in the (S) mode, that is because the parent spends most of the time in the sleep() anyways. > Quoting as an example this situation: > > futex_wake04 0 TINFO : Thread 11405 not sleeping yet > futex_wake04 0 TINFO : Thread 11405 not sleeping yet > futex_wake04 0 TINFO : Thread 11405 not sleeping yet > futex_wake04 0 TINFO : Thread 11405 not sleeping yet > ... > > Despite, in the beginning, we had a situation that would allow the > "while(wait_for_threads(2))" to break, like showed here: That is not true, see below. 
> (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d > $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; > cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; > sudo cat $PID/task/*/stack > > ==== PID=11405 ==== > 11405/task/11403 11405/task/11404 11405/task/11405 > COMM ==== > futex_wake04 > futex_wake04 > futex_wake04 > STAT ==== > 11403 (futex_wake04) S > 11404 (futex_wake04) S > 11405 (futex_wake04) R The pids (tids) increment, so the first one is the parent sleeping in sleep(), the second one is the first forked thread locked on a futex, and the third one is the child that is supposed to lock on a futex. > STACK ==== > [<0>] hrtimer_nanosleep+0xe6/0x250 > [<0>] sys_nanosleep+0x64/0xa0 > [<0>] do_fast_syscall_32+0x7f/0x1b0 > [<0>] entry_SYSENTER_32+0x4e/0x7c > [<0>] 0xffffffff So this is the parent trace, as expected it sleeps in the nanosleep() timer. > [<0>] futex_wait_queue_me+0xb8/0x120 > [<0>] futex_wait+0xda/0x210 > [<0>] do_futex+0x433/0x9b0 > [<0>] sys_futex+0xe9/0x120 > [<0>] do_fast_syscall_32+0x7f/0x1b0 > [<0>] entry_SYSENTER_32+0x4e/0x7c > [<0>] 0xffffffff This is the thread that is locked on a futex. > [<0>] apic_timer_interrupt+0x3c/0x44 > [<0>] 0xffffffff Huh, are we even supposed to get apic_timer_interrupt in the process trace? > Something else happened, interfering in the expected behavior: > > (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d > $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; > cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; > sudo cat $PID/task/*/{comm,stack} > > ==== PID=11405 ==== > 11405/task/11403 11405/task/11405 > COMM ==== > futex_wake04 > futex_wake04 > STAT ==== > 11403 (futex_wake04) S > 11405 (futex_wake04) R Looks like the child that was locked on a futex is no longer there.
If this was running for long enough (5 minutes) the test library process (parent of 11403 here) sends SIGKILL to the whole process group, and the only process killed was the one that was not stuck in the kernel at the moment. > STACK ==== > futex_wake04 > futex_wake04 > [<0>] wait_woken+0x37/0x70 > [<0>] n_tty_write+0x393/0x400 > [<0>] tty_write+0x17d/0x310 > [<0>] __vfs_write+0x1c/0x50 > [<0>] vfs_write+0x94/0x1a0 > [<0>] ksys_write+0x49/0xb0 > [<0>] sys_write+0x16/0x20 > [<0>] do_fast_syscall_32+0x7f/0x1b0 > [<0>] entry_SYSENTER_32+0x4e/0x7c > [<0>] 0xffffffff So supposedly the parent is writing something to stderr at this point. > [<0>] apic_timer_interrupt+0x3c/0x44 > [<0>] 0xffffffff And the child is (again) in the interrupt handler. > 11404 thread is missing in this last sample, meaning it could already be > gone, but the parent, 11405, running the wait_for_threads() function, > did not catch that (yet ?) due to a CPU stall related to the terminal > controller interrupts: The parent will not catch that at all, as wait_for_threads() is a very simple function that stops the loop if and only if there are two threads, different from the caller thread, sleeping in the kernel.
> [ 147.590165] EAX: 00000079 EBX: 00000079 ECX: 00000000 EDX: 000003f8 > [ 147.590165] ESI: f3be0000 EDI: d1446d60 EBP: f393bf2c ESP: f393bf28 > [ 147.590166] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000002 > [ 147.590166] CR0: 80050033 CR2: 0192144c CR3: 31f3b000 CR4: 000406d0 > [ 147.590167] Call Trace: > [ 147.590167] <IRQ> > [ 147.590168] serial8250_tx_chars+0xf5/0x1f0 > [ 147.590168] serial8250_handle_irq.part.21+0xa7/0xd0 > [ 147.590168] serial8250_default_handle_irq+0x49/0x60 > [ 147.590169] serial8250_interrupt+0x61/0x130 > [ 147.590169] __handle_irq_event_percpu+0x42/0x1a0 > [ 147.590170] ? handle_level_irq+0x100/0x100 > [ 147.590170] handle_irq_event_percpu+0x2b/0x70 > [ 147.590171] handle_irq_event+0x2f/0x50 > [ 147.590171] ? handle_level_irq+0x100/0x100 > [ 147.590172] handle_edge_irq+0x71/0x160 > [ 147.590172] handle_irq+0x92/0xa0 > [ 147.590173] </IRQ> > [ 147.590173] <SOFTIRQ> > [ 147.590173] do_IRQ+0x44/0xc0 > [ 147.590174] ? __softirqentry_text_start+0x8/0x8 > [ 147.590174] common_interrupt+0x3b/0x40 > [ 147.590174] EIP: __do_softirq+0x5b/0x255 > [ 147.590175] Code: ... > [ 147.590192] EAX: 00000282 EBX: f273bf84 ECX: 0000000f EDX: f393e000 > [ 147.590193] ESI: 00400040 EDI: d1647580 EBP: f393dff8 ESP: f393dfd4 > [ 147.590194] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000202 > [ 147.590194] ? __softirqentry_text_start+0x8/0x8 > [ 147.590194] ? __softirqentry_text_start+0x8/0x8 > [ 147.590195] call_on_stack+0x45/0x50 > [ 147.590196] </SOFTIRQ> > [ 147.590196] ? irq_exit+0xb5/0xc0 > [ 147.590196] ? smp_apic_timer_interrupt+0x6c/0x120 > [ 147.590197] ? apic_timer_interrupt+0x3c/0x44 > ============================================================== > > Possibly the terminal I/O rate (stdout) has interfered with test > behavior. When having, together with this, the adding/removing > hugepages, I was able to "lock" this "sleep task detector" behavior for > quite sometime. > > Testing all this in a ssh session I had no issues. 
Adding 1 second delay > in between while() loop for the logic also mitigated the issue =). > > LKFT depends on serial console for the test results. Unfortunately we > don't have a stack trace to analyze, but, I would bet the console rate > limit could play tricks on us specially if we have already overloaded > console serial lines with buffer from other runs. Doing the simple math: if you have a serial line at 115200 it can transfer about 11.5 bits in 100 uS, so yes, we can easily overload the serial connection if we print messages that fast, so increasing the sleep 10x sounds only reasonable. On the other hand I guess that the kernel shouldn't produce stalls just because we write to serial too fast. I guess that the task writing to serial should be sleeping in a waitqueue somewhere in the kernel if it attempts to write to a full serial buffer, which will effectively slow down the writer and the test would work fine in this case... -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-10 10:41 ` [LTP] LTP: futex_wake04 hangs forever on i386 Cyril Hrubis @ 2018-10-10 11:13 ` Rafael David Tinoco 2018-10-10 12:05 ` Cyril Hrubis 0 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 11:13 UTC (permalink / raw) To: ltp On 10/10/18 7:41 AM, Cyril Hrubis wrote: > Jo! >>> That shouldn't really happen, the wait_for_threads(2) function is called >>> in a loop until there are two sleeping threads under the >>> /proc/$pid/task/ directory (minus the main thread). So unless there is a >>> problem with starting a thread or locking on a futex the loop will end >>> eventually and the futexes are unlocked only after this loop finishes. >> >> Okay, so this is what is happening... >> >> Like we predicted, one of the threads had an issue and the logic was >> stuck with 2 tasks only: parent and 1 thread. The parent in (R) mode and >> the thread in (S) mode. > > Actually the parent is in the (S) mode, that is because the parent > spends most of the time in the sleep() anyways. > >> Quoting as an example this situation: >> >> futex_wake04 0 TINFO : Thread 11405 not sleeping yet >> futex_wake04 0 TINFO : Thread 11405 not sleeping yet >> futex_wake04 0 TINFO : Thread 11405 not sleeping yet >> futex_wake04 0 TINFO : Thread 11405 not sleeping yet >> ... >> >> Despite, in the beginning, we had a situation that would allow the >> "while(wait_for_threads(2))" to break, like showed here: > > That is not true, see below. 
> >> (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d >> $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; >> cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; >> sudo cat $PID/task/*/stack >> >> ==== PID=11405 ==== >> 11405/task/11403 11405/task/11404 11405/task/11405 >> COMM ==== >> futex_wake04 >> futex_wake04 >> futex_wake04 >> STAT ==== >> 11403 (futex_wake04) S >> 11404 (futex_wake04) S >> 11405 (futex_wake04) R > > The pids (tids) increment so the first one is the parent sleeping in > sleep() the second one is the first forked thread locked on futex and > the second one is the child that is supposed to lock on a futex. Yes, I had realized this after I wrote this part, during conclusion, and never revisited the upper part of this e-mail, sorry. > >> STACK ==== >> [<0>] hrtimer_nanosleep+0xe6/0x250 >> [<0>] sys_nanosleep+0x64/0xa0 >> [<0>] do_fast_syscall_32+0x7f/0x1b0 >> [<0>] entry_SYSENTER_32+0x4e/0x7c >> [<0>] 0xffffffff > > So this is the parent trace, as expected it sleeps in the nanosleep() > timer. Yep > >> [<0>] futex_wait_queue_me+0xb8/0x120 >> [<0>] futex_wait+0xda/0x210 >> [<0>] do_futex+0x433/0x9b0 >> [<0>] sys_futex+0xe9/0x120 >> [<0>] do_fast_syscall_32+0x7f/0x1b0 >> [<0>] entry_SYSENTER_32+0x4e/0x7c >> [<0>] 0xffffffff > > This is the thread that is locked on a futex. > >> [<0>] apic_timer_interrupt+0x3c/0x44 >> [<0>] 0xffffffff > > Huh, are we even supposed to get apic_timer_interrupt in the process > trace? I believe this is the kernel stack for the process, showing it was preempted by the IRQ handler for the clock right after it entered the kernel. IRQ handlers run with their own stacks, so we can't see the rest here, but the function that prints the stack on dmesg does get it by checking the "__softirqentry_text_start" section.
>> Something else happened, interfering in the expected behavior: >> >> (k)inaddy@bug3984:/proc$ PID=11405 ; echo ==== PID=$PID ====; ls -d >> $PID/task/* ; echo COMM ==== ; cat $PID/task/*/comm ; echo STAT ==== ; >> cat $PID/task/*/stat | awk '{print $1" "$2" "$3}' ; echo STACK ====; >> sudo cat $PID/task/*/{comm,stack} >> >> ==== PID=11405 ==== >> 11405/task/11403 11405/task/11405 >> COMM ==== >> futex_wake04 >> futex_wake04 >> STAT ==== >> 11403 (futex_wake04) S >> 11405 (futex_wake04) R > > Looks like the child that was locked on a futex is no longer there. > > If this was running for long enough (5 minutes) the test library process > (parent of 11403 here) send SIGKILL to the whole process group and the > only process killed was the one that was not stuck in the kernel at the > moment. > >> STACK ==== >> futex_wake04 >> futex_wake04 >> [<0>] wait_woken+0x37/0x70 >> [<0>] n_tty_write+0x393/0x400 >> [<0>] tty_write+0x17d/0x310 >> [<0>] __vfs_write+0x1c/0x50 >> [<0>] vfs_write+0x94/0x1a0 >> [<0>] ksys_write+0x49/0xb0 >> [<0>] sys_write+0x16/0x20 >> [<0>] do_fast_syscall_32+0x7f/0x1b0 >> [<0>] entry_SYSENTER_32+0x4e/0x7c >> [<0>] 0xffffffff > > So supposedly the parent is writing something to stderr at this point. > >> [<0>] apic_timer_interrupt+0x3c/0x44 >> [<0>] 0xffffffff > > And the child is (again) in the interrupt handler. > >> 11404 thread is missing in this last sample, meaning it could already be >> gone, but the parent, 11405, running the wait_for_threads() function, >> did not catch that (yet ?) due to a CPU stall related to the terminal >> controller interrupts: > > The parent will not catch that at all, as the wait_for_threads() is very > simple function that stops the loop only and only if there are two > threads, different from the caller thread, sleeping in kernel. Agreed. 
> >> ============================================================== >> [ 147.582036] INFO: rcu_sched detected stalls on CPUs/tasks: >> [ 147.583782] 1-....: (2 GPs behind) idle=602/1/1073741828 >> softirq=20147/20148 fqs=2239 >> [ 147.585973] (detected by 2, t=5252 jiffies, g=7585, c=7584, q=62923) >> [ 147.587806] Sending NMI from CPU 2 to CPUs 1: >> [ 147.590144] NMI backtrace for cpu 1 >> >> [ 147.590145] CPU: 1 PID: 11405 Comm: futex_wake04 Not tainted >> 4.18.0-1-686 #1 Debian 4.18.6-1 >> >> [ 147.590146] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS 1.11.1-1 04/01/2014 >> [ 147.590146] EIP: io_serial_out+0x17/0x20 >> [ 147.590147] Code: ... >> [ 147.590165] EAX: 00000079 EBX: 00000079 ECX: 00000000 EDX: 000003f8 >> [ 147.590165] ESI: f3be0000 EDI: d1446d60 EBP: f393bf2c ESP: f393bf28 >> [ 147.590166] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000002 >> [ 147.590166] CR0: 80050033 CR2: 0192144c CR3: 31f3b000 CR4: 000406d0 >> [ 147.590167] Call Trace: >> [ 147.590167] <IRQ> >> [ 147.590168] serial8250_tx_chars+0xf5/0x1f0 >> [ 147.590168] serial8250_handle_irq.part.21+0xa7/0xd0 >> [ 147.590168] serial8250_default_handle_irq+0x49/0x60 >> [ 147.590169] serial8250_interrupt+0x61/0x130 >> [ 147.590169] __handle_irq_event_percpu+0x42/0x1a0 >> [ 147.590170] ? handle_level_irq+0x100/0x100 >> [ 147.590170] handle_irq_event_percpu+0x2b/0x70 >> [ 147.590171] handle_irq_event+0x2f/0x50 >> [ 147.590171] ? handle_level_irq+0x100/0x100 >> [ 147.590172] handle_edge_irq+0x71/0x160 >> [ 147.590172] handle_irq+0x92/0xa0 >> [ 147.590173] </IRQ> >> [ 147.590173] <SOFTIRQ> >> [ 147.590173] do_IRQ+0x44/0xc0 >> [ 147.590174] ? __softirqentry_text_start+0x8/0x8 >> [ 147.590174] common_interrupt+0x3b/0x40 >> [ 147.590174] EIP: __do_softirq+0x5b/0x255 >> [ 147.590175] Code: ... 
>> [ 147.590192] EAX: 00000282 EBX: f273bf84 ECX: 0000000f EDX: f393e000 >> [ 147.590193] ESI: 00400040 EDI: d1647580 EBP: f393dff8 ESP: f393dfd4 >> [ 147.590194] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000202 >> [ 147.590194] ? __softirqentry_text_start+0x8/0x8 >> [ 147.590194] ? __softirqentry_text_start+0x8/0x8 >> [ 147.590195] call_on_stack+0x45/0x50 >> [ 147.590196] </SOFTIRQ> >> [ 147.590196] ? irq_exit+0xb5/0xc0 >> [ 147.590196] ? smp_apic_timer_interrupt+0x6c/0x120 >> [ 147.590197] ? apic_timer_interrupt+0x3c/0x44 >> ============================================================== >> >> Possibly the terminal I/O rate (stdout) has interfered with test >> behavior. When having, together with this, the adding/removing >> hugepages, I was able to "lock" this "sleep task detector" behavior for >> quite sometime. >> >> Testing all this in a ssh session I had no issues. Adding 1 second delay >> in between while() loop for the logic also mitigated the issue =). >> >> LKFT depends on serial console for the test results. Unfortunately we >> don't have a stack trace to analyze, but, I would bet the console rate >> limit could play tricks on us specially if we have already overloaded >> console serial lines with buffer from other runs. > > Doing a simple math if you have serial line on 115200 it can transwer > about 11.5 bits in 100uS so yes we can easily overload serial connection > if we print messages that fast, so increasing the sleep 10x times sounds > only reasonable. > > On the other hand I guess that kernel shouldn't produce stalls just > because we write to serial too fast. I guess that the task writing to > serial should be sleeping in a waitqueue somewhere in kernel if it > attempts to write to a full serial buffer, which will effectivelly > slow down the writer and the test would work fine in this case... That is what happened, no ? 
The stack trace shows that possibly the same process that had its execution interrupted by the timer handler in the recent past (with a leftover frame pointer indicated by ? apic_timer_interrupt) also ran the handler for serial8250_interrupt(), and that execution caused some dmesg warnings like: serial8250: too much work for irq which I had. I guess having the terminal with a smaller buffer would help here as well, since the calls to the interrupt handler would be more frequent and less time consuming (because of the baud rate). Continuing, there was a software interrupt to flush the serial 8250 contents and it took too long to flush to the device, causing the task waiting on the line to keep waiting. HOWEVER, in this case, every single byte written to a terminal memory-mapped I/O register causes a VM_EXIT (in the host: cpu->kvm_run->exit_reason == 2), giving execution back to the QEMU vCPU thread (which had entered VM mode), which will schedule a function inside the emulation thread to deal with that I/O (for the virtualized serial HW being emulated) and call VM_ENTER again at some further point. This could make the interrupt handler even slower. I sent a patch in the thread yesterday, feel free to change it (the interval) if you need. After those 2 fixes I would ask Naresh to run this same test for some time (patching our LTP version running on LKFT) to make sure we solved the original complaint (despite this behavior) and that the interpretations were correct. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-10 11:13 ` Rafael David Tinoco @ 2018-10-10 12:05 ` Cyril Hrubis 2018-10-10 12:15 ` Rafael David Tinoco 0 siblings, 1 reply; 20+ messages in thread From: Cyril Hrubis @ 2018-10-10 12:05 UTC (permalink / raw) To: ltp Hi! > > Doing a simple math if you have serial line on 115200 it can transwer > > about 11.5 bits in 100uS so yes we can easily overload serial connection > > if we print messages that fast, so increasing the sleep 10x times sounds > > only reasonable. > > > > On the other hand I guess that kernel shouldn't produce stalls just > > because we write to serial too fast. I guess that the task writing to > > serial should be sleeping in a waitqueue somewhere in kernel if it > > attempts to write to a full serial buffer, which will effectivelly > > slow down the writer and the test would work fine in this case... > > That is what happened, no ? Stack trace shows that possibly same process > that had its execution interrupted by the time handler in a recent past > (with a leftover frame pointer indicated by ? apic_timer_interrupt) also > ran the handler for serial8250_interrupt() and that execution caused > some dmesg warnings like: > > serial8250: too much work for irq > > which I had. I guess having the terminal with a smaller buffer would > help here as well, since the calls to the interrupt handler would be > more often and less time consuming (because of baud rate). > > Continuing, there was a software interrupt to flush serial 8250 contents > and it took too long to flush to the device, causing the task, waiting > on the line, to wait. 
> > HOWEVER, for this case, for example, every single byte written to a > terminal memory mapped I/O register causes a VM_EXIT (in the host: > cpu->kvm_run->exit_reason == 2), giving execution back to the QEMU vCPU > (that had entered VM mode), that will schedule a function inside the > emulation thread to deal with that I/O (for the virtualized serial HW > being emulated) and calling the VM_ENTER again at some further point. > This could make the interrupt handler even slower. Ok, I guess that I understand why this is so slow, we emulate PC memory mapped serial port and we go back and forth for each byte we write. And I guess that we cannot do much about this. Also I suppose that it could be "fixed" by switching to virtio serial driver that should be able to read the whole buffer in one go. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-10 12:05 ` Cyril Hrubis @ 2018-10-10 12:15 ` Rafael David Tinoco 2018-10-10 12:33 ` Cyril Hrubis 0 siblings, 1 reply; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 12:15 UTC (permalink / raw) To: ltp On 10/10/18 9:05 AM, Cyril Hrubis wrote: >> That is what happened, no ? Stack trace shows that possibly same process >> that had its execution interrupted by the time handler in a recent past >> (with a leftover frame pointer indicated by ? apic_timer_interrupt) also >> ran the handler for serial8250_interrupt() and that execution caused >> some dmesg warnings like: >> >> serial8250: too much work for irq >> >> which I had. I guess having the terminal with a smaller buffer would >> help here as well, since the calls to the interrupt handler would be >> more often and less time consuming (because of baud rate). >> >> Continuing, there was a software interrupt to flush serial 8250 contents >> and it took too long to flush to the device, causing the task, waiting >> on the line, to wait. >> >> HOWEVER, for this case, for example, every single byte written to a >> terminal memory mapped I/O register causes a VM_EXIT (in the host: >> cpu->kvm_run->exit_reason == 2), giving execution back to the QEMU vCPU >> (that had entered VM mode), that will schedule a function inside the >> emulation thread to deal with that I/O (for the virtualized serial HW >> being emulated) and calling the VM_ENTER again at some further point. >> This could make the interrupt handler even slower. > Ok, I guess that I understand why this is so slow, we emulate PC memory > mapped serial port and we go back and forth for each byte we write. And > I guess that we cannot do much about this. > > Also I suppose that it could be "fixed" by switching to virtio serial > driver that should be able to read the whole buffer in one go. Just as curiosity... Unfortunately, ARM QEMU does not seem to have a virtio serial device (yet ?) 
and x86 can't have boot loaders on a virtio serial device, only systemd output and console, afaik. =(. ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-10 12:15 ` Rafael David Tinoco @ 2018-10-10 12:33 ` Cyril Hrubis 2018-10-10 12:48 ` Rafael David Tinoco 0 siblings, 1 reply; 20+ messages in thread From: Cyril Hrubis @ 2018-10-10 12:33 UTC (permalink / raw) To: ltp Hi! > > Ok, I guess that I understand why this is so slow, we emulate PC memory > > mapped serial port and we go back and forth for each byte we write. And > > I guess that we cannot do much about this. > > > > Also I suppose that it could be "fixed" by switching to virtio serial > > driver that should be able to read the whole buffer in one go. > > Just as curiosity... Unfortunately, ARM QEMU does not seem to have a > virtio serial device (yet ?) and x86 can't have boot loaders on a virtio > serial device, only systemd output and console, afaik. =(. Huh, I'm pretty sure we run aarch64 VMs with virtio serial devices. Also AFAIC you can enable both PC serial port emulation and virtio serial and configure the system to start console on the virtio serial device. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 20+ messages in thread
* [LTP] LTP: futex_wake04 hangs forever on i386 2018-10-10 12:33 ` Cyril Hrubis @ 2018-10-10 12:48 ` Rafael David Tinoco 0 siblings, 0 replies; 20+ messages in thread From: Rafael David Tinoco @ 2018-10-10 12:48 UTC (permalink / raw) To: ltp On 10/10/18 9:33 AM, Cyril Hrubis wrote: > Hi! >>> Ok, I guess that I understand why this is so slow, we emulate PC memory >>> mapped serial port and we go back and forth for each byte we write. And >>> I guess that we cannot do much about this. >>> >>> Also I suppose that it could be "fixed" by switching to virtio serial >>> driver that should be able to read the whole buffer in one go. >> >> Just as curiosity... Unfortunately, ARM QEMU does not seem to have a >> virtio serial device (yet ?) and x86 can't have boot loaders on a virtio >> serial device, only systemd output and console, afaik. =(. > > Huh, I'm pretty sure we run aarch64 VMs with virtio serial devices. Yes, you are right. This is what I have in my dev machine: libvirt xml: <serial type='pty'> <source path='/dev/pts/1'/> <target type='system-serial' port='0'> <model name='pl011'/> </target> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/1'> <source path='/dev/pts/1'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <console type='pty'> <source path='/dev/pts/2'/> <target type='virtio' port='1'/> <alias name='console1'/> </console> And the cmdline: -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -chardev pty,id=charserial0 -serial chardev:charserial0 -chardev pty,id=charconsole1 -device virtconsole,chardev=charconsole1,id=console1 > Also AFAIC you can enable both PC serial port emulation and virtio > serial and configure the system to start console on the virtio serial > device. True. Best, Rafael ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2018-10-11 12:23 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-08 13:21 [LTP] LTP: futex_wake04 hangs forever on i386 Naresh Kamboju 2018-10-08 13:30 ` Cyril Hrubis 2018-10-09 2:05 ` Rafael David Tinoco 2018-10-09 9:28 ` Cyril Hrubis 2018-10-09 11:45 ` Rafael David Tinoco 2018-10-09 18:49 ` Rafael David Tinoco 2018-10-09 21:06 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco 2018-10-10 10:43 ` Cyril Hrubis 2018-10-10 11:14 ` Rafael David Tinoco 2018-10-10 12:06 ` Cyril Hrubis 2018-10-10 11:41 ` [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test Rafael David Tinoco 2018-10-10 11:41 ` [LTP] [PATCH v2 2/2] futex/futex_wake04.c: raise delay waiting for threads Rafael David Tinoco 2018-10-10 14:20 ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco 2018-10-11 12:23 ` Cyril Hrubis 2018-10-10 10:41 ` [LTP] LTP: futex_wake04 hangs forever on i386 Cyril Hrubis 2018-10-10 11:13 ` Rafael David Tinoco 2018-10-10 12:05 ` Cyril Hrubis 2018-10-10 12:15 ` Rafael David Tinoco 2018-10-10 12:33 ` Cyril Hrubis 2018-10-10 12:48 ` Rafael David Tinoco