On 2021/2/5 19:08, Martin Wilck wrote:
> On Thu, 2021-02-04 at 16:06 +0100, Martin Wilck wrote:
>> On Thu, 2021-02-04 at 09:40 +0800, lixiaokeng wrote:
>>>
>>> On 2021/2/3 21:57, Martin Wilck wrote:
>>>>> If I call exit() before all the pthread_cancel() calls in child()
>>>>> of 0.7.7, there is no crash.
>>>> What do you mean with "exit() before all pthread_cancel"? If this
>>>> happens on pthread_cancel(), and you don't call that function, this
>>>> would actually be expected.
>>>
>>> When running_state is DAEMON_SHUTDOWN, I break out of the while loop
>>> and then call _exit(0). But it is not a great method.
>>
>> I wonder if it would be possible to figure out the LWP numbers
>> (process IDs) of the different threads before the crash occurs, and
>> compare this to the gdb output
>>
>> (gdb) info threads
>>   Id   Target Id         Frame
>> * 1    LWP 1997690       0x00007f59a0109647 in ?? ()
>>   2    LWP 1996840       0x00007f59a0531de7 in ?? ()
>>   3    LWP 1997692       0x00007f59a0109647 in ?? ()
>>   4    LWP 1996857       0x00007f59a020d169 in ?? ()
>>
>> ... to identify which thread crashed, and if it's always the same
>> one.
>
> From the LWP numbers, threads 2 and 4 are probably TUR checkers
> (temporary threads). Thread 1 can't be easily identified. Could you
> provide the stack of thread 3? From that, we might be able to infer
> which thread crashed, because multipathd always starts its threads in
> the same sequence.
>

Here is another core stack (the core dumps are attached):

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `/sbin/multipathd -d -s'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe17dd97456 in ?? ()
[Current thread is 1 (Thread 0x7fe17cc00700 (LWP 3093458))]
(gdb) bt
#0  0x00007fe17dd97456 in ?? ()
#1  0x0000006800007530 in ?? ()
#2  0x00007fe17cbffb2e in ?? ()
#3  0xffffffff00000053 in ?? ()
#4  0x0000000000002006 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) info thread
  Id   Target Id                                   Frame
* 1    Thread 0x7fe17cc00700 (LWP 3093458) 0x00007fe17dd97456 in ?? ()
  2    Thread 0x7fe17d421700 (LWP 3092869) 0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7fe17d4a7a80 (LWP 3092860) 0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
  4    Thread 0x7fe17cc0e700 (LWP 3093459) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  5    Thread 0x7fe17cbc1700 (LWP 3093460) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  6    Thread 0x7fe17cb4c700 (LWP 3093461) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  7    Thread 0x7fe17cb5e700 (LWP 3093462) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  8    Thread 0x7fe17cb67700 (LWP 3093463) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  9    Thread 0x7fe17cb70700 (LWP 3093464) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fe17d421700 (LWP 3092869))]
#0  0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
29        return SYSCALL_CANCEL (poll, fds, nfds, timeout);
(gdb) bt
#0  0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fe17dcdcd91 in poll (__timeout=10, __nfds=0, __fds=0x0) at /usr/include/bits/poll2.h:46
#2  call_rcu_thread (arg=0x557ab760e210) at urcu-call-rcu-impl.h:383
#3  0x00007fe17dcbef4b in start_thread (arg=0x7fe17d421700) at pthread_create.c:486
#4  0x00007fe17da957ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fe17d4a7a80 (LWP 3092860))]
#0  0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
78      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17db95f34 in sd_pid_notify_with_fds (pid=0, unset_environment=0, state=0x557ab58af801 "ERRNO=0", fds=0x0, n_fds=0) at ../src/libsystemd/sd-daemon/sd-daemon.c:481
#2  0x0000557ab58a6ef7 in child (param=) at main.c:3140
#3  0x0000557ab589f503 in main (argc=, argv=0x7fffe51c3c08) at main.c:3325
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fe17cc0e700 (LWP 3093459))]
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
78      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17dd97456 in ?? ()
#2  0x0000006900007530 in ?? ()
#3  0x00007fe17cc0db2e in ?? ()
#4  0xffffffff00000053 in ?? ()
#5  0x0000000000002006 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fe17cbc1700 (LWP 3093460))]
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
78      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17dd97456 in ?? ()
#2  0x0000006a00007530 in ?? ()
#3  0x00007fe17cbc0b2e in ?? ()
#4  0xffffffff00000053 in ?? ()
#5  0x0000000000002006 in ?? ()
#6  0x0000000000000000 in ?? ()

Regards
Lixiaokeng
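
P.S. Since the idea discussed above is to match LWP numbers against the order in which multipathd starts its threads, here is a rough helper sketch for sorting gdb's "info threads" output by LWP. It is only an illustration: the function names are made up, the sample is shortened from the output above, and it assumes that LWP numbers grow in creation order (no PID wraparound between thread starts), which may not always hold.

```python
import re

# Matches "info threads" rows in both forms seen in this thread, e.g.
#   * 1 Thread 0x7fe17cc00700 (LWP 3093458) 0x00007fe17dd97456 in ?? ()
#     2    LWP 1996840       0x00007f59a0531de7 in ?? ()
THREAD_RE = re.compile(
    r"^\s*(\*?)\s*(\d+)\s+(?:Thread\s+0x[0-9a-f]+\s+\()?LWP\s+(\d+)\)?"
)


def parse_info_threads(text):
    """Return a list of (gdb_id, lwp, is_current) tuples (hypothetical helper)."""
    threads = []
    for line in text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            threads.append((int(m.group(2)), int(m.group(3)), m.group(1) == "*"))
    return threads


def by_creation_order(threads):
    """Sort by LWP. Assumption: a lower LWP means the thread was created earlier."""
    return sorted(threads, key=lambda t: t[1])


# Shortened sample based on the gdb output in this mail.
SAMPLE = """\
* 1 Thread 0x7fe17cc00700 (LWP 3093458) 0x00007fe17dd97456 in ?? ()
  2 Thread 0x7fe17d421700 (LWP 3092869) 0x00007fe17da8a929 in __GI___poll ()
  3 Thread 0x7fe17d4a7a80 (LWP 3092860) 0x00007fe17da96e27 in socket ()
  4 Thread 0x7fe17cc0e700 (LWP 3093459) 0x00007fe17da8c507 in ioctl ()
"""

for gdb_id, lwp, current in by_creation_order(parse_info_threads(SAMPLE)):
    marker = "  <- current thread in gdb" if current else ""
    print(f"LWP {lwp}  gdb thread {gdb_id}{marker}")
```

With the sample above, gdb thread 3 (the main thread, in child()) sorts first and the call_rcu thread second, with the current thread right after them and ahead of the thread sitting in ioctl().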