From: Olivier Matz <olivier.matz@6wind.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,dev@dpdk.org
Cc: Anatoly Burakov <anatoly.burakov@intel.com>,
	Jianfeng Tan <jianfeng.tan@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>
Subject: Re: pthread_barrier_deadlock in -rc1 (was: "Re: [PATCH v3 0/5] fix control thread affinities")
Date: Mon, 30 Apr 2018 20:46:58 +0200
Message-ID: <4256B2F0-EF9D-4B22-AC1A-D440C002360A@6wind.com>
In-Reply-To: <b5d7aab1-25eb-0d0c-a01c-50356777cf9c@redhat.com>

Hi Maxime,

On 30 April 2018 17:45:52 GMT+02:00, Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>Hi Olivier,
>
>On 04/24/2018 04:46 PM, Olivier Matz wrote:
>> Some parts of DPDK use their own management threads. Most of the time,
>> the affinity of the thread is not properly set: it should not be scheduled
>> on the dataplane cores, because interrupting them can cause packet loss.
>> 
>> This patchset introduces a new wrapper for thread creation that does
>> the job automatically, avoiding code duplication.
>> 
>> v3:
>> * new patch: use this API in examples when relevant.
>> * replace pthread_kill by pthread_cancel. Note that pthread_join()
>>    is still needed.
>> * rebase: vfio and pdump do not have control pthreads anymore, and eal
>>    has 2 new pthreads
>> * remove all calls to snprintf/strlcpy that truncate the thread name:
>>    all string lengths are already < 16.
>> 
>> v2:
>> * set affinity to master core if no core is off, as suggested by
>>    Anatoly
>> 
>> Olivier Matz (5):
>>    eal: use sizeof to avoid a double use of a define
>>    eal: new function to create control threads
>>    eal: set name when creating a control thread
>>    eal: set affinity for control threads
>>    examples: use new API to create control threads
>> 
>>   drivers/net/kni/Makefile                     |  1 +
>>   drivers/net/kni/rte_eth_kni.c                |  3 +-
>>   examples/tep_termination/main.c              | 16 +++----
>>   examples/vhost/main.c                        | 19 +++-----
>>   lib/librte_eal/bsdapp/eal/eal.c              |  4 +-
>>   lib/librte_eal/bsdapp/eal/eal_thread.c       |  2 +-
>>   lib/librte_eal/common/eal_common_proc.c      | 15 ++----
>>   lib/librte_eal/common/eal_common_thread.c    | 72 ++++++++++++++++++++++++++++
>>   lib/librte_eal/common/include/rte_lcore.h    | 26 ++++++++++
>>   lib/librte_eal/linuxapp/eal/eal.c            |  4 +-
>>   lib/librte_eal/linuxapp/eal/eal_interrupts.c | 17 ++-----
>>   lib/librte_eal/linuxapp/eal/eal_thread.c     |  2 +-
>>   lib/librte_eal/linuxapp/eal/eal_timer.c      | 12 +----
>>   lib/librte_eal/rte_eal_version.map           |  1 +
>>   lib/librte_vhost/socket.c                    | 25 ++--------
>>   15 files changed, 135 insertions(+), 84 deletions(-)
>> 
>
>I am facing a deadlock with your series that Jianfeng's patch ("eal: fix
>threads block on barrier") does not resolve. Reverting both the series and
>Jianfeng's patch makes the issue disappear.
>
>I see the problem in a VM (it does not happen on the host):
># ./install/bin/testpmd -l 0,1,2 --socket-mem 1024 -n 4 --proc-type auto \
>    --file-prefix pg -- --portmask=3 --forward-mode=macswap \
>    --port-topology=chained --disable-rss -i --rxq=1 --txq=1 --rxd=256 \
>    --txd=256 --nb-cores=2 --auto-start
>EAL: Detected 3 lcore(s)
>EAL: Detected 1 NUMA nodes
>EAL: Auto-detected process type: PRIMARY
>EAL: Multi-process socket /var/run/.pg_unix
>
>
>Then it hangs. Attaching GDB gives the following backtraces:
>
>(gdb) info threads
>   Id   Target Id         Frame
>   3    Thread 0x7f63e1f9f700 (LWP 8808) "rte_mp_handle" 0x00007f63e2591bfd in recvmsg () at ../sysdeps/unix/syscall-template.S:81
>   2    Thread 0x7f63e179e700 (LWP 8809) "rte_mp_async" pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71
>* 1    Thread 0x7f63e32cec00 (LWP 8807) "testpmd" pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71
>(gdb) bt full
>#0  pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71
>No locals.
>#1  0x0000000000520c54 in rte_ctrl_thread_create (thread=thread@entry=0x7ffe5c895020, name=name@entry=0x869d86 "rte_mp_async", attr=attr@entry=0x0, start_routine=start_routine@entry=0x521030 <async_reply_handle>, arg=arg@entry=0x0)
>     at /root/src/dpdk/lib/librte_eal/common/eal_common_thread.c:207
>         params = 0x17b1e40
>         lcore_id = <optimized out>
>         cpuset = {__bits = {1, 0 <repeats 15 times>}}
>         cpu_found = <optimized out>
>         ret = 0
>#2  0x00000000005220b6 in rte_mp_channel_init () at /root/src/dpdk/lib/librte_eal/common/eal_common_proc.c:674
>         path = "/var/run\000.pg_unix_*", '\000' <repeats 1301 times>...
>         dir_fd = 4
>         mp_handle_tid = 140066969745152
>         async_reply_handle_tid = 140066961352448
>#3  0x000000000050c227 in rte_eal_init (argc=argc@entry=23, argv=argv@entry=0x7ffe5c896378) at /root/src/dpdk/lib/librte_eal/linuxapp/eal/eal.c:775
>         i = <optimized out>
>         fctret = 11
>         ret = <optimized out>
>         thread_id = 140066989861888
>         run_once = {cnt = 1}
>         logid = 0x17b1e00 "testpmd"
>         cpuset = "T}\211\\\376\177", '\000' <repeats 117 times>, "\020", '\000' <repeats 116 times>...
>         thread_name = "X}\211\\\376\177\000\000\226\301\036\342c\177\000"
>         __func__ = "rte_eal_init"
>#4  0x0000000000473214 in main (argc=23, argv=0x7ffe5c896378) at /root/src/dpdk/app/test-pmd/testpmd.c:2597
>         diag = <optimized out>
>         port_id = <optimized out>
>         ret = <optimized out>
>         __func__ = "main"
>(gdb) thread 2
>[Switching to thread 2 (Thread 0x7f63e179e700 (LWP 8809))]
>#0  pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71
>71		cmpl	%edx, (%rdi)
>(gdb) bt full
>#0  pthread_barrier_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_barrier_wait.S:71
>No locals.
>#1  0x0000000000520777 in rte_thread_init (arg=<optimized out>) at /root/src/dpdk/lib/librte_eal/common/eal_common_thread.c:156
>         params = <optimized out>
>         start_routine = 0x521030 <async_reply_handle>
>         routine_arg = 0x0
>#2  0x00007f63e258add5 in start_thread (arg=0x7f63e179e700) at pthread_create.c:308
>         __res = <optimized out>
>         pd = 0x7f63e179e700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140066961352448, 1212869169857371576, 0, 8392704, 0, 140066961352448, -1291626103561052744, -1291619793368703560}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
>               prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>#3  0x00007f63e22b4b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>No locals.
>(gdb) thread 3
>[Switching to thread 3 (Thread 0x7f63e1f9f700 (LWP 8808))]
>#0  0x00007f63e2591bfd in recvmsg () at ../sysdeps/unix/syscall-template.S:81
>81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
>(gdb) bt full
>#0  0x00007f63e2591bfd in recvmsg () at ../sysdeps/unix/syscall-template.S:81
>No locals.
>#1  0x000000000052194e in read_msg (s=0x7f63e1f9d3b0, m=0x7f63e1f9d5a0) at /root/src/dpdk/lib/librte_eal/common/eal_common_proc.c:258
>         msglen = <optimized out>
>         control = "\000\000\000\000\000\000\000\000\336~\f\343c\177\000\000\005", '\000' <repeats 23 times>, "\360\371\033\342c\177\000"
>         cmsg = <optimized out>
>         iov = {iov_base = 0x7f63e1f9d5a0, iov_len = 332}
>         msgh = {msg_name = 0x7f63e1f9d3b0, msg_namelen = 110, msg_iov = 0x7f63e1f9d370, msg_iovlen = 1, msg_control = 0x7f63e1f9d380, msg_controllen = 48, msg_flags = 0}
>#2  mp_handle (arg=<optimized out>) at /root/src/dpdk/lib/librte_eal/common/eal_common_proc.c:346
>         msg = {type = 0, msg = {name = '\000' <repeats 63 times>, len_param = 0, num_fds = 0, param = '\000' <repeats 20 times>, "\002", '\000' <repeats 234 times>, fds = {0, 0, 0, 0, 0, 0, 0, 0}}}
>         sa = {sun_family = 55104,
>           sun_path = "\371\341c\177\000\000\352\372\f\343c\177\000\000\000\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377\000\367\371\341c\177\000\000\030\000\000\000\000\000\000\000p\327\371\341c\177\000\000\000\367\371\341c\177\000\000\000\367\371\341c\177", '\000' <repeats 34 times>, "\200\037\000\000\377\377"}
>#3  0x00007f63e258add5 in start_thread (arg=0x7f63e1f9f700) at pthread_create.c:308
>         __res = <optimized out>
>         pd = 0x7f63e1f9f700
>         now = <optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140066969745152, 1212869169857371576, 0, 8392704, 0, 140066969745152, -1291625004586295880, -1291619793368703560}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
>               prev = 0x0, cleanup = 0x0, canceltype = 0}}}
>         not_first_call = <optimized out>
>         pagesize_m1 = <optimized out>
>         sp = <optimized out>
>         freesize = <optimized out>
>#4  0x00007f63e22b4b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>No locals.
>
>I don't have more info for now.
>


Thanks for the feedback on this issue. I don't see an obvious reason for this deadlock yet.

I'll investigate it as soon as possible (not tomorrow, but on Wednesday). In the worst case, we can revert the series if I cannot find the root cause quickly.

Olivier
