From mboxrd@z Thu Jan 1 00:00:00 1970 From: Stephen Hemminger Subject: Re: [PATCH] eal: fix threads block on barrier Date: Fri, 27 Apr 2018 10:03:37 -0700 Message-ID: <20180427100337.3fca7ca7@xeon-e3> References: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: dev@dpdk.org, thomas@monjalon.net, Olivier Matz , Anatoly Burakov To: Jianfeng Tan Return-path: Received: from mail-pg0-f68.google.com (mail-pg0-f68.google.com [74.125.83.68]) by dpdk.org (Postfix) with ESMTP id 4CFF4AAED for ; Fri, 27 Apr 2018 19:03:40 +0200 (CEST) Received: by mail-pg0-f68.google.com with SMTP id b9-v6so2008721pgf.6 for ; Fri, 27 Apr 2018 10:03:40 -0700 (PDT) In-Reply-To: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com> List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, 27 Apr 2018 16:41:42 +0000 Jianfeng Tan wrote: > Below commit introduced pthread barrier for synchronization. > But two IPC threads block on the barrier, and never wake up. > > (gdb) bt > #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4) > at ../sysdeps/unix/sysv/linux/futex-internal.h:61 > #1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4) > at ../sysdeps/nptl/futex-internal.h:135 > #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184 > #3 rte_thread_init (arg=0x7fffffffcfe0) > at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160 > #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333 > #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 > > Through analysis, we find the barrier defined on the stack could be the > root cause. This patch will change to use heap memory as the barrier. > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > > Cc: Olivier Matz > Cc: Anatoly Burakov > > Signed-off-by: Jianfeng Tan > --- > lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c > index 4e75cb8..da2b84f 100644 > --- a/lib/librte_eal/common/eal_common_thread.c > +++ b/lib/librte_eal/common/eal_common_thread.c > @@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name, > const pthread_attr_t *attr, > void *(*start_routine)(void *), void *arg) > { > - struct rte_thread_ctrl_params params = { > - .start_routine = start_routine, > - .arg = arg, > - }; > + struct rte_thread_ctrl_params *params; > unsigned int lcore_id; > rte_cpuset_t cpuset; > int cpu_found, ret; > > - pthread_barrier_init(¶ms.configured, NULL, 2); > + params = malloc(sizeof(*params)); > + if (!params) > + return -1; > + > + params->start_routine = start_routine; > + params->arg = arg; > > - ret = pthread_create(thread, attr, rte_thread_init, (void *)¶ms); > + pthread_barrier_init(¶ms->configured, NULL, 2); > + > + ret = pthread_create(thread, attr, rte_thread_init, (void *)params); > if (ret != 0) > return ret; > > @@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name, > if (ret < 0) > goto fail; > > - pthread_barrier_wait(¶ms.configured); > + pthread_barrier_wait(¶ms->configured); > + free(params); > > return 0; > > fail: > pthread_cancel(*thread); > pthread_join(*thread, NULL); > + free(params); > return ret; > } This looks like a library bug. If there is a race on the configured barrier, then putting on heap is just moving problem. It still has bug where other thread is referring to freed memory.