* [PATCH] eal: fix threads block on barrier
From: Jianfeng Tan @ 2018-04-27 16:41 UTC
  To: dev; +Cc: thomas, Jianfeng Tan, Olivier Matz, Anatoly Burakov

The commit below introduced a pthread barrier for synchronization,
but two IPC threads block on the barrier and never wake up.

  (gdb) bt
  #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
      at ../sysdeps/unix/sysv/linux/futex-internal.h:61
  #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
      at ../sysdeps/nptl/futex-internal.h:135
  #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
  #3  rte_thread_init (arg=0x7fffffffcfe0)
      at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
  #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
  #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Analysis suggests that the barrier being defined on the stack could be
the root cause. This patch allocates the barrier on the heap instead.

Fixes: d651ee4919cd ("eal: set affinity for control threads")

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 4e75cb8..da2b84f 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 		const pthread_attr_t *attr,
 		void *(*start_routine)(void *), void *arg)
 {
-	struct rte_thread_ctrl_params params = {
-		.start_routine = start_routine,
-		.arg = arg,
-	};
+	struct rte_thread_ctrl_params *params;
 	unsigned int lcore_id;
 	rte_cpuset_t cpuset;
 	int cpu_found, ret;
 
-	pthread_barrier_init(&params.configured, NULL, 2);
+	params = malloc(sizeof(*params));
+	if (!params)
+		return -1;
+
+	params->start_routine = start_routine;
+	params->arg = arg;
 
-	ret = pthread_create(thread, attr, rte_thread_init, (void *)&params);
+	pthread_barrier_init(&params->configured, NULL, 2);
+
+	ret = pthread_create(thread, attr, rte_thread_init, (void *)params);
 	if (ret != 0)
 		return ret;
 
@@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	if (ret < 0)
 		goto fail;
 
-	pthread_barrier_wait(&params.configured);
+	pthread_barrier_wait(&params->configured);
+	free(params);
 
 	return 0;
 
 fail:
 	pthread_cancel(*thread);
 	pthread_join(*thread, NULL);
+	free(params);
 	return ret;
 }
-- 
2.7.4


* Re: [PATCH] eal: fix threads block on barrier
From: Thomas Monjalon @ 2018-04-27 16:42 UTC
  To: Jianfeng Tan; +Cc: dev, Olivier Matz, Anatoly Burakov, shreyansh.jain

27/04/2018 18:41, Jianfeng Tan:
> Below commit introduced pthread barrier for synchronization.
> But two IPC threads block on the barrier, and never wake up.
> 
>   (gdb) bt
>   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/nptl/futex-internal.h:135
>   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
>   #3  rte_thread_init (arg=0x7fffffffcfe0)
>       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Through analysis, we find the barrier defined on the stack could be the
> root cause. This patch will change to use heap memory as the barrier.
> 
> Fixes: d651ee4919cd ("eal: set affinity for control threads")

Shreyansh (Cc'ed) is also seeing some bugs related to this.


* Re: [PATCH] eal: fix threads block on barrier
From: Stephen Hemminger @ 2018-04-27 17:03 UTC
  To: Jianfeng Tan; +Cc: dev, thomas, Olivier Matz, Anatoly Burakov

On Fri, 27 Apr 2018 16:41:42 +0000
Jianfeng Tan <jianfeng.tan@intel.com> wrote:

> Below commit introduced pthread barrier for synchronization.
> But two IPC threads block on the barrier, and never wake up.
> 
>   (gdb) bt
>   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/nptl/futex-internal.h:135
>   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
>   #3  rte_thread_init (arg=0x7fffffffcfe0)
>       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Through analysis, we find the barrier defined on the stack could be the
> root cause. This patch will change to use heap memory as the barrier.
> 
> Fixes: d651ee4919cd ("eal: set affinity for control threads")
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
> index 4e75cb8..da2b84f 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  		const pthread_attr_t *attr,
>  		void *(*start_routine)(void *), void *arg)
>  {
> -	struct rte_thread_ctrl_params params = {
> -		.start_routine = start_routine,
> -		.arg = arg,
> -	};
> +	struct rte_thread_ctrl_params *params;
>  	unsigned int lcore_id;
>  	rte_cpuset_t cpuset;
>  	int cpu_found, ret;
>  
> -	pthread_barrier_init(&params.configured, NULL, 2);
> +	params = malloc(sizeof(*params));
> +	if (!params)
> +		return -1;
> +
> +	params->start_routine = start_routine;
> +	params->arg = arg;
>  
> -	ret = pthread_create(thread, attr, rte_thread_init, (void *)&params);
> +	pthread_barrier_init(&params->configured, NULL, 2);
> +
> +	ret = pthread_create(thread, attr, rte_thread_init, (void *)params);
>  	if (ret != 0)
>  		return ret;
>  
> @@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  	if (ret < 0)
>  		goto fail;
>  
> -	pthread_barrier_wait(&params.configured);
> +	pthread_barrier_wait(&params->configured);
> +	free(params);
>  
>  	return 0;
>  
>  fail:
>  	pthread_cancel(*thread);
>  	pthread_join(*thread, NULL);
> +	free(params);
>  	return ret;
>  }

This looks like a library bug. If there is a race on the configured barrier, then
putting it on the heap just moves the problem: the other thread can still refer to freed memory.


* Re: [PATCH] eal: fix threads block on barrier
From: Shreyansh Jain @ 2018-04-27 17:36 UTC
  To: Jianfeng Tan, dev; +Cc: thomas, Olivier Matz, Anatoly Burakov

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Friday, April 27, 2018 10:12 PM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; Jianfeng Tan <jianfeng.tan@intel.com>; Olivier
> Matz <olivier.matz@6wind.com>; Anatoly Burakov
> <anatoly.burakov@intel.com>
> Subject: [dpdk-dev] [PATCH] eal: fix threads block on barrier
> 
> Below commit introduced pthread barrier for synchronization.
> But two IPC threads block on the barrier, and never wake up.
> 
>   (gdb) bt
>   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>       at ../sysdeps/nptl/futex-internal.h:135
>   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
>   #3  rte_thread_init (arg=0x7fffffffcfe0)
>       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> 
> Through analysis, we find the barrier defined on the stack could be the
> root cause. This patch will change to use heap memory as the barrier.
> 
> Fixes: d651ee4919cd ("eal: set affinity for control threads")
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Though I have seen Stephen's comment on this (possibly a library bug), this patch at least fixes an issue that was dogging dpaa and dpaa2: bus errors and futex errors that varied with the core masks provided to applications.

Thanks a lot for this.

Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>


* Re: [PATCH] eal: fix threads block on barrier
From: Stephen Hemminger @ 2018-04-27 17:39 UTC
  To: Shreyansh Jain; +Cc: Jianfeng Tan, dev, thomas, Olivier Matz, Anatoly Burakov

On Fri, 27 Apr 2018 17:36:56 +0000
Shreyansh Jain <shreyansh.jain@nxp.com> wrote:

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> > Sent: Friday, April 27, 2018 10:12 PM
> > To: dev@dpdk.org
> > Cc: thomas@monjalon.net; Jianfeng Tan <jianfeng.tan@intel.com>; Olivier
> > Matz <olivier.matz@6wind.com>; Anatoly Burakov
> > <anatoly.burakov@intel.com>
> > Subject: [dpdk-dev] [PATCH] eal: fix threads block on barrier
> > 
> > Below commit introduced pthread barrier for synchronization.
> > But two IPC threads block on the barrier, and never wake up.
> > 
> >   (gdb) bt
> >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> >       at ../sysdeps/nptl/futex-internal.h:135
> >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > 
> > Through analysis, we find the barrier defined on the stack could be the
> > root cause. This patch will change to use heap memory as the barrier.
> > 
> > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > 
> > Cc: Olivier Matz <olivier.matz@6wind.com>
> > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > 
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>  
> 
> Though I have seen Stephen's comment on this (possibly a library bug), this at least fixes an issue which was dogging dpaa and dpaa2 - generating bus errors and futex errors with variation in core masks provided to applications.
> 
> Thanks a lot for this.
> 
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Could you verify that there is no use-after-free, using valgrind or some library that poisons memory on free?


* Re: [PATCH] eal: fix threads block on barrier
From: Shreyansh Jain @ 2018-04-27 17:45 UTC
  To: Stephen Hemminger
  Cc: Jianfeng Tan, dev, thomas, Olivier Matz, Anatoly Burakov

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, April 27, 2018 11:10 PM
> To: Shreyansh Jain <shreyansh.jain@nxp.com>
> Cc: Jianfeng Tan <jianfeng.tan@intel.com>; dev@dpdk.org;
> thomas@monjalon.net; Olivier Matz <olivier.matz@6wind.com>; Anatoly
> Burakov <anatoly.burakov@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] eal: fix threads block on barrier
> 
> On Fri, 27 Apr 2018 17:36:56 +0000
> Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
> > > Sent: Friday, April 27, 2018 10:12 PM
> > > To: dev@dpdk.org
> > > Cc: thomas@monjalon.net; Jianfeng Tan <jianfeng.tan@intel.com>; Olivier
> > > Matz <olivier.matz@6wind.com>; Anatoly Burakov
> > > <anatoly.burakov@intel.com>
> > > Subject: [dpdk-dev] [PATCH] eal: fix threads block on barrier
> > >
> > > Below commit introduced pthread barrier for synchronization.
> > > But two IPC threads block on the barrier, and never wake up.
> > >
> > >   (gdb) bt
> > >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> > >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> > >       at ../sysdeps/nptl/futex-internal.h:135
> > >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> > >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> > >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> > >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> > >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > >
> > > Through analysis, we find the barrier defined on the stack could be the
> > > root cause. This patch will change to use heap memory as the barrier.
> > >
> > > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > >
> > > Cc: Olivier Matz <olivier.matz@6wind.com>
> > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > >
> > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >
> > Though I have seen Stephen's comment on this (possibly a library
> > bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
> > generating bus errors and futex errors with variation in core masks
> > provided to applications.
> >
> > Thanks a lot for this.
> >
> > Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> 
> Could you verify there is not a use after free by using valgrind or
> some library that poisons memory on free.

I will probably do that soon, but for the time being I don't want this issue to block dpaa/dpaa2 for RC1: these drivers were completely unusable without this patch.


* Re: [PATCH] eal: fix threads block on barrier
From: Thomas Monjalon @ 2018-04-27 19:52 UTC
  To: Shreyansh Jain, Jianfeng Tan
  Cc: dev, Stephen Hemminger, Olivier Matz, Anatoly Burakov

27/04/2018 19:45, Shreyansh Jain:
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> > > From: Jianfeng Tan
> > > > Below commit introduced pthread barrier for synchronization.
> > > > But two IPC threads block on the barrier, and never wake up.
> > > >
> > > >   (gdb) bt
> > > >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > > >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > >       at ../sysdeps/nptl/futex-internal.h:135
> > > >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> > > >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> > > >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> > > >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> > > >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > >
> > > > Through analysis, we find the barrier defined on the stack
> > > > could be the root cause. This patch will change to use heap
> > > > memory as the barrier.
> > > >
> > > > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > > >
> > > > Cc: Olivier Matz <olivier.matz@6wind.com>
> > > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > > >
> > > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> > >
> > > Though I have seen Stephen's comment on this (possibly a library
> > > bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
> > > generating bus errors and futex errors with variation in core masks
> > > provided to applications.
> > >
> > > Thanks a lot for this.
> > >
> > > Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Applied, thanks Jianfeng.

> > Could you verify there is not a use after free by using valgrind or
> > some library that poisons memory on free.
> 
> I will probably do that soon - but for the time being I don't want
> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
> completely unusable without this patch.

Shreyansh, please continue the analysis of this bug.
Thanks


* Re: [PATCH] eal: fix threads block on barrier
From: Stephen Hemminger @ 2018-04-28  1:21 UTC
  To: Thomas Monjalon
  Cc: Shreyansh Jain, Jianfeng Tan, dev, Olivier Matz, Anatoly Burakov

On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 27/04/2018 19:45, Shreyansh Jain:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]  
> > > Shreyansh Jain <shreyansh.jain@nxp.com> wrote:  
> > > > From: Jianfeng Tan  
> > > > > Below commit introduced pthread barrier for synchronization.
> > > > > But two IPC threads block on the barrier, and never wake up.
> > > > >
> > > > >   (gdb) bt
> > > > >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > > > >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > >       at ../sysdeps/nptl/futex-internal.h:135
> > > > >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> > > > >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> > > > >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> > > > >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> > > > >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > >
> > > > > Through analysis, we find the barrier defined on the stack
> > > > > could be the root cause. This patch will change to use heap
> > > > > memory as the barrier.
> > > > >
> > > > > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > > > >
> > > > > Cc: Olivier Matz <olivier.matz@6wind.com>
> > > > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > > > >
> > > > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>  
> > > >
> > > > Though I have seen Stephen's comment on this (possibly a library
> > > > bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
> > > > generating bus errors and futex errors with variation in core masks
> > > > provided to applications.
> > > >
> > > > Thanks a lot for this.
> > > >
> > > > Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>  
> 
> Applied, thanks Jianfeng.
> 
> > > Could you verify there is not a use after free by using valgrind or
> > > some library that poisons memory on free.  
> > 
> > I will probably do that soon - but for the time being I don't want
> > this issue to block the dpaa/dpaa2 for RC1 - these drivers were
> > completely unusable without this patch.  
> 
> Please Shreyansh, continue the analysis of this bug.
> Thanks
> 
> 

I think the patch needs to change.
The attributes need to be either global (or leaked and never freed).

The glibc source for init keeps the pointer to the attributes.


static const struct pthread_barrierattr default_barrierattr =
  {
    .pshared = PTHREAD_PROCESS_PRIVATE
  };


int
__pthread_barrier_init (pthread_barrier_t *barrier,
			const pthread_barrierattr_t *attr, unsigned int count)
{
  struct pthread_barrier *ibarrier;

  /* XXX EINVAL is not specified by POSIX as a possible error code for COUNT
     being too large.  See pthread_barrier_wait for the reason for the
     comparison with BARRIER_IN_THRESHOLD.  */
  if (__glibc_unlikely (count == 0 || count >= BARRIER_IN_THRESHOLD))
    return EINVAL;

  const struct pthread_barrierattr *iattr
    = (attr != NULL
       ? (struct pthread_barrierattr *) attr
       : &default_barrierattr);

  ibarrier = (struct pthread_barrier *) barrier;

  /* Initialize the individual fields.  */
  ibarrier->in = 0;
  ibarrier->out = 0;
  ibarrier->count = count;
  ibarrier->current_round = 0;
  ibarrier->shared = (iattr->pshared == PTHREAD_PROCESS_PRIVATE
		      ? FUTEX_PRIVATE : FUTEX_SHARED);

  return 0;
}
weak_alias (__pthread_barrier_init, pthread_barrier_init)


* Re: [PATCH] eal: fix threads block on barrier
From: Stephen Hemminger @ 2018-04-28  1:24 UTC
  To: Thomas Monjalon
  Cc: Shreyansh Jain, Jianfeng Tan, dev, Olivier Matz, Anatoly Burakov

On Fri, 27 Apr 2018 21:52:26 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 27/04/2018 19:45, Shreyansh Jain:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]  
> > > Shreyansh Jain <shreyansh.jain@nxp.com> wrote:  
> > > > From: Jianfeng Tan  
> > > > > Below commit introduced pthread barrier for synchronization.
> > > > > But two IPC threads block on the barrier, and never wake up.
> > > > >
> > > > >   (gdb) bt
> > > > >   #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > >       at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > > > >   #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
> > > > >       at ../sysdeps/nptl/futex-internal.h:135
> > > > >   #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> > > > >   #3  rte_thread_init (arg=0x7fffffffcfe0)
> > > > >       at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> > > > >   #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> > > > >   #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > > > >
> > > > > Through analysis, we find the barrier defined on the stack
> > > > > could be the root cause. This patch will change to use heap
> > > > > memory as the barrier.
> > > > >
> > > > > Fixes: d651ee4919cd ("eal: set affinity for control threads")
> > > > >
> > > > > Cc: Olivier Matz <olivier.matz@6wind.com>
> > > > > Cc: Anatoly Burakov <anatoly.burakov@intel.com>
> > > > >
> > > > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>  
> > > >
> > > > Though I have seen Stephen's comment on this (possibly a library
> > > > bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
> > > > generating bus errors and futex errors with variation in core masks
> > > > provided to applications.
> > > >
> > > > Thanks a lot for this.
> > > >
> > > > Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>  
> 
> Applied, thanks Jianfeng.
> 
> > > Could you verify there is not a use after free by using valgrind or
> > > some library that poisons memory on free.  
> > 
> > I will probably do that soon - but for the time being I don't want
> > this issue to block the dpaa/dpaa2 for RC1 - these drivers were
> > completely unusable without this patch.  
> 
> Please Shreyansh, continue the analysis of this bug.
> Thanks
> 
> 

The pthread_barrier should also be destroyed when it is no longer needed.


* Re: [PATCH] eal: fix threads block on barrier
From: Tan, Jianfeng @ 2018-04-28  4:15 UTC
  To: Stephen Hemminger, Thomas Monjalon
  Cc: Shreyansh Jain, dev, Olivier Matz, Anatoly Burakov



On 4/28/2018 9:21 AM, Stephen Hemminger wrote:
> On Fri, 27 Apr 2018 21:52:26 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
>> 27/04/2018 19:45, Shreyansh Jain:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>>>>> From: Jianfeng Tan
>>>>>> Below commit introduced pthread barrier for synchronization.
>>>>>> But two IPC threads block on the barrier, and never wake up.
>>>>>>
>>>>>>    (gdb) bt
>>>>>>    #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>>    #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/nptl/futex-internal.h:135
>>>>>>    #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
>>>>>>    #3  rte_thread_init (arg=0x7fffffffcfe0)
>>>>>>        at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>>>>>>    #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>>>>>>    #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>>
>>>>>> Through analysis, we find the barrier defined on the stack
>>>>>> could be the root cause. This patch will change to use heap
>>>>>> memory as the barrier.
>>>>>>
>>>>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>>>>>
>>>>>> Cc: Olivier Matz <olivier.matz@6wind.com>
>>>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>
>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>> Though I have seen Stephen's comment on this (possibly a library
>>>>> bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
>>>>> generating bus errors and futex errors with variation in core masks
>>>>> provided to applications.
>>>>> Thanks a lot for this.
>>>>>
>>>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> Applied, thanks Jianfeng.
>>
>>>> Could you verify there is not a use after free by using valgrind or
>>>> some library that poisons memory on free.
>>> I will probably do that soon - but for the time being I don't want
>>> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
>>> completely unusable without this patch.
>> Please Shreyansh, continue the analysis of this bug.
>> Thanks
>>
>>
> I think the patch needs to change.
> The attributes need be either global (or leak and never free).
>
> The glibc source for init keeps the pointer to the attributes.

I don't follow why we need to add attr here. Besides, init only uses
attr to decide the futex type (private or shared); it does not seem to
keep the pointer.

So I don't understand why we need to pass a non-null attr parameter.

Thanks,
Jianfeng

>
>
> static const struct pthread_barrierattr default_barrierattr =
>    {
>      .pshared = PTHREAD_PROCESS_PRIVATE
>    };
>
>
> int
> __pthread_barrier_init (pthread_barrier_t *barrier,
> 			const pthread_barrierattr_t *attr, unsigned int count)
> {
>    struct pthread_barrier *ibarrier;
>
>    /* XXX EINVAL is not specified by POSIX as a possible error code for COUNT
>       being too large.  See pthread_barrier_wait for the reason for the
>       comparison with BARRIER_IN_THRESHOLD.  */
>    if (__glibc_unlikely (count == 0 || count >= BARRIER_IN_THRESHOLD))
>      return EINVAL;
>
>    const struct pthread_barrierattr *iattr
>      = (attr != NULL
>         ? (struct pthread_barrierattr *) attr
>         : &default_barrierattr);
>
>    ibarrier = (struct pthread_barrier *) barrier;
>
>    /* Initialize the individual fields.  */
>    ibarrier->in = 0;
>    ibarrier->out = 0;
>    ibarrier->count = count;
>    ibarrier->current_round = 0;
>    ibarrier->shared = (iattr->pshared == PTHREAD_PROCESS_PRIVATE
> 		      ? FUTEX_PRIVATE : FUTEX_SHARED);
>
>    return 0;
> }
> weak_alias (__pthread_barrier_init, pthread_barrier_init)


* Re: [PATCH] eal: fix threads block on barrier
From: Tan, Jianfeng @ 2018-04-28  4:22 UTC
  To: Stephen Hemminger, Thomas Monjalon
  Cc: Shreyansh Jain, dev, Olivier Matz, Anatoly Burakov



On 4/28/2018 9:24 AM, Stephen Hemminger wrote:
> On Fri, 27 Apr 2018 21:52:26 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
>> 27/04/2018 19:45, Shreyansh Jain:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>>>>> From: Jianfeng Tan
>>>>>> Below commit introduced pthread barrier for synchronization.
>>>>>> But two IPC threads block on the barrier, and never wake up.
>>>>>>
>>>>>>    (gdb) bt
>>>>>>    #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>>    #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/nptl/futex-internal.h:135
>>>>>>    #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
>>>>>>    #3  rte_thread_init (arg=0x7fffffffcfe0)
>>>>>>        at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>>>>>>    #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>>>>>>    #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>>
>>>>>> Through analysis, we find the barrier defined on the stack
>>>>>> could be the root cause. This patch will change to use heap
>>>>>> memory as the barrier.
>>>>>>
>>>>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>>>>>
>>>>>> Cc: Olivier Matz <olivier.matz@6wind.com>
>>>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>
>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>> Though I have seen Stephen's comment on this (possibly a library
>>>>> bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
>>>>> generating bus errors and futex errors with variation in core masks
>>>>> provided to applications.
>>>>> Thanks a lot for this.
>>>>>
>>>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> Applied, thanks Jianfeng.
>>
>>>> Could you verify there is not a use after free by using valgrind or
>>>> some library that poisons memory on free.
>>> I will probably do that soon - but for the time being I don't want
>>> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
>>> completely unusable without this patch.
>> Please Shreyansh, continue the analysis of this bug.
>> Thanks
>>
>>
> The pthread_barrier should also be destroyed when it is no longer needed.

I tried this; it could also wake up the sleeping thread. But because
"The effect of subsequent use of the barrier is undefined", I did not
go that way.

Anyway, I agree that destroy() should be called for completeness.

Thanks,
Jianfeng
