netdev.vger.kernel.org archive mirror
* [PATCH net-next] af_unix: fix a fatal race with bit fields
@ 2013-05-01  1:12 Eric Dumazet
  2013-05-01  1:39 ` Benjamin Herrenschmidt
  2013-05-01  1:51 ` Anton Blanchard
  0 siblings, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01  1:12 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Benjamin Herrenschmidt, Paul Mackerras,
	Ambrose Feinstein, linuxppc-dev

From: Eric Dumazet <edumazet@google.com>

Using bit fields is dangerous on ppc64, as the compiler uses 64bit
instructions to manipulate them. If the 64bit word includes any
atomic_t or spinlock_t, we can lose critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---

Could ppc64 experts confirm using byte is safe, or should we really add
a 32bit hole after the spinlock ? If so, I wonder how many other places
need a change...

 include/net/af_unix.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..4520a23f 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,8 +57,8 @@ struct unix_sock {
 	struct list_head	link;
 	atomic_long_t		inflight;
 	spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
+	unsigned char		gc_candidate;
+	unsigned char		gc_maybe_cycle;
 	unsigned char		recursion_level;
 	struct socket_wq	peer_wq;
 };

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:12 [PATCH net-next] af_unix: fix a fatal race with bit fields Eric Dumazet
@ 2013-05-01  1:39 ` Benjamin Herrenschmidt
  2013-05-01  7:36   ` David Miller
                     ` (2 more replies)
  2013-05-01  1:51 ` Anton Blanchard
  1 sibling, 3 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2013-05-01  1:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
> 
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
> 
> Could ppc64 experts confirm using byte is safe, or should we really add
> a 32bit hole after the spinlock ? If so, I wonder how many other places
> need a change...

Wow, nice one !

I'm not even completely certain bytes are safe to be honest, though
probably more than bitfields. I'll poke our compiler people.

The worry is of course how many more of these do we potentially have ? 
We might be able to automate finding these issues with sparse, I
suppose.

Also I'd be surprised if ppc64 is the only one with that problem... what
about sparc64 and arm64 ?

Cheers,
Ben.

>  include/net/af_unix.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head	link;
>  	atomic_long_t		inflight;
>  	spinlock_t		lock;
> -	unsigned int		gc_candidate : 1;
> -	unsigned int		gc_maybe_cycle : 1;
> +	unsigned char		gc_candidate;
> +	unsigned char		gc_maybe_cycle;
>  	unsigned char		recursion_level;
>  	struct socket_wq	peer_wq;
>  };
> 


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:12 [PATCH net-next] af_unix: fix a fatal race with bit fields Eric Dumazet
  2013-05-01  1:39 ` Benjamin Herrenschmidt
@ 2013-05-01  1:51 ` Anton Blanchard
  2013-05-01  2:24   ` Eric Dumazet
  1 sibling, 1 reply; 23+ messages in thread
From: Anton Blanchard @ 2013-05-01  1:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, linuxppc-dev, Paul Mackerras,
	Ambrose Feinstein, amodra

Hi Eric,

> From: Eric Dumazet <edumazet@google.com>
> 
> Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> instructions to manipulate them. If the 64bit word includes any
> atomic_t or spinlock_t, we can lose critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.

I just spoke to Alan Modra and he suspects this is a compiler
bug. Can you give us your compiler version info?

Anton

> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> ---
> 
> Could ppc64 experts confirm using byte is safe, or should we really
> add a 32bit hole after the spinlock ? If so, I wonder how many other
> places need a change...
> 
>  include/net/af_unix.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..4520a23f 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,8 +57,8 @@ struct unix_sock {
>  	struct list_head	link;
>  	atomic_long_t		inflight;
>  	spinlock_t		lock;
> -	unsigned int		gc_candidate : 1;
> -	unsigned int		gc_maybe_cycle : 1;
> +	unsigned char		gc_candidate;
> +	unsigned char		gc_maybe_cycle;
>  	unsigned char		recursion_level;
>  	struct socket_wq	peer_wq;
>  };
> 


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:51 ` Anton Blanchard
@ 2013-05-01  2:24   ` Eric Dumazet
  2013-05-01  3:54     ` Alan Modra
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01  2:24 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: David Miller, netdev, linuxppc-dev, Paul Mackerras,
	Ambrose Feinstein, amodra

On Wed, 2013-05-01 at 11:51 +1000, Anton Blanchard wrote:
> Hi Eric,
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> > 
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> > 
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> 
> I just spoke to Alan Modra and he suspects this is a compiler
> bug. Can you give us your compiler version info?

$ gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -v
Using built-in specs.
COLLECT_GCC=gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
COLLECT_LTO_WRAPPER=/usr/local/google/home/edumazet/cross/gcc-4.6.3-nolibc/powerpc64-linux/bin/../libexec/gcc/powerpc64-linux/4.6.3/lto-wrapper
Target: powerpc64-linux
Configured with: /home/tony/buildall/src/gcc/configure
--target=powerpc64-linux --host=x86_64-linux-gnu
--build=x86_64-linux-gnu --enable-targets=all
--prefix=/opt/cross/gcc-4.6.3-nolibc/powerpc64-linux/
--enable-languages=c --with-newlib --without-headers
--enable-sjlj-exceptions --with-system-libunwind --disable-nls
--disable-threads --disable-shared --disable-libmudflap --disable-libssp
--disable-libgomp --disable-decimal-float --enable-checking=release
--with-mpfr=/home/tony/buildall/src/sys-x86_64
--with-gmp=/home/tony/buildall/src/sys-x86_64 --disable-bootstrap
--disable-libquadmath
Thread model: single
gcc version 4.6.3 (GCC) 


$ cat try.c ; gcc-4.6.3-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc
-O2 -S try.c ; cat try.s
struct s {
	unsigned int lock;
	unsigned int f1 : 1;
	unsigned int f2 : 1;
	void *ptr;
} *p ;

showbug()
{
	p->lock++;
	p->f1 = 1;
}
	.file	"try.c"
	.section	".toc","aw"
	.section	".text"
	.section	".toc","aw"
.LC0:
	.tc p[TC],p
	.section	".text"
	.align 2
	.globl showbug
	.section	".opd","aw"
	.align 3
showbug:
	.quad	.L.showbug,.TOC.@tocbase,0
	.previous
	.type	showbug, @function
.L.showbug:
	addis 9,2,.LC0@toc@ha
	ld 9,.LC0@toc@l(9)
	ld 9,0(9)
	lwz 11,0(9)
	addi 0,11,1
	stw 0,0(9)
	li 11,1
	ld 0,0(9)
	rldimi 0,11,31,32
	std 0,0(9)
	blr
	.long 0
	.byte 0,0,0,0,0,0,0,0
	.size	showbug,.-.L.showbug
	.comm	p,8,8
	.ident	"GCC: (GNU) 4.6.3"

You can see "ld 0,0(9)" is used: it's a 64-bit load.
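
The effect of that 64-bit load/store pair can be replayed in plain C.
This is only a sketch under stated assumptions: struct word64 and all
function names below are hypothetical, and the "two CPUs" are simply
interleaved by hand in one thread. It shows how a stale 8-byte copy,
written back after another CPU has released the lock, makes the lock
look held forever.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 8 bytes shared by the lock and the flags. */
struct word64 {
	uint32_t lock;	/* models the 32-bit lock word */
	uint32_t flags;	/* models the storage holding the bit-fields */
};
_Static_assert(sizeof(struct word64) == 8, "sketch assumes no padding");

/* Models "ld": read all 8 bytes, lock included. */
static uint64_t load64(const struct word64 *w)
{
	uint64_t v;

	memcpy(&v, w, sizeof(v));
	return v;
}

/* Models "std": write all 8 bytes back, lock included. */
static void store64(struct word64 *w, uint64_t v)
{
	memcpy(w, &v, sizeof(v));
}

/* Models "rldimi": set one flag bit inside the private 64-bit copy. */
static uint64_t set_flag_in_copy(uint64_t v)
{
	struct word64 tmp;

	memcpy(&tmp, &v, sizeof(tmp));
	tmp.flags |= 1u;
	memcpy(&v, &tmp, sizeof(v));
	return v;
}

/* Replays the bad interleaving and returns the final lock value. */
static uint32_t replay_lost_unlock(void)
{
	struct word64 w = { .lock = 1, .flags = 0 };	/* CPU B holds the lock */
	uint64_t stale;

	stale = load64(&w);			/* CPU A: 64-bit load sees lock == 1 */
	w.lock = 0;				/* CPU B: unlock, 32-bit store */
	stale = set_flag_in_copy(stale);	/* CPU A: updates its copy */
	store64(&w, stale);			/* CPU A: 64-bit store restores lock == 1 */
	return w.lock;
}
```

With this interleaving replay_lost_unlock() returns 1: nobody owns the
lock any more, yet every later spinner sees it as taken.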


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  2:24   ` Eric Dumazet
@ 2013-05-01  3:54     ` Alan Modra
  2013-05-01  5:04       ` Eric Dumazet
  2013-05-02 17:02       ` Scott Wood
  0 siblings, 2 replies; 23+ messages in thread
From: Alan Modra @ 2013-05-01  3:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> 	li 11,1
> 	ld 0,0(9)
> 	rldimi 0,11,31,32
> 	std 0,0(9)
> 	blr
> 	.ident	"GCC: (GNU) 4.6.3"
> 
> You can see "ld 0,0(9)" is used : its a 64 bit load.

Yup.  This is not a powerpc64 specific problem.  See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Fixed in 4.8.0 and 4.7.3.

-- 
Alan Modra
Australia Development Lab, IBM


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  3:54     ` Alan Modra
@ 2013-05-01  5:04       ` Eric Dumazet
  2013-05-01 15:10         ` Stephen Hemminger
  2013-05-03  1:31         ` Alan Modra
  2013-05-02 17:02       ` Scott Wood
  1 sibling, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01  5:04 UTC (permalink / raw)
  To: Alan Modra
  Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > 	li 11,1
> > 	ld 0,0(9)
> > 	rldimi 0,11,31,32
> > 	std 0,0(9)
> > 	blr
> > 	.ident	"GCC: (GNU) 4.6.3"
> > 
> > You can see "ld 0,0(9)" is used : its a 64 bit load.
> 
> Yup.  This is not a powerpc64 specific problem.  See
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> Fixed in 4.8.0 and 4.7.3.

Ah thanks.

This seems a pretty serious issue. Is it documented somewhere that
ppc64, sparc64 and others need such a compiler version?

These kinds of errors are pretty hard to find; it's a pity to spend time
on them.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:39 ` Benjamin Herrenschmidt
@ 2013-05-01  7:36   ` David Miller
  2013-05-01  8:08     ` Benjamin Herrenschmidt
  2013-05-01 15:24     ` [PATCH v2 " Eric Dumazet
  2013-05-01 12:08   ` [PATCH " Ben Hutchings
  2013-05-03 14:29   ` David Laight
  2 siblings, 2 replies; 23+ messages in thread
From: David Miller @ 2013-05-01  7:36 UTC (permalink / raw)
  To: benh; +Cc: eric.dumazet, netdev, paulus, ambrose, linuxppc-dev

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 01 May 2013 11:39:53 +1000

> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

Older Alpha only has 32-bit and 64-bit loads and stores, so byte-sized
accesses are not atomic, and therefore use racy read-modify-write
sequences.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  7:36   ` David Miller
@ 2013-05-01  8:08     ` Benjamin Herrenschmidt
  2013-05-01 15:24     ` [PATCH v2 " Eric Dumazet
  1 sibling, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2013-05-01  8:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev, paulus, ambrose, eric.dumazet

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

In this case it depends on whether the compiler will choose the smaller
(32-bit) size, which hopefully won't overlap with the atomic/lock
provided the latter is aligned... lots of ifs here, makes me nervous...

At least the bytes seem to fix it for ppc64 so far...

It would make me feel generally better if we could get gcc to guarantee
that it always uses the smallest access size that encompasses the whole
bitfield (or at least does not go larger than int when the bitfield is
defined as unsigned int). This would take care of all the cases we
haven't spotted yet (hopefully).

For all intents and purposes those two fields are bits of an unsigned
int, so why the heck would the compiler use a larger access size anyway?
I seem to recall that we have other places where such an assumption is
made that ints are accessed atomically, and Linus stating in the past
that a compiler doing anything else was not worth bothering with. I
don't see why bitfields of such an int would be an exception to that rule
(though again, this is probably not a rule stated in the standard ... oh
well).

/me goes have a glass of wine and not think about this until tomorrow.

Cheers,
Ben.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:39 ` Benjamin Herrenschmidt
  2013-05-01  7:36   ` David Miller
@ 2013-05-01 12:08   ` Ben Hutchings
  2013-05-03 14:29   ` David Laight
  2 siblings, 0 replies; 23+ messages in thread
From: Ben Hutchings @ 2013-05-01 12:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Eric Dumazet, David Miller, netdev, Paul Mackerras,
	Ambrose Feinstein, linuxppc-dev

On Wed, 2013-05-01 at 11:39 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-04-30 at 18:12 -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Using bit fields is dangerous on ppc64, as the compiler uses 64bit
> > instructions to manipulate them. If the 64bit word includes any
> > atomic_t or spinlock_t, we can lose critical concurrent changes.
> > 
> > This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> > gc_maybe_cycle/lock share the same 64bit word.
> > 
> > This leads to fatal deadlock, as one/several cpus spin forever
> > on a spinlock that will never be available again.
> > 
> > Reported-by: Ambrose Feinstein <ambrose@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > ---
> > 
> > Could ppc64 experts confirm using byte is safe, or should we really add
> > a 32bit hole after the spinlock ? If so, I wonder how many other places
> > need a change...
> 
> Wow, nice one !
> 
> I'm not even completely certain bytes are safe to be honest, though
> probably more than bitfields. I'll poke our compiler people.

There is a longstanding and hard-to-fix bug in gcc that is specific to
bitfields.  I think that the underlying type isn't propagated, so when
it comes to code generation the compiler doesn't know the natural width
for the memory access.

As for bytes - early Alphas couldn't load/store less than 32 bits, but I
doubt anyone cares any more.

> The worry is of course how many more of these do we potentially have ? 
> We might be able to automate finding these issues with sparse, I
> suppose.
> 
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64 ?

I expect they can have the same general problem, but gcc may be more or
less keen to generate 64-bit load/store instructions for bitfields on
different architectures.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  5:04       ` Eric Dumazet
@ 2013-05-01 15:10         ` Stephen Hemminger
  2013-05-02 21:11           ` Benjamin Herrenschmidt
  2013-05-03  1:31         ` Alan Modra
  1 sibling, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2013-05-01 15:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alan Modra, Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

On Tue, 30 Apr 2013 22:04:32 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
> > On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > > 	li 11,1
> > > 	ld 0,0(9)
> > > 	rldimi 0,11,31,32
> > > 	std 0,0(9)
> > > 	blr
> > > 	.ident	"GCC: (GNU) 4.6.3"
> > > 
> > > You can see "ld 0,0(9)" is used : its a 64 bit load.
> > 
> > Yup.  This is not a powerpc64 specific problem.  See
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> > Fixed in 4.8.0 and 4.7.3.
> 
> Ah thanks.
> 
> This seems a pretty serious issue, is it documented somewhere that
> ppc64, sparc64 and others need such compiler version ?
> 
> These kind of errors are pretty hard to find, its a pity to spend time
> on them.

There is a checkbin target inside arch/powerpc/Makefile.
Shouldn't a check be added there to block building the kernel with known
bad GCC versions?
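
A version test of that kind could look roughly like the sketch below.
The function name gcc_has_pr52080_fix() is hypothetical, and the
thresholds are an assumption taken from the versions quoted in this
thread (fixed in 4.7.3 and 4.8.0); the exact 4.7.x cutoff may be
conservative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if a GCC release is at or past the versions reported in
 * this thread as carrying the PR52080 fix (4.7.3 on the 4.7 branch,
 * 4.8.0 and anything newer otherwise).
 */
static bool gcc_has_pr52080_fix(int major, int minor, int patch)
{
	if (major != 4)
		return major > 4;	/* 5.x and later: yes; 3.x and earlier: no */
	if (minor != 7)
		return minor > 7;	/* 4.8+ : yes; 4.6 and older: no */
	return patch >= 3;		/* 4.7 branch: 4.7.3 or later */
}
```

When building with GCC, the compiler's own version could be passed in
as gcc_has_pr52080_fix(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__).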


* [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  7:36   ` David Miller
  2013-05-01  8:08     ` Benjamin Herrenschmidt
@ 2013-05-01 15:24     ` Eric Dumazet
  2013-05-01 15:53       ` David Laight
  2013-05-01 19:14       ` David Miller
  1 sibling, 2 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01 15:24 UTC (permalink / raw)
  To: David Miller; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

On Wed, 2013-05-01 at 03:36 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Wed, 01 May 2013 11:39:53 +1000
> 
> > I'm not even completely certain bytes are safe to be honest, though
> > probably more than bitfields. I'll poke our compiler people.
> 
> Older Alpha only has 32-bit and 64-bit loads and stores, so byte sized
> accesses are not atomic, and therefore use racey read-modify-write
> sequences.

Right, so what about the following more general fix ?

Thanks !

[PATCH v2] af_unix: fix a fatal race with bit fields

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
---
 include/net/af_unix.h |    5 +++--
 net/unix/garbage.c    |   12 ++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a8836e8..dbdfd2b 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -57,9 +57,10 @@ struct unix_sock {
 	struct list_head	link;
 	atomic_long_t		inflight;
 	spinlock_t		lock;
-	unsigned int		gc_candidate : 1;
-	unsigned int		gc_maybe_cycle : 1;
 	unsigned char		recursion_level;
+	unsigned long		gc_flags;
+#define UNIX_GC_CANDIDATE	0
+#define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
 };
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d0f6545..9c6cc08 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -185,7 +185,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 					 * have been added to the queues after
 					 * starting the garbage collection
 					 */
-					if (u->gc_candidate) {
+					if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) {
 						hit = true;
 						func(u);
 					}
@@ -254,7 +254,7 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 	 * of the list, so that it's checked even if it was already
 	 * passed over
 	 */
-	if (u->gc_maybe_cycle)
+	if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags))
 		list_move_tail(&u->link, &gc_candidates);
 }
 
@@ -315,8 +315,8 @@ void unix_gc(void)
 		BUG_ON(total_refs < inflight_refs);
 		if (total_refs == inflight_refs) {
 			list_move_tail(&u->link, &gc_candidates);
-			u->gc_candidate = 1;
-			u->gc_maybe_cycle = 1;
+			__set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
+			__set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 		}
 	}
 
@@ -344,7 +344,7 @@ void unix_gc(void)
 
 		if (atomic_long_read(&u->inflight) > 0) {
 			list_move_tail(&u->link, &not_cycle_list);
-			u->gc_maybe_cycle = 0;
+			__clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
 			scan_children(&u->sk, inc_inflight_move_tail, NULL);
 		}
 	}
@@ -356,7 +356,7 @@ void unix_gc(void)
 	 */
 	while (!list_empty(&not_cycle_list)) {
 		u = list_entry(not_cycle_list.next, struct unix_sock, link);
-		u->gc_candidate = 0;
+		__clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
 		list_move_tail(&u->link, &gc_inflight_list);
 	}
 


* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
  2013-05-01 15:24     ` [PATCH v2 " Eric Dumazet
@ 2013-05-01 15:53       ` David Laight
  2013-05-01 16:00         ` Eric Dumazet
  2013-05-01 19:14       ` David Miller
  1 sibling, 1 reply; 23+ messages in thread
From: David Laight @ 2013-05-01 15:53 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a8836e8..dbdfd2b 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -57,9 +57,10 @@ struct unix_sock {
>  	struct list_head	link;
>  	atomic_long_t		inflight;
>  	spinlock_t		lock;
> -	unsigned int		gc_candidate : 1;
> -	unsigned int		gc_maybe_cycle : 1;
>  	unsigned char		recursion_level;
> +	unsigned long		gc_flags;
> +#define UNIX_GC_CANDIDATE	0
> +#define UNIX_GC_MAYBE_CYCLE	1
>  	struct socket_wq	peer_wq;
>  };

Why not just change gc_candidate and gc_maybe_cycle to
unsigned char?
It might reduce the number of pad bytes somewhat.

	David



* RE: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
  2013-05-01 15:53       ` David Laight
@ 2013-05-01 16:00         ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-01 16:00 UTC (permalink / raw)
  To: David Laight; +Cc: David Miller, benh, netdev, paulus, ambrose, linuxppc-dev

On Wed, 2013-05-01 at 16:53 +0100, David Laight wrote:
> Why not just change gc_candidate and gc_maybe_cycle to
> unsigned char?
> It might reduce the number of pad bytes somewhat.

You didn't quite follow the discussion.

I used bytes in V1, and we are not 100% sure it's safe on all arches.

unsigned long is guaranteed to be safe. We absolutely rely on this.

Better to use more bytes on a socket (with no impact on real memory use)
than to spend hours debugging some strange issues.
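
For readers outside the kernel tree, the flag word can be sketched in
user space as below. The sketch_*() helpers are hypothetical
single-word stand-ins for the kernel's non-atomic
__set_bit()/__clear_bit()/test_bit(), not the kernel implementations;
only the two UNIX_GC_* bit numbers come from the patch. Every access
touches exactly one unsigned long, so no neighbouring field is read or
written.

```c
#include <assert.h>
#include <stdbool.h>

#define UNIX_GC_CANDIDATE	0
#define UNIX_GC_MAYBE_CYCLE	1

/* Non-atomic set: the caller is assumed to hold the protecting lock. */
static void sketch_set_bit(unsigned int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}

/* Non-atomic clear: the caller is assumed to hold the protecting lock. */
static void sketch_clear_bit(unsigned int nr, unsigned long *word)
{
	*word &= ~(1UL << nr);
}

/* Reads one bit of the flag word. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}
```

The helpers only handle bit numbers below the width of unsigned long,
which is enough for the two flags used here.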


* Re: [PATCH v2 net-next] af_unix: fix a fatal race with bit fields
  2013-05-01 15:24     ` [PATCH v2 " Eric Dumazet
  2013-05-01 15:53       ` David Laight
@ 2013-05-01 19:14       ` David Miller
  1 sibling, 0 replies; 23+ messages in thread
From: David Miller @ 2013-05-01 19:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: benh, netdev, paulus, ambrose, linuxppc-dev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 May 2013 08:24:03 -0700

> [PATCH v2] af_unix: fix a fatal race with bit fields
> 
> Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
> uses 64bit instructions to manipulate them.
> If the 64bit word includes any atomic_t or spinlock_t, we can lose
> critical concurrent changes.
> 
> This is happening in af_unix, where unix_sk(sk)->gc_candidate/
> gc_maybe_cycle/lock share the same 64bit word.
> 
> This leads to fatal deadlock, as one/several cpus spin forever
> on a spinlock that will never be available again.
> 
> A safer way would be to use a long to store flags.
> This way we are sure compiler/arch wont do bad things.
> 
> As we own unix_gc_lock spinlock when clearing or setting bits,
> we can use the non atomic __set_bit()/__clear_bit().
> 
> recursion_level can share the same 64bit location with the spinlock,
> as it is set only with this spinlock held.
> 
> [1] bug fixed in gcc-4.8.0 :
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> 
> Reported-by: Ambrose Feinstein <ambrose@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  3:54     ` Alan Modra
  2013-05-01  5:04       ` Eric Dumazet
@ 2013-05-02 17:02       ` Scott Wood
  1 sibling, 0 replies; 23+ messages in thread
From: Scott Wood @ 2013-05-02 17:02 UTC (permalink / raw)
  To: Alan Modra
  Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras,
	Anton Blanchard, linuxppc-dev, David Miller

On 04/30/2013 10:54:25 PM, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
> > 	li 11,1
> > 	ld 0,0(9)
> > 	rldimi 0,11,31,32
> > 	std 0,0(9)
> > 	blr
> > 	.ident	"GCC: (GNU) 4.6.3"
> >
> > You can see "ld 0,0(9)" is used : its a 64 bit load.
> 
> Yup.  This is not a powerpc64 specific problem.  See
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
> Fixed in 4.8.0 and 4.7.3.

FWIW (especially if a GCC version check is added), it seems to have  
been fixed as far back as 4.7.1, not just 4.7.3.

-Scott


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01 15:10         ` Stephen Hemminger
@ 2013-05-02 21:11           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2013-05-02 21:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, Alan Modra, Anton Blanchard, David Miller, netdev,
	linuxppc-dev, Paul Mackerras, Ambrose Feinstein

On Wed, 2013-05-01 at 08:10 -0700, Stephen Hemminger wrote:
> > These kind of errors are pretty hard to find, its a pity to spend
> time
> > on them.
> 
> There is a checkbin target inside arch/powerpc/Makefile
> Shouldn't a check be added there to block building kernel with known
> bad GCC versions?

In this case that makes it all GCC versions except the *very
latest*... not practical.

I suppose we should try to make sure that at least the next batch of
enterprise distros gets that fix on the gcc side.

Ben.


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  5:04       ` Eric Dumazet
  2013-05-01 15:10         ` Stephen Hemminger
@ 2013-05-03  1:31         ` Alan Modra
  2013-05-03  8:20           ` David Laight
                             ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Alan Modra @ 2013-05-03  1:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> These kind of errors are pretty hard to find, its a pity to spend time
> on them.

Well, yes.  From the first comment in gcc PR52080.  "For the following
testcase we generate a 8 byte RMW cycle on IA64 which causes locking
problems in the linux kernel btrfs filesystem."

Did someone fix btrfs, but not check other kernel locks?  Having now
hit the same problem again, have you checked that other kernel locks
don't have adjacent bit fields in the same 64-bit word?  And comment
the struct to ensure someone doesn't optimize those unsigned chars
back to bit fields.
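
Besides a comment, a compile-time check can guard the layout. The
following is a sketch only: struct sketch_sock is a hypothetical,
cut-down stand-in for the real structure, SAME_WORD64() is an invented
helper, and it assumes the structure itself starts on an 8-byte
boundary. offsetof() cannot be applied to bit-fields, so this only
works once the flags live in ordinary members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down layout; "lock" stands in for spinlock_t. */
struct sketch_sock {
	unsigned int	lock;
	unsigned char	recursion_level;	/* only written with lock held */
	unsigned long	gc_flags;		/* must not share lock's word */
};

/* True if two members fall into the same naturally aligned 8-byte word. */
#define SAME_WORD64(type, a, b) \
	(offsetof(type, a) / 8 == offsetof(type, b) / 8)

/* Fails the build if someone moves gc_flags next to the lock. */
_Static_assert(!SAME_WORD64(struct sketch_sock, lock, gc_flags),
	       "gc_flags must not share a 64-bit word with lock");
```

In this layout recursion_level still shares the lock's word, which is
tolerated only because it is written with that lock held.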

-- 
Alan Modra
Australia Development Lab, IBM


* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-03  1:31         ` Alan Modra
@ 2013-05-03  8:20           ` David Laight
  2013-05-03 12:57           ` Benjamin Herrenschmidt
  2013-05-03 14:14           ` Eric Dumazet
  2 siblings, 0 replies; 23+ messages in thread
From: David Laight @ 2013-05-03  8:20 UTC (permalink / raw)
  To: Alan Modra, Eric Dumazet
  Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Seems a good reason to have a general policy of not using
bit fields!

Separate char fields normally generate faster code - possibly
at the expense of an increase in the allocated size of a
structure.

	David


* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-03  1:31         ` Alan Modra
  2013-05-03  8:20           ` David Laight
@ 2013-05-03 12:57           ` Benjamin Herrenschmidt
  2013-05-03 14:14           ` Eric Dumazet
  2 siblings, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2013-05-03 12:57 UTC (permalink / raw)
  To: Alan Modra
  Cc: Eric Dumazet, netdev, Ambrose Feinstein, Paul Mackerras,
	Anton Blanchard, linuxppc-dev, David Miller

On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> > These kind of errors are pretty hard to find, its a pity to spend time
> > on them.
> 
> Well, yes.  From the first comment in gcc PR52080.  "For the following
> testcase we generate a 8 byte RMW cycle on IA64 which causes locking
> problems in the linux kernel btrfs filesystem."
> 
> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Unfortunately, fixing "other" kernel locks is near impossible.

One could try to grep for all spinlock_t and maybe even all atomic_t,
may even write a script to spot automatically if a bitfield appears
to be around (though it could be hidden behind a structure etc...) but
what about an int accessed with cmpxchg (a kernel macro doing a
lwarx/stwcx. loop on a value) for example ? There's plenty of these...
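
A rough userspace analogue of such a compare-and-swap loop, using C11
<stdatomic.h> rather than the kernel macro. The name add_with_cas is
hypothetical and not from any kernel source. The point it illustrates:
a plain int updated this way is just as exposed as a spinlock if a
neighbouring bit field is rewritten with a wider non-atomic store.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: add delta to *p with a compare-and-swap retry loop.
 * On ppc64 a compiler would typically lower this to a lwarx/stwcx. loop. */
static int add_with_cas(_Atomic int *p, int delta)
{
	int old = atomic_load(p);

	/* On failure, 'old' is refreshed with the current value and we retry. */
	while (!atomic_compare_exchange_weak(p, &old, old + delta))
		;

	return old + delta;
}
```

The loop only protects the 4 bytes of the int itself; it cannot detect
a concurrent 8-byte plain store that overwrites the same location.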

I don't think we can realistically "fix" all potential occurrences of
that bug in the kernel short of getting rid of all bitfields, which isn't
going to happen any time soon.

I'm afraid this *must* be fixed at the compiler level, with as many
backports to distros as can realistically be done.

Ben.
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-03  1:31         ` Alan Modra
  2013-05-03  8:20           ` David Laight
  2013-05-03 12:57           ` Benjamin Herrenschmidt
@ 2013-05-03 14:14           ` Eric Dumazet
  2 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2013-05-03 14:14 UTC (permalink / raw)
  To: Alan Modra
  Cc: Anton Blanchard, David Miller, netdev, linuxppc-dev,
	Paul Mackerras, Ambrose Feinstein

On Fri, 2013-05-03 at 11:01 +0930, Alan Modra wrote:
> On Tue, Apr 30, 2013 at 10:04:32PM -0700, Eric Dumazet wrote:
> > These kind of errors are pretty hard to find, its a pity to spend time
> > on them.
> 
> Well, yes.  From the first comment in gcc PR52080.  "For the following
> testcase we generate a 8 byte RMW cycle on IA64 which causes locking
> problems in the linux kernel btrfs filesystem."
> 
> Did someone fix btrfs, but not check other kernel locks?  Having now
> hit the same problem again, have you checked that other kernel locks
> don't have adjacent bit fields in the same 64-bit word?  And comment
> the struct to ensure someone doesn't optimize those unsigned chars
> back to bit fields.

Not only spinlock, but atomic_t followed by bit fields.

BTW, if a spinlock is followed by bit fields, but the bit fields are
only changed while this spinlock is held, there is no problem, unless
the spinlock is a ticket spinlock.

In af_unix, the bug happens because the bit fields were changed without
the spinlock being held (another global spinlock is used instead).

(ppc64 doesn't use ticket spinlocks yet)
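
A minimal single-threaded sketch that replays the problematic
interleaving deterministically. The struct and function names are
hypothetical stand-ins (a 32-bit "lock" and 32-bit "flags" sharing one
64-bit word), not the real unix_sock layout or spinlock implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: lock and flags packed into one 64-bit word. */
struct shared_word {
	uint32_t lock;   /* stand-in for spinlock_t: 0 = free, 1 = held */
	uint32_t flags;  /* stand-in for the bit fields */
};

/* First half of a wide read-modify-write: load all 8 bytes. */
static uint64_t rmw_load(const struct shared_word *w)
{
	uint64_t v;

	memcpy(&v, w, sizeof(v));
	return v;
}

/* Second half: set a flag in the private copy, store all 8 bytes back. */
static void rmw_store_flag(struct shared_word *w, uint64_t snapshot,
			   uint32_t bit)
{
	struct shared_word tmp;

	memcpy(&tmp, &snapshot, sizeof(tmp));
	tmp.flags |= bit;
	memcpy(w, &tmp, sizeof(tmp));
}

/* Replays the interleaving and returns the final lock value. */
static uint32_t replay_lost_unlock(void)
{
	struct shared_word w = { .lock = 1, .flags = 0 }; /* CPU B holds lock */
	uint64_t snap = rmw_load(&w);   /* CPU A: 64-bit load            */

	w.lock = 0;                     /* CPU B: releases the lock      */
	rmw_store_flag(&w, snap, 1u);   /* CPU A: 64-bit store           */

	assert(w.flags == 1u);          /* the flag update itself worked */
	return w.lock;                  /* stale "held" value was restored */
}
```

replay_lost_unlock() returns 1: the unlock is overwritten by the stale
copy, so any later locker would spin forever.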

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-01  1:39 ` Benjamin Herrenschmidt
  2013-05-01  7:36   ` David Miller
  2013-05-01 12:08   ` [PATCH " Ben Hutchings
@ 2013-05-03 14:29   ` David Laight
  2013-05-03 15:02     ` Eric Dumazet
  2 siblings, 1 reply; 23+ messages in thread
From: David Laight @ 2013-05-03 14:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eric Dumazet
  Cc: David Miller, netdev, Paul Mackerras, Ambrose Feinstein, linuxppc-dev

> > Could ppc64 experts confirm using byte is safe, or should we really add
> > a 32bit hole after the spinlock ? If so, I wonder how many other places
> > need a change...
...
> Also I'd be surprised if ppc64 is the only one with that problem... what
> about sparc64 and arm64 ?

Even x86 could be affected.
The width of the memory cycles used by the 'bit set and bit clear'
instructions isn't documented. They are certainly allowed to do
RMW on adjacent bytes.
I don't remember whether they are constrained to only do
32bit accesses, but nothing used to say that they wouldn't
do 32bit misaligned ones! (although I suspect they never have).

	David


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-03 14:29   ` David Laight
@ 2013-05-03 15:02     ` Eric Dumazet
  2013-05-03 15:44       ` David Laight
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2013-05-03 15:02 UTC (permalink / raw)
  To: David Laight
  Cc: Benjamin Herrenschmidt, David Miller, netdev, Paul Mackerras,
	Ambrose Feinstein, linuxppc-dev

On Fri, 2013-05-03 at 15:29 +0100, David Laight wrote:
> > > Could ppc64 experts confirm using byte is safe, or should we really add
> > > a 32bit hole after the spinlock ? If so, I wonder how many other places
> > > need a change...
> ...
> > Also I'd be surprised if ppc64 is the only one with that problem... what
> > about sparc64 and arm64 ?
> 
> Even x86 could be affected.
> The width of the memory cycles used by the 'bit set and bit clear'
> instructions isn't documented. They are certainly allowed to do
> RMW on adjacent bytes.
> I don't remember whether they are constrained to only do
> 32bit accesses, but nothing used to say that they wouldn't
> do 32bit misaligned ones! (although I suspect they never have).

x86 is not affected (or else we would have found the bug much earlier)

Setting 1-bit field to one/zero uses OR/AND instructions.

orb  $4,724(%reg)

doesn't load/store 64bits but 8bits.
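
For illustration, a small C sketch of the kind of source that leads to
such an instruction. The struct and function names are hypothetical,
and the exact instruction chosen depends on the compiler, its version
and flags; a byte-wide OR is common on x86 but not guaranteed.

```c
#include <assert.h>

/* Hypothetical struct: three adjacent 1-bit fields sharing one byte. */
struct flags {
	unsigned int a : 1;
	unsigned int b : 1;
	unsigned int c : 1;  /* typically bit 2 of the first byte (mask 4) */
};

/* On x86, compilers commonly emit a byte-wide OR with mask 4 here,
 * touching only the byte that holds the field. */
static void set_c(struct flags *f)
{
	f->c = 1;
}
```

Compiling this with -O2 -S and reading the output is the simplest way
to check what a given compiler actually does.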

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH net-next] af_unix: fix a fatal race with bit fields
  2013-05-03 15:02     ` Eric Dumazet
@ 2013-05-03 15:44       ` David Laight
  0 siblings, 0 replies; 23+ messages in thread
From: David Laight @ 2013-05-03 15:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Benjamin Herrenschmidt, David Miller, netdev, Paul Mackerras,
	Ambrose Feinstein, linuxppc-dev

> > > Also I'd be surprised if ppc64 is the only one with that problem... what
> > > about sparc64 and arm64 ?
> >
> > Even x86 could be affected.
> > The width of the memory cycles used by the 'bit set and bit clear'
> > instructions isn't documented. They are certainly allowed to do
> > RMW on adjacent bytes.
> > I don't remember whether they are constrained to only do
> > 32bit accesses, but nothing used to say that they wouldn't
> > do 32bit misaligned ones! (although I suspect they never have).
> 
> x86 is not affected (or else we would have found the bug much earlier)
> 
> Setting 1-bit field to one/zero uses OR/AND instructions.
> 
> orb  $4,724(%reg)
> 
> doesn't load/store 64bits but 8bits.

I was thinking of code that might be using BT, BTC, BTR or BTS.
These are probably used with the 'lock' prefix - which would
(I think) make them safe.

The documented constraint is more specific than it used to be;
the Intel version reads:

    When accessing a bit in memory, the processor may access 4 bytes
    starting from the memory address for a 32-bit operand size, using
    by the following relationship:
        Effective Address + (4 ∗ (BitOffset DIV 32))
    Or, it may access 2 bytes starting from the memory address for a
    16-bit operand, using this relationship:
        Effective Address + (2 ∗ (BitOffset DIV 16))
    It may do so even when only a single byte needs to be accessed to
    reach the given bit.
    When using this bit addressing mechanism, software should avoid
    referencing areas of memory close to address space holes.
    In particular, it should avoid references to memory-mapped I/O registers.
    Instead, software should use the MOV instructions to load from or store
    to these addresses, and use the register form of these instructions to
    manipulate the data.
    In 64-bit mode, the instruction’s default operation size is 32 bits.
    Using a REX prefix in the form of REX.R permits access to additional
    registers (R8-R15). Using a REX prefix in the form of REX.W promotes
    operation to 64 bit operands.
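
A worked example of the two relationships quoted above, as a small C
sketch. The function names are hypothetical, and only non-negative bit
offsets are handled (the instructions also accept negative offsets with
a register operand, which this sketch ignores).

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from the effective address of the 4-byte window that may
 * be accessed for a 32-bit operand: 4 * (BitOffset DIV 32). */
static uint64_t bt_access_offset32(uint64_t bit_offset)
{
	return 4u * (bit_offset / 32u);
}

/* Same for a 16-bit operand: 2 * (BitOffset DIV 16). */
static uint64_t bt_access_offset16(uint64_t bit_offset)
{
	return 2u * (bit_offset / 16u);
}

/* Bit 37 lives in byte 4; the 32-bit form may touch bytes 4..7 and the
 * 16-bit form bytes 4..5, although only byte 4 is strictly needed. */
static void bt_example(void)
{
	assert(bt_access_offset32(37) == 4);
	assert(bt_access_offset16(37) == 4);
	assert(bt_access_offset32(5) == 0);
}
```

So a non-locked BTS/BTR on a flag may read and write back up to three
neighbouring bytes, which is the same class of hazard as the ppc64 case,
only with a narrower window.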

	David



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-05-03 15:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-01  1:12 [PATCH net-next] af_unix: fix a fatal race with bit fields Eric Dumazet
2013-05-01  1:39 ` Benjamin Herrenschmidt
2013-05-01  7:36   ` David Miller
2013-05-01  8:08     ` Benjamin Herrenschmidt
2013-05-01 15:24     ` [PATCH v2 " Eric Dumazet
2013-05-01 15:53       ` David Laight
2013-05-01 16:00         ` Eric Dumazet
2013-05-01 19:14       ` David Miller
2013-05-01 12:08   ` [PATCH " Ben Hutchings
2013-05-03 14:29   ` David Laight
2013-05-03 15:02     ` Eric Dumazet
2013-05-03 15:44       ` David Laight
2013-05-01  1:51 ` Anton Blanchard
2013-05-01  2:24   ` Eric Dumazet
2013-05-01  3:54     ` Alan Modra
2013-05-01  5:04       ` Eric Dumazet
2013-05-01 15:10         ` Stephen Hemminger
2013-05-02 21:11           ` Benjamin Herrenschmidt
2013-05-03  1:31         ` Alan Modra
2013-05-03  8:20           ` David Laight
2013-05-03 12:57           ` Benjamin Herrenschmidt
2013-05-03 14:14           ` Eric Dumazet
2013-05-02 17:02       ` Scott Wood
