netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drivers/net/b44: Change to non-atomic bit operations
@ 2019-12-20 23:29 Fenghua Yu
  2019-12-25  0:18 ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Fenghua Yu @ 2019-12-20 23:29 UTC (permalink / raw)
  To: Michael Chan, netdev
  Cc: Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, Tony Luck,
	David Laight, Ravi V Shankar, Fenghua Yu

Since "pwol_mask" is local and never exposed to concurrency, there is
no need to set bit in pwol_mask by costly atomic operations. On x86,
accessing data across two cache lines in one atomic bit operation
(aka split lock) can take over 1000 cycles.

Directly operate on the byte which contains the bit instead of using
__set_bit() to avoid any big endian concern due to type cast to
unsigned long in __set_bit().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 drivers/net/ethernet/broadcom/b44.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 035dbb1b2c98..ec25fd81985d 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1516,8 +1516,10 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 	int ethaddr_bytes = ETH_ALEN;
 
 	memset(ppattern + offset, 0xff, magicsync);
-	for (j = 0; j < magicsync; j++)
-		set_bit(len++, (unsigned long *) pmask);
+	for (j = 0; j < magicsync; j++) {
+		pmask[len >> 3] |= BIT(len & 7);
+		len++;
+	}
 
 	for (j = 0; j < B44_MAX_PATTERNS; j++) {
 		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1529,7 +1531,8 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 		for (k = 0; k< ethaddr_bytes; k++) {
 			ppattern[offset + magicsync +
 				(j * ETH_ALEN) + k] = macaddr[k];
-			set_bit(len++, (unsigned long *) pmask);
+			pmask[len >> 3] |= BIT(len & 7);
+			len++;
 		}
 	}
 	return len - 1;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2019-12-20 23:29 [PATCH] drivers/net/b44: Change to non-atomic bit operations Fenghua Yu
@ 2019-12-25  0:18 ` David Miller
  2019-12-25  1:10   ` Fenghua Yu
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2019-12-25  0:18 UTC (permalink / raw)
  To: fenghua.yu
  Cc: michael.chan, netdev, tglx, luto, peterz, tony.luck,
	David.Laight, ravi.v.shankar

From: Fenghua Yu <fenghua.yu@intel.com>
Date: Fri, 20 Dec 2019 15:29:11 -0800

> On x86, accessing data across two cache lines in one atomic bit
> operation (aka split lock) can take over 1000 cycles.

This happens during configuration of WOL, nobody cares that the atomic
operations done in this function take 1000 cycles each.

I'm not applying this patch.  It is gratuitous, and the commit message
talks about "performance" considuations (cycle counts) that completely
don't matter here.

If you are merely just arbitrarily trying to remove locked atomic
operations across the tree for it's own sake, then you should be
completely honest about that in your commit message.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2019-12-25  0:18 ` David Miller
@ 2019-12-25  1:10   ` Fenghua Yu
  2019-12-25  2:23     ` David Miller
  2019-12-25 20:24     ` [PATCH] drivers/net/b44: Change to non-atomic bit operations Stephen Hemminger
  0 siblings, 2 replies; 9+ messages in thread
From: Fenghua Yu @ 2019-12-25  1:10 UTC (permalink / raw)
  To: David Miller
  Cc: michael.chan, netdev, tglx, luto, peterz, tony.luck,
	David.Laight, ravi.v.shankar

On Tue, Dec 24, 2019 at 04:18:26PM -0800, David Miller wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> Date: Fri, 20 Dec 2019 15:29:11 -0800
> 
> > On x86, accessing data across two cache lines in one atomic bit
> > operation (aka split lock) can take over 1000 cycles.
> 
> This happens during configuration of WOL, nobody cares that the atomic
> operations done in this function take 1000 cycles each.
> 
> I'm not applying this patch.  It is gratuitous, and the commit message
> talks about "performance" considuations (cycle counts) that completely
> don't matter here.
> 
> If you are merely just arbitrarily trying to remove locked atomic
> operations across the tree for it's own sake, then you should be
> completely honest about that in your commit message.

We are enabling split lock in the kernel (by default):
https://lkml.org/lkml/2019/12/12/1129

After applying the split lock detection patch, the set_bit() in b44.c
may cause split lock and kernel dies.

So should I change the commit message to add the above info?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2019-12-25  1:10   ` Fenghua Yu
@ 2019-12-25  2:23     ` David Miller
  2020-01-02 21:27       ` [Patch v2] drivers/net/b44: Change to non-atomic bit operations on pwol_mask Luck, Tony
  2019-12-25 20:24     ` [PATCH] drivers/net/b44: Change to non-atomic bit operations Stephen Hemminger
  1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2019-12-25  2:23 UTC (permalink / raw)
  To: fenghua.yu
  Cc: michael.chan, netdev, tglx, luto, peterz, tony.luck,
	David.Laight, ravi.v.shankar

From: Fenghua Yu <fenghua.yu@intel.com>
Date: Tue, 24 Dec 2019 17:10:20 -0800

> On Tue, Dec 24, 2019 at 04:18:26PM -0800, David Miller wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>> Date: Fri, 20 Dec 2019 15:29:11 -0800
>> 
>> > On x86, accessing data across two cache lines in one atomic bit
>> > operation (aka split lock) can take over 1000 cycles.
>> 
>> This happens during configuration of WOL, nobody cares that the atomic
>> operations done in this function take 1000 cycles each.
>> 
>> I'm not applying this patch.  It is gratuitous, and the commit message
>> talks about "performance" considuations (cycle counts) that completely
>> don't matter here.
>> 
>> If you are merely just arbitrarily trying to remove locked atomic
>> operations across the tree for it's own sake, then you should be
>> completely honest about that in your commit message.
> 
> We are enabling split lock in the kernel (by default):
> https://lkml.org/lkml/2019/12/12/1129
> 
> After applying the split lock detection patch, the set_bit() in b44.c
> may cause split lock and kernel dies.
> 
> So should I change the commit message to add the above info?

You're asking me if you should make your commit message accurate?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2019-12-25  1:10   ` Fenghua Yu
  2019-12-25  2:23     ` David Miller
@ 2019-12-25 20:24     ` Stephen Hemminger
  2020-01-02 18:22       ` Luck, Tony
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2019-12-25 20:24 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David Miller, michael.chan, netdev, tglx, luto, peterz,
	tony.luck, David.Laight, ravi.v.shankar

On Tue, 24 Dec 2019 17:10:20 -0800
Fenghua Yu <fenghua.yu@intel.com> wrote:

> On Tue, Dec 24, 2019 at 04:18:26PM -0800, David Miller wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > Date: Fri, 20 Dec 2019 15:29:11 -0800
> >   
> > > On x86, accessing data across two cache lines in one atomic bit
> > > operation (aka split lock) can take over 1000 cycles.  
> > 
> > This happens during configuration of WOL, nobody cares that the atomic
> > operations done in this function take 1000 cycles each.
> > 
> > I'm not applying this patch.  It is gratuitous, and the commit message
> > talks about "performance" considuations (cycle counts) that completely
> > don't matter here.
> > 
> > If you are merely just arbitrarily trying to remove locked atomic
> > operations across the tree for it's own sake, then you should be
> > completely honest about that in your commit message.  
> 
> We are enabling split lock in the kernel (by default):
> https://lkml.org/lkml/2019/12/12/1129
> 
> After applying the split lock detection patch, the set_bit() in b44.c
> may cause split lock and kernel dies.
> 
> So should I change the commit message to add the above info?
> 
> Thanks.
> 
> -Fenghua

Why not just make pwol_pattern aligned and choose the right word to do
the operation on?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2019-12-25 20:24     ` [PATCH] drivers/net/b44: Change to non-atomic bit operations Stephen Hemminger
@ 2020-01-02 18:22       ` Luck, Tony
  2020-01-03  8:47         ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2020-01-02 18:22 UTC (permalink / raw)
  To: Stephen Hemminger, Yu, Fenghua
  Cc: David Miller, michael.chan, netdev, tglx, luto, peterz,
	David.Laight, Shankar, Ravi V

> Why not just make pwol_pattern aligned and choose the right word to do
> the operation on?

We use that approach for places where the operation needs to be atomic.

But this one doesn't need an atomic operation since there can be no other
entity operating on the same bitmap in parallel.

-Tony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Patch v2] drivers/net/b44: Change to non-atomic bit operations on pwol_mask
  2019-12-25  2:23     ` David Miller
@ 2020-01-02 21:27       ` Luck, Tony
  2020-01-05 22:22         ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2020-01-02 21:27 UTC (permalink / raw)
  To: David Miller
  Cc: fenghua.yu, michael.chan, netdev, tglx, luto, peterz,
	David.Laight, ravi.v.shankar


From: Fenghua Yu <fenghua.yu@intel.com>

Atomic operations that span cache lines are super-expensive on x86
(not just to the current processor, but also to other processes as all
memory operations are blocked until the operation completes). Upcoming
x86 processors have a switch to cause such operations to generate a #AC
trap. It is expected that some real time systems will enable this mode
in BIOS.

In preparation for this, it is necessary to fix code that may execute
atomic instructions with operands that cross cachelines because the #AC
trap will crash the kernel.

Since "pwol_mask" is local and never exposed to concurrency, there is
no need to set bits in pwol_mask using atomic operations.

Directly operate on the byte which contains the bit instead of using
__set_bit() to avoid any big endian concern due to type cast to
unsigned long in __set_bit().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Tony: Updated commit comment with background information motivating
this change.  No changes to code.

 drivers/net/ethernet/broadcom/b44.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 035dbb1b2c98..ec25fd81985d 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1516,8 +1516,10 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 	int ethaddr_bytes = ETH_ALEN;
 
 	memset(ppattern + offset, 0xff, magicsync);
-	for (j = 0; j < magicsync; j++)
-		set_bit(len++, (unsigned long *) pmask);
+	for (j = 0; j < magicsync; j++) {
+		pmask[len >> 3] |= BIT(len & 7);
+		len++;
+	}
 
 	for (j = 0; j < B44_MAX_PATTERNS; j++) {
 		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1529,7 +1531,8 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 		for (k = 0; k< ethaddr_bytes; k++) {
 			ppattern[offset + magicsync +
 				(j * ETH_ALEN) + k] = macaddr[k];
-			set_bit(len++, (unsigned long *) pmask);
+			pmask[len >> 3] |= BIT(len & 7);
+			len++;
 		}
 	}
 	return len - 1;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH] drivers/net/b44: Change to non-atomic bit operations
  2020-01-02 18:22       ` Luck, Tony
@ 2020-01-03  8:47         ` David Laight
  0 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2020-01-03  8:47 UTC (permalink / raw)
  To: 'Luck, Tony', Stephen Hemminger, Yu, Fenghua
  Cc: David Miller, michael.chan, netdev, tglx, luto, peterz, Shankar, Ravi V

From: Luck, Tony
> Sent: 02 January 2020 18:23
> 
> > Why not just make pwol_pattern aligned and choose the right word to do
> > the operation on?
> 
> We use that approach for places where the operation needs to be atomic.
> 
> But this one doesn't need an atomic operation since there can be no other
> entity operating on the same bitmap in parallel.

From what I remember this code is setting up a bitmap that is transferred
to the hardware (it might be the multicast filter).
As such it shouldn't be relying on the implementation of the bitmap
functions (which operate on long[]) to generate the required byte map
required by the hardware.

The code is horrid.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Patch v2] drivers/net/b44: Change to non-atomic bit operations on pwol_mask
  2020-01-02 21:27       ` [Patch v2] drivers/net/b44: Change to non-atomic bit operations on pwol_mask Luck, Tony
@ 2020-01-05 22:22         ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2020-01-05 22:22 UTC (permalink / raw)
  To: tony.luck
  Cc: fenghua.yu, michael.chan, netdev, tglx, luto, peterz,
	David.Laight, ravi.v.shankar

From: "Luck, Tony" <tony.luck@intel.com>
Date: Thu, 2 Jan 2020 13:27:06 -0800

> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Atomic operations that span cache lines are super-expensive on x86
> (not just to the current processor, but also to other processes as all
> memory operations are blocked until the operation completes). Upcoming
> x86 processors have a switch to cause such operations to generate a #AC
> trap. It is expected that some real time systems will enable this mode
> in BIOS.
> 
> In preparation for this, it is necessary to fix code that may execute
> atomic instructions with operands that cross cachelines because the #AC
> trap will crash the kernel.
> 
> Since "pwol_mask" is local and never exposed to concurrency, there is
> no need to set bits in pwol_mask using atomic operations.
> 
> Directly operate on the byte which contains the bit instead of using
> __set_bit() to avoid any big endian concern due to type cast to
> unsigned long in __set_bit().
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Applied, thanks.

I wonder if this is being used in an endian safe way.  Maybe the way
the filter is written into the chip makes it work out, I don't know.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-01-05 22:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-20 23:29 [PATCH] drivers/net/b44: Change to non-atomic bit operations Fenghua Yu
2019-12-25  0:18 ` David Miller
2019-12-25  1:10   ` Fenghua Yu
2019-12-25  2:23     ` David Miller
2020-01-02 21:27       ` [Patch v2] drivers/net/b44: Change to non-atomic bit operations on pwol_mask Luck, Tony
2020-01-05 22:22         ` David Miller
2019-12-25 20:24     ` [PATCH] drivers/net/b44: Change to non-atomic bit operations Stephen Hemminger
2020-01-02 18:22       ` Luck, Tony
2020-01-03  8:47         ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).