netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
@ 2011-09-30 13:23 Thadeu Lima de Souza Cascardo
  2011-10-02  8:58 ` Yevgeny Petrilin
  0 siblings, 1 reply; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-09-30 13:23 UTC (permalink / raw)
  To: netdev; +Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

With the addition of Blue Frame support in the network driver for mlx4,
the doorbell is not written in the path where blue frame is enabled and
the package follows some characteristics.

The consequence of that is that ICMP ECHO requests, for example, were
not transmitted by the device. A ping flood, for example, would make the
watchdog dispatch, because the ring was full and transmissions have
timed out.

After this fix, I could ping two systems using mlx4_en (both with the
fix).

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6e03de0..270da80 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -811,10 +811,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		* before setting ownership of this descriptor to HW */
 		wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
-		wmb();
-		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
 	}
 
+	wmb();
+	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+
 	/* Poll CQ here */
 	mlx4_en_xmit_poll(priv, tx_ind);
 
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-09-30 13:23 [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Thadeu Lima de Souza Cascardo
@ 2011-10-02  8:58 ` Yevgeny Petrilin
  2011-10-03 14:37   ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 36+ messages in thread
From: Yevgeny Petrilin @ 2011-10-02  8:58 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo, netdev


> With the addition of Blue Frame support in the network driver for mlx4, the doorbell is not written in the path where blue frame is enabled and the package follows some characteristics.
> 
> The consequence of that is that ICMP ECHO requests, for example, were not transmitted by the device. A ping flood, for example, would make the
> watchdog dispatch, because the ring was full and transmissions have timed out.
> 
> After this fix, I could ping two systems using mlx4_en (both with the fix).
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo
> <cascardo@linux.vnet.ibm.com>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 6e03de0..270da80 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -811,10 +811,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb,
> struct net_device *dev)
>  		* before setting ownership of this descriptor to HW */
>  		wmb();
>  		tx_desc->ctrl.owner_opcode = op_own;
> -		wmb();
> -		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
>  	}
> 
> +	wmb();
> +	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> +
>  	/* Poll CQ here */
>  	mlx4_en_xmit_poll(priv, tx_ind);
> 
> --
> 1.7.4.4

What this patch does is ringing the doorbell unconditionally, whether blue flame is used or not.
The whole idea, is to copy the data to the blueflame register and have the HW take the data from there and save the memory access to the work queue entry.
In this patch you do both, copy the data to the register, and then still the HW accesses the memory to take it.
For some reason Blue flame is not working on your system and we need to understand why, I'll be glad to work with you to debug it.
Anyway, this patch is not the solution (even that it works on your systems) for the problem.

Yevgeny

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-02  8:58 ` Yevgeny Petrilin
@ 2011-10-03 14:37   ` Thadeu Lima de Souza Cascardo
  2011-10-03 14:56     ` Yevgeny Petrilin
  0 siblings, 1 reply; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-03 14:37 UTC (permalink / raw)
  To: Yevgeny Petrilin; +Cc: netdev

On Sun, Oct 02, 2011 at 08:58:21AM +0000, Yevgeny Petrilin wrote:
> 
> > With the addition of Blue Frame support in the network driver for mlx4, the doorbell is not written in the path where blue frame is enabled and the package follows some characteristics.
> > 
> > The consequence of that is that ICMP ECHO requests, for example, were not transmitted by the device. A ping flood, for example, would make the
> > watchdog dispatch, because the ring was full and transmissions have timed out.
> > 
> > After this fix, I could ping two systems using mlx4_en (both with the fix).
> > 
> > Signed-off-by: Thadeu Lima de Souza Cascardo
> > <cascardo@linux.vnet.ibm.com>
> > Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > index 6e03de0..270da80 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > @@ -811,10 +811,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb,
> > struct net_device *dev)
> >  		* before setting ownership of this descriptor to HW */
> >  		wmb();
> >  		tx_desc->ctrl.owner_opcode = op_own;
> > -		wmb();
> > -		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> >  	}
> > 
> > +	wmb();
> > +	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> > +
> >  	/* Poll CQ here */
> >  	mlx4_en_xmit_poll(priv, tx_ind);
> > 
> > --
> > 1.7.4.4
> 
> What this patch does is ringing the doorbell unconditionally, whether blue flame is used or not.
> The whole idea, is to copy the data to the blueflame register and have the HW take the data from there and save the memory access to the work queue entry.
> In this patch you do both, copy the data to the register, and then still the HW accesses the memory to take it.
> For some reason Blue flame is not working on your system and we need to understand why, I'll be glad to work with you to debug it.
> Anyway, this patch is not the solution (even that it works on your systems) for the problem.
> 
> Yevgeny

Hello, Yevgeny.

We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
information, tests or debug patches you want me to try, just tell me.

I expected this was really not the proper fix, but thought it would be
better to send it than just guess. Any clues what the problem might be?
Perhaps, we would have to disable blue frame for this particular device?

Regards,
Cascardo.

---
lspci output
0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
---
lspci -v output
0006:01:00.0 0200: 15b3:6750 (rev b0)
        Subsystem: 1014:0416
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
        Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 3da47bf00000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_en, mlx4_core
---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-03 14:37   ` Thadeu Lima de Souza Cascardo
@ 2011-10-03 14:56     ` Yevgeny Petrilin
  2011-10-03 20:53       ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 36+ messages in thread
From: Yevgeny Petrilin @ 2011-10-03 14:56 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: netdev, Eli Cohen, eli

> Hello, Yevgeny.
> 
> We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
> information, tests or debug patches you want me to try, just tell me.
> 
> I expected this was really not the proper fix, but thought it would be
> better to send it than just guess. Any clues what the problem might be?
> Perhaps, we would have to disable blue frame for this particular device?
> 
> Regards,
> Cascardo.
> 
> ---
> lspci output
> 0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
> ---
> lspci -v output
> 0006:01:00.0 0200: 15b3:6750 (rev b0)
>         Subsystem: 1014:0416
>         Flags: bus master, fast devsel, latency 0, IRQ 31
>         Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
>         Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
>         Expansion ROM at 3da47bf00000 [disabled] [size=1M]
>         Capabilities: [40] Power Management version 3
>         Capabilities: [48] Vital Product Data
>         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
>         Capabilities: [60] Express Endpoint, MSI 00
>         Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
>         Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
>         Kernel driver in use: mlx4_core
>         Kernel modules: mlx4_en, mlx4_core
> ---

Cascardo,

Can you also send me the output of ethtool -i?
It seems that there is a problem with write combining on Power processors, we will check this issue.

Yevgeny

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-03 14:56     ` Yevgeny Petrilin
@ 2011-10-03 20:53       ` Thadeu Lima de Souza Cascardo
  2011-10-04  6:02         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-03 20:53 UTC (permalink / raw)
  To: Yevgeny Petrilin
  Cc: netdev, Eli Cohen, eli, Benjamin Herrenschmidt, linuxppc-dev

On Mon, Oct 03, 2011 at 02:56:08PM +0000, Yevgeny Petrilin wrote:
> > Hello, Yevgeny.
> > 
> > We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
> > information, tests or debug patches you want me to try, just tell me.
> > 
> > I expected this was really not the proper fix, but thought it would be
> > better to send it than just guess. Any clues what the problem might be?
> > Perhaps, we would have to disable blue frame for this particular device?
> > 
> > Regards,
> > Cascardo.
> > 
> > ---
> > lspci output
> > 0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
> > ---
> > lspci -v output
> > 0006:01:00.0 0200: 15b3:6750 (rev b0)
> >         Subsystem: 1014:0416
> >         Flags: bus master, fast devsel, latency 0, IRQ 31
> >         Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
> >         Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
> >         Expansion ROM at 3da47bf00000 [disabled] [size=1M]
> >         Capabilities: [40] Power Management version 3
> >         Capabilities: [48] Vital Product Data
> >         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
> >         Capabilities: [60] Express Endpoint, MSI 00
> >         Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
> >         Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
> >         Kernel driver in use: mlx4_core
> >         Kernel modules: mlx4_en, mlx4_core
> > ---
> 
> Cascardo,
> 
> Can you also send me the output of ethtool -i?
> It seems that there is a problem with write combining on Power processors, we will check this issue.
> 
> Yevgeny

Hello, Yevgeny.

You will find the output of ethtool -i below.

I am copying Ben and powerpc list, in case this is an issue with Power
processors. They can provide us some more insight into this.

Thanks,
Cascardo.

---
# ethtool -i eth10
driver: mlx4_en
version: 1.5.4.1 (March 2011)
firmware-version: 2.9.4130
bus-info: 0006:01:00.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-03 20:53       ` Thadeu Lima de Souza Cascardo
@ 2011-10-04  6:02         ` Benjamin Herrenschmidt
  2011-10-04 20:26           ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-04  6:02 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Yevgeny Petrilin, netdev, Eli Cohen, eli, linuxppc-dev

On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:

 .../...

> > Can you also send me the output of ethtool -i?
> > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > 
> > Yevgeny
> 
> Hello, Yevgeny.
> 
> You will find the output of ethtool -i below.
> 
> I am copying Ben and powerpc list, in case this is an issue with Power
> processors. They can provide us some more insight into this.

May I get some background please ? :-)

I'm not aware of a specific issue with write combining but I'd need to
know more about what you are doing and the code to do it to comment on
whether it should work or not.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-04  6:02         ` Benjamin Herrenschmidt
@ 2011-10-04 20:26           ` Thadeu Lima de Souza Cascardo
  2011-10-05  8:15             ` Eli Cohen
  2011-10-09  7:19             ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-04 20:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yevgeny Petrilin, netdev, Eli Cohen, eli, linuxppc-dev

On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> 
>  .../...
> 
> > > Can you also send me the output of ethtool -i?
> > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > 
> > > Yevgeny
> > 
> > Hello, Yevgeny.
> > 
> > You will find the output of ethtool -i below.
> > 
> > I am copying Ben and powerpc list, in case this is an issue with Power
> > processors. They can provide us some more insight into this.
> 
> May I get some background please ? :-)
> 
> I'm not aware of a specific issue with write combining but I'd need to
> know more about what you are doing and the code to do it to comment on
> whether it should work or not.
> 
> Cheers,
> Ben.
> 
> 

Hello, Ben.

Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
added blue frame support, that does not require writing to the device
memory to indicate a new packet (the doorbell register as it is called).

Well, the ring is getting full with no interrupt or packets transmitted.
I simply added a write to the doorbell register and it works for me.
Yevgeny says this is not the right fix, claiming there is a problem with
write combining on POWER. The code uses memory barriers, so I don't know
why there is any problem.

I am posting the code here to show better what the situation is.
Yevgeny can tell more about the device and the driver.

The code below is the driver as of now, including a diff with what I
changed and had resulted OK for me. Before the blue frame support, the
only code executed was the else part. I can't tell much what the device
should be seeing and doing after the blue frame part of the code is
executed. But it does send the packet if I write to the doorbell
register.

Yevgeny, can you tell us what the device should be doing and why you
think this is a problem on POWER? Is it possible that this is simply a
problem with the firmware version?

Thanks,
Cascardo.

---
        if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
!vlan_tag) {
                *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
ring->doorbell_qpn;
                op_own |= htonl((bf_index & 0xffff) << 8);
                /* Ensure new descirptor hits memory
                * before setting ownership of this descriptor to HW */
                wmb();
                tx_desc->ctrl.owner_opcode = op_own;

                wmb();

                mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
long *) &tx_desc->ctrl,
                     desc_size);

                wmb();

                ring->bf.offset ^= ring->bf.buf_size;
        } else {
                /* Ensure new descirptor hits memory
                * before setting ownership of this descriptor to HW */
                wmb();
                tx_desc->ctrl.owner_opcode = op_own;
-               wmb();
-               writel(ring->doorbell_qpn, ring->bf.uar->map +
MLX4_SEND_DOORBELL);
        }

+       wmb();
+       writel(ring->doorbell_qpn, ring->bf.uar->map +
MLX4_SEND_DOORBELL);
+
---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-04 20:26           ` Thadeu Lima de Souza Cascardo
@ 2011-10-05  8:15             ` Eli Cohen
  2011-10-06 13:57               ` Eli Cohen
  2011-10-09  7:21               ` Benjamin Herrenschmidt
  2011-10-09  7:19             ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 36+ messages in thread
From: Eli Cohen @ 2011-10-05  8:15 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Benjamin Herrenschmidt, Yevgeny Petrilin, netdev, Eli Cohen,
	linuxppc-dev

On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:

I believe we have an endianess problem here. The source buffer is in
big endian - in x86 archs, it will rich the pci device unswapped since
both x86 and pci are little endian. In ppc, it wil be swapped by the
chipset so it will reach the device in little endian which is wrong.
So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
the dwords before the call to __iowrite64_copy. Of course which should
fix this in an arch independent manner. Let me know this works for
you.

> On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > 
> >  .../...
> > 
> > > > Can you also send me the output of ethtool -i?
> > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > 
> > > > Yevgeny
> > > 
> > > Hello, Yevgeny.
> > > 
> > > You will find the output of ethtool -i below.
> > > 
> > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > processors. They can provide us some more insight into this.
> > 
> > May I get some background please ? :-)
> > 
> > I'm not aware of a specific issue with write combining but I'd need to
> > know more about what you are doing and the code to do it to comment on
> > whether it should work or not.
> > 
> > Cheers,
> > Ben.
> > 
> > 
> 
> Hello, Ben.
> 
> Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> added blue frame support, that does not require writing to the device
> memory to indicate a new packet (the doorbell register as it is called).
> 
> Well, the ring is getting full with no interrupt or packets transmitted.
> I simply added a write to the doorbell register and it works for me.
> Yevgeny says this is not the right fix, claiming there is a problem with
> write combining on POWER. The code uses memory barriers, so I don't know
> why there is any problem.
> 
> I am posting the code here to show better what the situation is.
> Yevgeny can tell more about the device and the driver.
> 
> The code below is the driver as of now, including a diff with what I
> changed and had resulted OK for me. Before the blue frame support, the
> only code executed was the else part. I can't tell much what the device
> should be seeing and doing after the blue frame part of the code is
> executed. But it does send the packet if I write to the doorbell
> register.
> 
> Yevgeny, can you tell us what the device should be doing and why you
> think this is a problem on POWER? Is it possible that this is simply a
> problem with the firmware version?
> 
> Thanks,
> Cascardo.
> 
> ---
>         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> !vlan_tag) {
>                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> ring->doorbell_qpn;
>                 op_own |= htonl((bf_index & 0xffff) << 8);
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> 
>                 wmb();
> 
>                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> long *) &tx_desc->ctrl,
>                      desc_size);
> 
>                 wmb();
> 
>                 ring->bf.offset ^= ring->bf.buf_size;
>         } else {
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> -               wmb();
> -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
>         }
> 
> +       wmb();
> +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
> +
> ---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-05  8:15             ` Eli Cohen
@ 2011-10-06 13:57               ` Eli Cohen
  2011-10-06 14:10                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled David Laight
  2011-10-09  7:25                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Benjamin Herrenschmidt
  2011-10-09  7:21               ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 36+ messages in thread
From: Eli Cohen @ 2011-10-06 13:57 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Benjamin Herrenschmidt, Yevgeny Petrilin, netdev, Eli Cohen,
	linuxppc-dev

On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:

How about this patch - can you give it a try?


>From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
From: Eli Cohen <eli@mellanox.co.il>
Date: Thu, 6 Oct 2011 15:50:02 +0200
Subject: [PATCH] mlx4_en: Fix blue flame on powerpc

The source buffer used for copying into the blue flame register is already in
big endian. However, when copying to device on powerpc, the endianess is
swapped so the data reaches th device in little endian which is wrong. On x86
based platform no swapping occurs so it reaches the device with the correct
endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
is no change; on BE there will be a swap.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
---
 drivers/net/mlx4/en_tx.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
index 16337fb..3743acc 100644
--- a/drivers/net/mlx4/en_tx.c
+++ b/drivers/net/mlx4/en_tx.c
@@ -601,6 +601,16 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
 
 static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
 {
+	int i;
+	__le32 *psrc = (__le32 *)src;
+
+	/*
+	 * the buffer is already in big endian. For little endian machines that's
+	 * fine. For big endain machines we must swap since the chipset swaps again
+	 */
+	for (i = 0; i < bytecnt / 4; ++i)
+		psrc[i] = le32_to_cpu(psrc[i]);
+
 	__iowrite64_copy(dst, src, bytecnt / 8);
 }
 
-- 
1.7.7.rc0.70.g82660



> On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> 
> I believe we have an endianess problem here. The source buffer is in
> big endian - in x86 archs, it will rich the pci device unswapped since
> both x86 and pci are little endian. In ppc, it wil be swapped by the
> chipset so it will reach the device in little endian which is wrong.
> So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> the dwords before the call to __iowrite64_copy. Of course which should
> fix this in an arch independent manner. Let me know this works for
> you.
> 
> > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > >  .../...
> > > 
> > > > > Can you also send me the output of ethtool -i?
> > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > 
> > > > > Yevgeny
> > > > 
> > > > Hello, Yevgeny.
> > > > 
> > > > You will find the output of ethtool -i below.
> > > > 
> > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > processors. They can provide us some more insight into this.
> > > 
> > > May I get some background please ? :-)
> > > 
> > > I'm not aware of a specific issue with write combining but I'd need to
> > > know more about what you are doing and the code to do it to comment on
> > > whether it should work or not.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > 
> > Hello, Ben.
> > 
> > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > added blue frame support, that does not require writing to the device
> > memory to indicate a new packet (the doorbell register as it is called).
> > 
> > Well, the ring is getting full with no interrupt or packets transmitted.
> > I simply added a write to the doorbell register and it works for me.
> > Yevgeny says this is not the right fix, claiming there is a problem with
> > write combining on POWER. The code uses memory barriers, so I don't know
> > why there is any problem.
> > 
> > I am posting the code here to show better what the situation is.
> > Yevgeny can tell more about the device and the driver.
> > 
> > The code below is the driver as of now, including a diff with what I
> > changed and had resulted OK for me. Before the blue frame support, the
> > only code executed was the else part. I can't tell much what the device
> > should be seeing and doing after the blue frame part of the code is
> > executed. But it does send the packet if I write to the doorbell
> > register.
> > 
> > Yevgeny, can you tell us what the device should be doing and why you
> > think this is a problem on POWER? Is it possible that this is simply a
> > problem with the firmware version?
> > 
> > Thanks,
> > Cascardo.
> > 
> > ---
> >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > !vlan_tag) {
> >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > ring->doorbell_qpn;
> >                 op_own |= htonl((bf_index & 0xffff) << 8);
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > 
> >                 wmb();
> > 
> >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > long *) &tx_desc->ctrl,
> >                      desc_size);
> > 
> >                 wmb();
> > 
> >                 ring->bf.offset ^= ring->bf.buf_size;
> >         } else {
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > -               wmb();
> > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> >         }
> > 
> > +       wmb();
> > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> > +
> > ---

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled
  2011-10-06 13:57               ` Eli Cohen
@ 2011-10-06 14:10                 ` David Laight
  2011-10-07 22:29                   ` Thadeu Lima de Souza Cascardo
  2011-10-09  7:28                   ` Benjamin Herrenschmidt
  2011-10-09  7:25                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Benjamin Herrenschmidt
  1 sibling, 2 replies; 36+ messages in thread
From: David Laight @ 2011-10-06 14:10 UTC (permalink / raw)
  To: Eli Cohen, Thadeu Lima de Souza Cascardo
  Cc: Eli Cohen, Yevgeny Petrilin, linuxppc-dev, netdev

 
>  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src,
unsigned bytecnt) {
> +	int i;
> +	__le32 *psrc = (__le32 *)src;
> +
> +	/*
> +	 * the buffer is already in big endian. For little endian
machines that's
> +	 * fine. For big endain machines we must swap since the chipset
swaps again
> +	 */
> +	for (i = 0; i < bytecnt / 4; ++i)
> +		psrc[i] = le32_to_cpu(psrc[i]);
> +
>  	__iowrite64_copy(dst, src, bytecnt / 8);
>  }

That code looks horrid...
1) I'm not sure the caller expects the buffer to be corrupted.
2) It contains a lot of memory cycles.
3) It looked from the calls that this code is copying descriptors,
   so the transfer length is probably 1 or 2 words - so the loop
   is inefficient.
4) ppc doesn't have a fast byteswap instruction (very new gcc might
   use the byteswapping memery access for the le32_to_cpu() though),
   so it would be better getting the byteswap done inside
   __iowrite64_copy() - since that is probably requesting a byteswap
   anyway.
OTOH I'm not at all clear about the 64bit xfers....

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled
  2011-10-06 14:10                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled David Laight
@ 2011-10-07 22:29                   ` Thadeu Lima de Souza Cascardo
  2011-10-09  7:28                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-07 22:29 UTC (permalink / raw)
  To: David Laight; +Cc: Eli Cohen, Eli Cohen, netdev, linuxppc-dev, Yevgeny Petrilin

On Thu, Oct 06, 2011 at 03:10:28PM +0100, David Laight wrote:
>  
> >  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src,
> unsigned bytecnt) {
> > +	int i;
> > +	__le32 *psrc = (__le32 *)src;
> > +
> > +	/*
> > +	 * the buffer is already in big endian. For little endian
> machines that's
> > +	 * fine. For big endain machines we must swap since the chipset
> swaps again
> > +	 */
> > +	for (i = 0; i < bytecnt / 4; ++i)
> > +		psrc[i] = le32_to_cpu(psrc[i]);
> > +
> >  	__iowrite64_copy(dst, src, bytecnt / 8);
> >  }
> 
> That code looks horrid...
> 1) I'm not sure the caller expects the buffer to be corrupted.
> 2) It contains a lot of memory cycles.
> 3) It looked from the calls that this code is copying descriptors,
>    so the transfer length is probably 1 or 2 words - so the loop
>    is inefficient.
> 4) ppc doesn't have a fast byteswap instruction (very new gcc might
>    use the byteswapping memery access for the le32_to_cpu() though),
>    so it would be better getting the byteswap done inside
>    __iowrite64_copy() - since that is probably requesting a byteswap
>    anyway.
> OTOH I'm not at all clear about the 64bit xfers....
> 
> 

Plus, it doesn't work.  :-\

After doing some more investigation, I decided to add the ARP entries
manually and test for different packet sizes. Packets larger than 256
should use the non-BF path and work. Smaller packets should fail. To my
surprise, the smaller packets were fine. Tried many different sizes and
they all succeeded. Then, tried using broadcast and it worked too (after
setting the proper sysctl). Broadcast with 0, 8 and 16 ping size, all
OK. Removed the ARP entries, the ARP responses did not get to the other
side. So, I did a pretty nasty hack to not use BF for ARP packets and
after cleaning up the ARP table, everything seems to be working fine.
What follows is my patch, just for reference.

Regards,
Cascardo.

---
--- drivers/net/mlx4/en_tx.c.orig	2011-10-03 15:42:15.736104623 -0500
+++ drivers/net/mlx4/en_tx.c	2011-10-07 17:12:38.023792483 -0500
@@ -627,6 +627,7 @@
 	int lso_header_size;
 	void *fragptr;
 	bool bounce = false;
+	bool is_arp = false;
 
 	if (!priv->port_up)
 		goto tx_drop;
@@ -698,10 +699,11 @@
 		priv->port_stats.tx_chksum_offload++;
 	}
 
+	skb_reset_mac_header(skb);
+	ethh = eth_hdr(skb);
+	is_arp = ethh && ethh->h_proto == ETH_P_ARP;
 	if (unlikely(priv->validate_loopback)) {
 		/* Copy dst mac address to wqe */
-		skb_reset_mac_header(skb);
-		ethh = eth_hdr(skb);
 		if (ethh && ethh->h_dest) {
 			mac = mlx4_en_mac_to_u64(ethh->h_dest);
 			mac_h = (u32) ((mac & 0xffff00000000ULL) >> 16);
@@ -790,7 +792,7 @@
 	if (likely(!skb_shared(skb)))
 		skb_orphan(skb);
 
-	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag) {
+	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag && !is_arp) {
 		*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
 		op_own |= htonl((bf_index & 0xffff) << 8);
 		/* Ensure new descirptor hits memory
---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-04 20:26           ` Thadeu Lima de Souza Cascardo
  2011-10-05  8:15             ` Eli Cohen
@ 2011-10-09  7:19             ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  7:19 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: netdev, Eli Cohen, eli, linuxppc-dev, Yevgeny Petrilin

On Tue, 2011-10-04 at 17:26 -0300, Thadeu Lima de Souza Cascardo wrote:

>         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> !vlan_tag) {
>                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> ring->doorbell_qpn;

Could this have endianness problems ?

>                 op_own |= htonl((bf_index & 0xffff) << 8);

Probably better to use cpu_to_be/lexx() here. Also make sure your
descriptors are defined with the appropriate __beXX or __leXX types
so sparse can tell you when you get this wrong

>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;

>                 wmb();
> 
>                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> long *) &tx_desc->ctrl,
>                      desc_size);

What does the above do ?

>                 wmb();
> 
>                 ring->bf.offset ^= ring->bf.buf_size;

Is this HW visible ? It's a bit odd, are you doing double bufferring and
trying to flip a bit ? If it's HW visible, is endian correct ?
 
        } else {
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> -               wmb();
> -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
>         }
> 
> +       wmb();
> +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
> +
> ---

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-05  8:15             ` Eli Cohen
  2011-10-06 13:57               ` Eli Cohen
@ 2011-10-09  7:21               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  7:21 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Wed, 2011-10-05 at 10:15 +0200, Eli Cohen wrote:
> On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> 
> I believe we have an endianess problem here. The source buffer is in
> big endian - in x86 archs, it will rich the pci device unswapped since
> both x86 and pci are little endian. In ppc, it wil be swapped by the
> chipset 
  ^^^^^^^ ugh ?

> so it will reach the device in little endian which is wrong.
> So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> the dwords before the call to __iowrite64_copy. Of course which should
> fix this in an arch independent manner. Let me know this works for
> you.

That's generally the wrong way to handle endian (to swap32 a whole
buffer).

But I need to know more about the structure of those descriptors etc...
to judge properly.

Cheers,
Ben.

> > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > >  .../...
> > > 
> > > > > Can you also send me the output of ethtool -i?
> > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > 
> > > > > Yevgeny
> > > > 
> > > > Hello, Yevgeny.
> > > > 
> > > > You will find the output of ethtool -i below.
> > > > 
> > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > processors. They can provide us some more insight into this.
> > > 
> > > May I get some background please ? :-)
> > > 
> > > I'm not aware of a specific issue with write combining but I'd need to
> > > know more about what you are doing and the code to do it to comment on
> > > whether it should work or not.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > 
> > Hello, Ben.
> > 
> > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > added blue frame support, that does not require writing to the device
> > memory to indicate a new packet (the doorbell register as it is called).
> > 
> > Well, the ring is getting full with no interrupt or packets transmitted.
> > I simply added a write to the doorbell register and it works for me.
> > Yevgeny says this is not the right fix, claiming there is a problem with
> > write combining on POWER. The code uses memory barriers, so I don't know
> > why there is any problem.
> > 
> > I am posting the code here to show better what the situation is.
> > Yevgeny can tell more about the device and the driver.
> > 
> > The code below is the driver as of now, including a diff with what I
> > changed and had resulted OK for me. Before the blue frame support, the
> > only code executed was the else part. I can't tell much what the device
> > should be seeing and doing after the blue frame part of the code is
> > executed. But it does send the packet if I write to the doorbell
> > register.
> > 
> > Yevgeny, can you tell us what the device should be doing and why you
> > think this is a problem on POWER? Is it possible that this is simply a
> > problem with the firmware version?
> > 
> > Thanks,
> > Cascardo.
> > 
> > ---
> >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > !vlan_tag) {
> >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > ring->doorbell_qpn;
> >                 op_own |= htonl((bf_index & 0xffff) << 8);
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > 
> >                 wmb();
> > 
> >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > long *) &tx_desc->ctrl,
> >                      desc_size);
> > 
> >                 wmb();
> > 
> >                 ring->bf.offset ^= ring->bf.buf_size;
> >         } else {
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > -               wmb();
> > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> >         }
> > 
> > +       wmb();
> > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> > +
> > ---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-06 13:57               ` Eli Cohen
  2011-10-06 14:10                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled David Laight
@ 2011-10-09  7:25                 ` Benjamin Herrenschmidt
  2011-10-09  7:35                   ` Eli Cohen
  2011-10-10  8:40                   ` David Laight
  1 sibling, 2 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  7:25 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> 
> How about this patch - can you give it a try?
> 
> 
> >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> From: Eli Cohen <eli@mellanox.co.il>
> Date: Thu, 6 Oct 2011 15:50:02 +0200
> Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> 
> The source buffer used for copying into the blue flame register is already in
> big endian. However, when copying to device on powerpc, the endianess is
> swapped so the data reaches th device in little endian which is wrong. On x86
> based platform no swapping occurs so it reaches the device with the correct
> endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> is no change; on BE there will be a swap.

That looks wrong.

What is this __iowrite64_copy... oh I see

Nice, somebody _AGAIN_ added a bunch of "generic" IO accessors that are
utterly wrong on all archs except x86 (ok, -almost-). There isn't a
single bloody memory barrier in there !

So, __iowrite64_copy is doing raw_writel which will -not- swap, so your
buffer is going to have the same endianness in the destination than it
has in the source. This is _NOT_ the right place to do a swap.

It's the original construction of the descriptor that needs change. The
data itself should never need to be affected accross a copy operation
(unless your HW is terminally busted).

Cheers,
Ben.

> Signed-off-by: Eli Cohen <eli@mellanox.co.il>
> ---
>  drivers/net/mlx4/en_tx.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> index 16337fb..3743acc 100644
> --- a/drivers/net/mlx4/en_tx.c
> +++ b/drivers/net/mlx4/en_tx.c
> @@ -601,6 +601,16 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
>  
>  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
>  {
> +	int i;
> +	__le32 *psrc = (__le32 *)src;
> +
> +	/*
> +	 * the buffer is already in big endian. For little endian machines that's
> +	 * fine. For big endain machines we must swap since the chipset swaps again
> +	 */
> +	for (i = 0; i < bytecnt / 4; ++i)
> +		psrc[i] = le32_to_cpu(psrc[i]);
> +
>  	__iowrite64_copy(dst, src, bytecnt / 8);
>  }
>  
> -- 
> 1.7.7.rc0.70.g82660
> 
> 
> 
> > On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > 
> > I believe we have an endianess problem here. The source buffer is in
> > big endian - in x86 archs, it will rich the pci device unswapped since
> > both x86 and pci are little endian. In ppc, it wil be swapped by the
> > chipset so it will reach the device in little endian which is wrong.
> > So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> > the dwords before the call to __iowrite64_copy. Of course which should
> > fix this in an arch independent manner. Let me know this works for
> > you.
> > 
> > > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > 
> > > >  .../...
> > > > 
> > > > > > Can you also send me the output of ethtool -i?
> > > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > > 
> > > > > > Yevgeny
> > > > > 
> > > > > Hello, Yevgeny.
> > > > > 
> > > > > You will find the output of ethtool -i below.
> > > > > 
> > > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > > processors. They can provide us some more insight into this.
> > > > 
> > > > May I get some background please ? :-)
> > > > 
> > > > I'm not aware of a specific issue with write combining but I'd need to
> > > > know more about what you are doing and the code to do it to comment on
> > > > whether it should work or not.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > 
> > > 
> > > Hello, Ben.
> > > 
> > > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > > added blue frame support, that does not require writing to the device
> > > memory to indicate a new packet (the doorbell register as it is called).
> > > 
> > > Well, the ring is getting full with no interrupt or packets transmitted.
> > > I simply added a write to the doorbell register and it works for me.
> > > Yevgeny says this is not the right fix, claiming there is a problem with
> > > write combining on POWER. The code uses memory barriers, so I don't know
> > > why there is any problem.
> > > 
> > > I am posting the code here to show better what the situation is.
> > > Yevgeny can tell more about the device and the driver.
> > > 
> > > The code below is the driver as of now, including a diff with what I
> > > changed and had resulted OK for me. Before the blue frame support, the
> > > only code executed was the else part. I can't tell much what the device
> > > should be seeing and doing after the blue frame part of the code is
> > > executed. But it does send the packet if I write to the doorbell
> > > register.
> > > 
> > > Yevgeny, can you tell us what the device should be doing and why you
> > > think this is a problem on POWER? Is it possible that this is simply a
> > > problem with the firmware version?
> > > 
> > > Thanks,
> > > Cascardo.
> > > 
> > > ---
> > >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > > !vlan_tag) {
> > >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > > ring->doorbell_qpn;
> > >                 op_own |= htonl((bf_index & 0xffff) << 8);
> > >                 /* Ensure new descirptor hits memory
> > >                 * before setting ownership of this descriptor to HW */
> > >                 wmb();
> > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > 
> > >                 wmb();
> > > 
> > >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > > long *) &tx_desc->ctrl,
> > >                      desc_size);
> > > 
> > >                 wmb();
> > > 
> > >                 ring->bf.offset ^= ring->bf.buf_size;
> > >         } else {
> > >                 /* Ensure new descirptor hits memory
> > >                 * before setting ownership of this descriptor to HW */
> > >                 wmb();
> > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > -               wmb();
> > > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > MLX4_SEND_DOORBELL);
> > >         }
> > > 
> > > +       wmb();
> > > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > MLX4_SEND_DOORBELL);
> > > +
> > > ---

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled
  2011-10-06 14:10                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled David Laight
  2011-10-07 22:29                   ` Thadeu Lima de Souza Cascardo
@ 2011-10-09  7:28                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  7:28 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, Thadeu Lima de Souza Cascardo, Eli Cohen,
	Yevgeny Petrilin, linuxppc-dev, netdev

On Thu, 2011-10-06 at 15:10 +0100, David Laight wrote:
> horrid...
> 1) I'm not sure the caller expects the buffer to be corrupted.
> 2) It contains a lot of memory cycles.
> 3) It looked from the calls that this code is copying descriptors,
>    so the transfer length is probably 1 or 2 words - so the loop
>    is inefficient.
> 4) ppc doesn't have a fast byteswap instruction (very new gcc might
>    use the byteswapping memery access for the le32_to_cpu() though),
>    so it would be better getting the byteswap done inside
>    __iowrite64_copy() - since that is probably requesting a byteswap
>    anyway.
> OTOH I'm not at all clear about the 64bit xfers....

And it's just plain wrong anyway. You should never have to byteswap a
copy.

Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  7:25                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Benjamin Herrenschmidt
@ 2011-10-09  7:35                   ` Eli Cohen
  2011-10-09  8:00                     ` Benjamin Herrenschmidt
  2011-10-10  8:40                   ` David Laight
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-09  7:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > 
> > How about this patch - can you give it a try?
> > 
> > 
> > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > From: Eli Cohen <eli@mellanox.co.il>
> > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > 
> > The source buffer used for copying into the blue flame register is already in
> > big endian. However, when copying to device on powerpc, the endianess is
> > swapped so the data reaches th device in little endian which is wrong. On x86
> > based platform no swapping occurs so it reaches the device with the correct
> > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > is no change; on BE there will be a swap.
> 
> That looks wrong.
Not sure I understand: are you saying that on ppc, when you call
__iowrite64_copy, it will not reach the device swapped?

The point is that we must always have the buffer ready in big endian
in memory. In the case of blue flame, we must also copy it to the
device registers in pci memory space. So if we use the buffer we
already prepared, we must have another swap. I can think of a nicer
way to implement this functionality but the question is do you think
my observation above is wrong and why.

> 
> What is this __iowrite64_copy... oh I see
> 
> Nice, somebody _AGAIN_ added a bunch of "generic" IO accessors that are
> utterly wrong on all archs except x86 (ok, -almost-). There isn't a
> single bloody memory barrier in there !
> 
> So, __iowrite64_copy is doing raw_writel which will -not- swap, so your
> buffer is going to have the same endianness in the destination than it
> has in the source. This is _NOT_ the right place to do a swap.
> 
> It's the original construction of the descriptor that needs change. The
> data itself should never need to be affected accross a copy operation
> (unless your HW is terminally busted).
> 
> Cheers,
> Ben.
> 
> > Signed-off-by: Eli Cohen <eli@mellanox.co.il>
> > ---
> >  drivers/net/mlx4/en_tx.c |   10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> > index 16337fb..3743acc 100644
> > --- a/drivers/net/mlx4/en_tx.c
> > +++ b/drivers/net/mlx4/en_tx.c
> > @@ -601,6 +601,16 @@ u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
> >  
> >  static void mlx4_bf_copy(unsigned long *dst, unsigned long *src, unsigned bytecnt)
> >  {
> > +	int i;
> > +	__le32 *psrc = (__le32 *)src;
> > +
> > +	/*
> > +	 * the buffer is already in big endian. For little endian machines that's
> > +	 * fine. For big endain machines we must swap since the chipset swaps again
> > +	 */
> > +	for (i = 0; i < bytecnt / 4; ++i)
> > +		psrc[i] = le32_to_cpu(psrc[i]);
> > +
> >  	__iowrite64_copy(dst, src, bytecnt / 8);
> >  }
> >  
> > -- 
> > 1.7.7.rc0.70.g82660
> > 
> > 
> > 
> > > On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > > I believe we have an endianess problem here. The source buffer is in
> > > big endian - in x86 archs, it will rich the pci device unswapped since
> > > both x86 and pci are little endian. In ppc, it wil be swapped by the
> > > chipset so it will reach the device in little endian which is wrong.
> > > So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> > > the dwords before the call to __iowrite64_copy. Of course which should
> > > fix this in an arch independent manner. Let me know this works for
> > > you.
> > > 
> > > > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > > 
> > > > >  .../...
> > > > > 
> > > > > > > Can you also send me the output of ethtool -i?
> > > > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > > > 
> > > > > > > Yevgeny
> > > > > > 
> > > > > > Hello, Yevgeny.
> > > > > > 
> > > > > > You will find the output of ethtool -i below.
> > > > > > 
> > > > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > > > processors. They can provide us some more insight into this.
> > > > > 
> > > > > May I get some background please ? :-)
> > > > > 
> > > > > I'm not aware of a specific issue with write combining but I'd need to
> > > > > know more about what you are doing and the code to do it to comment on
> > > > > whether it should work or not.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > 
> > > > Hello, Ben.
> > > > 
> > > > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > > > added blue frame support, that does not require writing to the device
> > > > memory to indicate a new packet (the doorbell register as it is called).
> > > > 
> > > > Well, the ring is getting full with no interrupt or packets transmitted.
> > > > I simply added a write to the doorbell register and it works for me.
> > > > Yevgeny says this is not the right fix, claiming there is a problem with
> > > > write combining on POWER. The code uses memory barriers, so I don't know
> > > > why there is any problem.
> > > > 
> > > > I am posting the code here to show better what the situation is.
> > > > Yevgeny can tell more about the device and the driver.
> > > > 
> > > > The code below is the driver as of now, including a diff with what I
> > > > changed and had resulted OK for me. Before the blue frame support, the
> > > > only code executed was the else part. I can't tell much what the device
> > > > should be seeing and doing after the blue frame part of the code is
> > > > executed. But it does send the packet if I write to the doorbell
> > > > register.
> > > > 
> > > > Yevgeny, can you tell us what the device should be doing and why you
> > > > think this is a problem on POWER? Is it possible that this is simply a
> > > > problem with the firmware version?
> > > > 
> > > > Thanks,
> > > > Cascardo.
> > > > 
> > > > ---
> > > >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > > > !vlan_tag) {
> > > >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > > > ring->doorbell_qpn;
> > > >                 op_own |= htonl((bf_index & 0xffff) << 8);
> > > >                 /* Ensure new descirptor hits memory
> > > >                 * before setting ownership of this descriptor to HW */
> > > >                 wmb();
> > > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > > 
> > > >                 wmb();
> > > > 
> > > >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > > > long *) &tx_desc->ctrl,
> > > >                      desc_size);
> > > > 
> > > >                 wmb();
> > > > 
> > > >                 ring->bf.offset ^= ring->bf.buf_size;
> > > >         } else {
> > > >                 /* Ensure new descirptor hits memory
> > > >                 * before setting ownership of this descriptor to HW */
> > > >                 wmb();
> > > >                 tx_desc->ctrl.owner_opcode = op_own;
> > > > -               wmb();
> > > > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > > MLX4_SEND_DOORBELL);
> > > >         }
> > > > 
> > > > +       wmb();
> > > > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > > > MLX4_SEND_DOORBELL);
> > > > +
> > > > ---
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  7:35                   ` Eli Cohen
@ 2011-10-09  8:00                     ` Benjamin Herrenschmidt
  2011-10-09  8:07                       ` Eli Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  8:00 UTC (permalink / raw)
  To: Eli Cohen
  Cc: netdev, Eli Cohen, linuxppc-dev, Thadeu Lima de Souza Cascardo,
	Yevgeny Petrilin

On Sun, 2011-10-09 at 09:35 +0200, Eli Cohen wrote:
> On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> > On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > > 
> > > How about this patch - can you give it a try?
> > > 
> > > 
> > > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > > From: Eli Cohen <eli@mellanox.co.il>
> > > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > > 
> > > The source buffer used for copying into the blue flame register is already in
> > > big endian. However, when copying to device on powerpc, the endianess is
> > > swapped so the data reaches th device in little endian which is wrong. On x86
> > > based platform no swapping occurs so it reaches the device with the correct
> > > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > > is no change; on BE there will be a swap.
> > 
> > That looks wrong.
> Not sure I understand: are you saying that on ppc, when you call
> __iowrite64_copy, it will not reach the device swapped?

Well, first, what do you mean by "swapped" ? :-) But no, it won't for
all intend and purpose, this is a copy routine, copy routines never
swap, neither do fifo accesses for example.

> The point is that we must always have the buffer ready in big endian
> in memory. In the case of blue flame, we must also copy it to the
> device registers in pci memory space. So if we use the buffer we
> already prepared, we must have another swap. I can think of a nicer
> way to implement this functionality but the question is do you think
> my observation above is wrong and why.

No. If it's in memory BE then the copy routine will keep it BE. A copy
routine doesn't swap and doesn't affect endianness.

Additionally, a swapping phase like you proposed doing 32-bit swaps
means that you know for sure that the buffer is made of 32-bit
quantities, is that the case ? Even if you had needed that swap, if your
buffer had contained 16-bit or 64-bit quantities, you're toast.

What is this buffer anyway ? A descriptor or a network packet ?

If it's a packet, then it's data, endianness has no meaning (or rather
it has for individual fields of the packets but they are already in the
right format and a 32-bit swap will never be right).
 
It's almost never right to perform swapping when copying data (or
reading/writing a FIFO).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  8:00                     ` Benjamin Herrenschmidt
@ 2011-10-09  8:07                       ` Eli Cohen
  2011-10-09  8:38                         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-09  8:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, Oct 09, 2011 at 10:00:54AM +0200, Benjamin Herrenschmidt wrote:
> On Sun, 2011-10-09 at 09:35 +0200, Eli Cohen wrote:
> > On Sun, Oct 09, 2011 at 09:25:18AM +0200, Benjamin Herrenschmidt wrote:
> > > On Thu, 2011-10-06 at 15:57 +0200, Eli Cohen wrote:
> > > > On Wed, Oct 05, 2011 at 10:15:02AM +0200, Eli Cohen wrote:
> > > > 
> > > > How about this patch - can you give it a try?
> > > > 
> > > > 
> > > > >From dee60547aa9e35a02835451d9e694cd80dd3072f Mon Sep 17 00:00:00 2001
> > > > From: Eli Cohen <eli@mellanox.co.il>
> > > > Date: Thu, 6 Oct 2011 15:50:02 +0200
> > > > Subject: [PATCH] mlx4_en: Fix blue flame on powerpc
> > > > 
> > > > The source buffer used for copying into the blue flame register is already in
> > > > big endian. However, when copying to device on powerpc, the endianess is
> > > > swapped so the data reaches th device in little endian which is wrong. On x86
> > > > based platform no swapping occurs so it reaches the device with the correct
> > > > endianess. Fix this by calling le32_to_cpu() on the buffer. On LE systems there
> > > > is no change; on BE there will be a swap.
> > > 
> > > That looks wrong.
> > Not sure I understand: are you saying that on ppc, when you call
> > __iowrite64_copy, it will not reach the device swapped?
> 
> Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> all intend and purpose, this is a copy routine, copy routines never
> swap, neither do fifo accesses for example.
When I say swapped, I mean not necessairliy by software. I think that
the chipset will swap the the data. The reason I think so is that the
CPU arch is big endian, while PCI bus is defined as little endian.
That's why I think a swap will occur in ppc and not in x86.
> 
> > The point is that we must always have the buffer ready in big endian
> > in memory. In the case of blue flame, we must also copy it to the
> > device registers in pci memory space. So if we use the buffer we
> > already prepared, we must have another swap. I can think of a nicer
> > way to implement this functionality but the question is do you think
> > my observation above is wrong and why.
> 
> No. If it's in memory BE then the copy routine will keep it BE. A copy
> routine doesn't swap and doesn't affect endianness.
> 
> Additionally, a swapping phase like you proposed doing 32-bit swaps
> means that you know for sure that the buffer is made of 32-bit
> quantities, is that the case ?
Yes

> Even if you had needed that swap, if your
> buffer had contained 16-bit or 64-bit quantities, you're toast.
> 
> What is this buffer anyway ? A descriptor or a network packet ?
It's a special descriptor that resides both in memory and also written
to the device's register. An it contains both data and control
informartion.
> 
> If it's a packet, then it's data, endianness has no meaning (or rather
> it has for individual fields of the packets but they are already in the
> right format and a 32-bit swap will never be right).
>  
> It's almost never right to perform swapping when copying data (or
> reading/writing a FIFO).
> 
> Cheers,
> Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  8:07                       ` Eli Cohen
@ 2011-10-09  8:38                         ` Benjamin Herrenschmidt
  2011-10-09  9:21                           ` Eli Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  8:38 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, 2011-10-09 at 10:07 +0200, Eli Cohen wrote:

> > Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> > all intend and purpose, this is a copy routine, copy routines never
> > swap, neither do fifo accesses for example.
> When I say swapped, I mean not necessairliy by software. I think that
> the chipset will swap the the data. The reason I think so is that the
> CPU arch is big endian, while PCI bus is defined as little endian.
> That's why I think a swap will occur in ppc and not in x86.

No it won't "swap the data". The wiring between PCI and the CPU bus is
done in a way called "byte address invariant", and there is some kind of
flip of byte lanes related essentially to ensure that, but for all
intend and purpose it's transparent.

> It's a special descriptor that resides both in memory and also written
> to the device's register. An it contains both data and control
> informartion.

Data should not be swapped then. Only individual bits of control
information. In any case, the buffer should be generated in the right
format in memory to start with. The copy routine doesn't need to swap.

To go back to the driver code, the statements that ring a "bell" are:

	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

This doesn't look right unless "doorbell_qpn" itself is already somewhat
in the appropriate byte order.

Is that vlan_tag a big or little endian quantity ? Either way, this
looks broken in either x86 or ppc unless doorbell_qpn itself is already
in the right endian.

But since later I see

	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);

That rings a nasty bell, it looks like doorbell_pqn is in CPU (native)
endian, so it should have to be swapped before being OR'ed into the
descriptor, either that or the HW does some black magic I don't
understand.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  8:38                         ` Benjamin Herrenschmidt
@ 2011-10-09  9:21                           ` Eli Cohen
  2011-10-09  9:52                             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-09  9:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, Oct 09, 2011 at 10:38:56AM +0200, Benjamin Herrenschmidt wrote:
> On Sun, 2011-10-09 at 10:07 +0200, Eli Cohen wrote:
> 
> > > Well, first, what do you mean by "swapped" ? :-) But no, it won't for
> > > all intend and purpose, this is a copy routine, copy routines never
> > > swap, neither do fifo accesses for example.
> > When I say swapped, I mean not necessairliy by software. I think that
> > the chipset will swap the the data. The reason I think so is that the
> > CPU arch is big endian, while PCI bus is defined as little endian.
> > That's why I think a swap will occur in ppc and not in x86.
> 
> No it won't "swap the data". The wiring between PCI and the CPU bus is
> done in a way called "byte address invariant", and there is some kind of
> flip of byte lanes related essentially to ensure that, but for all
> intend and purpose it's transparent.
> 
> > It's a special descriptor that resides both in memory and also written
> > to the device's register. An it contains both data and control
> > informartion.
> 
> Data should not be swapped then. Only individual bits of control
> information. In any case, the buffer should be generated in the right
> format in memory to start with. The copy routine doesn't need to swap.
> 
> To go back to the driver code, the statements that ring a "bell" are:
> 
> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> 
> This doesn't look right unless "doorbell_qpn" itself is already somewhat
> in the appropriate byte order.
This is something that supports my claim that the chipset swaps
endianess in ppc.
Look at mlx4_en_activate_tx_ring():
ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
so in LE machines it layed as big endian in memory while in BE machines
it is layed as little endian in memory.
Then we write this value to the device registers which must get it in
big endian otherwise it won't work - and we know this works in both
ppc and x86. You can ignore the case of blue flame:

        } else if (nreq) {
                qp->sq.head += nreq;

                /*
                 * Make sure that descriptors are written before
                 * doorbell record.
                 */
                wmb();

                writel(qp->doorbell_qpn, qp->bf.uar->map +
MLX4_SEND_DOORBELL); <==  remember that it is layed in little endian
but the device must get it in big endian.

                /*
                 * Make sure doorbells don't leak out of SQ spinlock
                 * and reach the HCA out of order.
                 */
                mmiowb();

        }




> 
> Is that vlan_tag a big or little endian quantity ? Either way, this
> looks broken in either x86 or ppc unless doorbell_qpn itself is already
> in the right endian.
> 
> But since later I see
> 
> 	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> 
> That rings a nasty bell, it looks like doorbell_pqn is in CPU (native)
> endian, so it should have to be swapped before being OR'ed into the
> descriptor, either that or the HW does some black magic I don't
> understand.
> 
> Cheers,
> Ben.
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  9:21                           ` Eli Cohen
@ 2011-10-09  9:52                             ` Benjamin Herrenschmidt
  2011-10-09 10:30                               ` Eli Cohen
  2011-10-10  8:20                               ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled David Laight
  0 siblings, 2 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-09  9:52 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev


> > To go back to the driver code, the statements that ring a "bell" are:
> > 
> > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > 
> > This doesn't look right unless "doorbell_qpn" itself is already somewhat
> > in the appropriate byte order.

> This is something that supports my claim that the chipset swaps
> endianess in ppc.

No the chipset doesn't swap

> Look at mlx4_en_activate_tx_ring():
> ring->doorbell_qpn = swab32(ring->qp.qpn << 8);

That looks gross, I think somebody writing this driver doesn't
understand endianness.

> so in LE machines it layed as big endian in memory while in BE machines
> it is layed as little endian in memory.

Yes which is very odd, it should be layed out the same in memory
regardless of the machine. However in this case, this isn't accessed
directly via DMA (this field at least), so what you appear to be doing
here is to artificially "undo" what writel does later on (see below).

> Then we write this value to the device registers which must get it in
> big endian otherwise it won't work - and we know this works in both
> ppc and x86. You can ignore the case of blue flame:

Well it works because your device is odd and has BE registers :-) It's
however not the right way to do and it's broken in your blue flame case
(for what is now obvious reasons, see below).

What's happening basically here is that you are swapping once in
swab32 , store that swapped value, then writel will re-swap on power and
not swap on x86 (because the writel accessor performs swapping). So on
x86 you basically do LE -> BE and on ppc you do BE -> LE -> BE :-)
Pretty inefficient.

None of this has anything to do with the chipset which doesn't swap
anything behind your back.

Ideally you want to avoid that swapping altogether and use the right
accessor that indicates that your register is BE to start with. IE.
remove the swab32 completely and then use something like 
iowrite32be() instead of writel().

Basically, the problem you have is that writel() has an implicit "write
to LE register" semantic. Your register is BE. the "iomap" variants
provide you with more fine grained "be" variants to use in that case.
There's also writel_be() but that one doesn't exist on every
architecture afaik.

Now, once the mmio problem is out of the way, let's look back at how you
then use that qpn.

With the current code, you've generated something in memory which is
byte reversed, so essentially "LE" on ppc and "BE" on x86.

Then, this statement:

*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

Will essentially write it out as-is in memory for use by the chip. The chip,
from what you say, expects BE, so this will be broken on PPC.

Here too, the right solution is to instead not do that swab32 to begin
with (ring->doorbell_qpn remains a native endian value) and instead do,
in addition to the above mentioned change to the writel:

	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);

(Also get rid of that cast and define vlan_tag as a __be32 to start
with).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  9:52                             ` Benjamin Herrenschmidt
@ 2011-10-09 10:30                               ` Eli Cohen
  2011-10-10  7:32                                 ` Benjamin Herrenschmidt
  2011-10-10  8:20                               ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled David Laight
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-09 10:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, Oct 09, 2011 at 11:52:19AM +0200, Benjamin Herrenschmidt wrote:
> 
> > > To go back to the driver code, the statements that ring a "bell" are:
> > > 
> > > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > > 
> > > This doesn't look right unless "doorbell_qpn" itself is already somewhat
> > > in the appropriate byte order.
> 
> > This is something that supports my claim that the chipset swaps
> > endianess in ppc.
> 
> No the chipset doesn't swap
> 
> > Look at mlx4_en_activate_tx_ring():
> > ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
> 
> That looks gross, I think somebody writing this driver doesn't
> understand endianness.
> 
> > so in LE machines it layed as big endian in memory while in BE machines
> > it is layed as little endian in memory.
> 
> Yes which is very odd, it should be layed out the same in memory
> regardless of the machine. However in this case, this isn't accessed
> directly via DMA (this field at least), so what you appear to be doing
> here is to artificially "undo" what writel does later on (see below).
> 
> > Then we write this value to the device registers which must get it in
> > big endian otherwise it won't work - and we know this works in both
> > ppc and x86. You can ignore the case of blue flame:
> 
> Well it works because your device is odd and has BE registers :-) It's
> however not the right way to do and it's broken in your blue flame case
> (for what is now obvious reasons, see below).
> 
> What's happening basically here is that you are swapping once in
> swab32 , store that swapped value, then writel will re-swap on power and
> not swap on x86 (because the writel accessor performs swapping). So on
> x86 you basically do LE -> BE and on ppc you do BE -> LE -> BE :-)
> Pretty inefficient.
> 
> None of this has anything to do with the chipset which doesn't swap
> anything behind your back.
> 
> Ideally you want to avoid that swapping altogether and use the right
> accessor that indicates that your register is BE to start with. IE.
> remove the swab32 completely and then use something like 
> iowrite32be() instead of writel().
I agree, this looks better but does it work on memory mapped io or
only on io pci space? All our registers are memory mapped...

> 
> Basically, the problem you have is that writel() has an implicit "write
> to LE register" semantic. Your register is BE. the "iomap" variants
> provide you with more fine grained "be" variants to use in that case.
> There's also writel_be() but that one doesn't exist on every
> architecture afaik.
So writel_be is the function I should use for memory mapped io? If it
does not exist for all platforms it's a pitty :-(
> 
> Now, once the mmio problem is out of the way, let's look back at how you
> then use that qpn.
> 
> With the current code, you've generated something in memory which is
> byte reversed, so essentially "LE" on ppc and "BE" on x86.
> 
> Then, this statement:
> 
> *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> 
> Will essentially write it out as-is in memory for use by the chip. The chip,
> from what you say, expects BE, so this will be broken on PPC.
I see. So this field is layed in le for ppc and the rest of the
descriptor is be. so I assum that __iowrite64_copy() does not swap
anything but we still have tx_desc->ctrl.vlan_tag in the wrong
endianess.

> 
> Here too, the right solution is to instead not do that swab32 to begin
> with (ring->doorbell_qpn remains a native endian value) and instead do,
> in addition to the above mentioned change to the writel:
> 
> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
> 
> (Also get rid of that cast and define vlan_tag as a __be32 to start
> with).
> 
> Cheers,
> Ben.

Thanks for your review. I will send another patch which should fix the
deficiencies.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09 10:30                               ` Eli Cohen
@ 2011-10-10  7:32                                 ` Benjamin Herrenschmidt
  2011-10-10 16:42                                   ` [PATCH] mlx4_en: fix endianness with blue frame support Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10  7:32 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Thadeu Lima de Souza Cascardo, Yevgeny Petrilin, netdev,
	Eli Cohen, linuxppc-dev

On Sun, 2011-10-09 at 12:30 +0200, Eli Cohen wrote:

> > Ideally you want to avoid that swapping altogether and use the right
> > accessor that indicates that your register is BE to start with. IE.
> > remove the swab32 completely and then use something like 
> > iowrite32be() instead of writel().
> I agree, this looks better but does it work on memory mapped io or
> only on io pci space? All our registers are memory mapped...

The iomap functions work on both.

> > Basically, the problem you have is that writel() has an implicit "write
> > to LE register" semantic. Your register is BE. the "iomap" variants
> > provide you with more fine grained "be" variants to use in that case.
> > There's also writel_be() but that one doesn't exist on every
> > architecture afaik.
> So writel_be is the function I should use for memory mapped io? If it
> does not exist for all platforms it's a pitty :-(

Just use the iomap variant. Usually you also use pci_iomap() instead of
ioremap() but afaik, for straight MMIO, it works with normal ioremap as
well.

> > Now, once the mmio problem is out of the way, let's look back at how you
> > then use that qpn.
> > 
> > With the current code, you've generated something in memory which is
> > byte reversed, so essentially "LE" on ppc and "BE" on x86.
> > 
> > Then, this statement:
> > 
> > *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> > 
> > Will essentially write it out as-is in memory for use by the chip. The chip,
> > from what you say, expects BE, so this will be broken on PPC.
> I see. So this field is layed in le for ppc and the rest of the
> descriptor is be. so I assum that __iowrite64_copy() does not swap
> anything but we still have tx_desc->ctrl.vlan_tag in the wrong
> endianess.

Yes because you had swapped it initially. IE your original swab32 is
what busts it for you on ppc.

> > Here too, the right solution is to instead not do that swab32 to begin
> > with (ring->doorbell_qpn remains a native endian value) and instead do,
> > in addition to the above mentioned change to the writel:
> > 
> > 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
> > 
> > (Also get rid of that cast and define vlan_tag as a __be32 to start
> > with).
> > 
> > Cheers,
> > Ben.
> 
> Thanks for your review. I will send another patch which should fix the
> deficiencies.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  9:52                             ` Benjamin Herrenschmidt
  2011-10-09 10:30                               ` Eli Cohen
@ 2011-10-10  8:20                               ` David Laight
  2011-10-10  8:29                                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 36+ messages in thread
From: David Laight @ 2011-10-10  8:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eli Cohen
  Cc: netdev, Eli Cohen, linuxppc-dev, Thadeu Lima de Souza Cascardo,
	Yevgeny Petrilin

 
> Then, this statement:
> 
> *(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;

...
> instead do ... :

> 	*(u32 *) (&tx_desc->ctrl.vlan_tag) |=
cpu_to_be32(ring->doorbell_qpn);
> 
> (Also get rid of that cast and define vlan_tag as a __be32 to start
> with).

Agreed, casts that change the type of memory - *(foo *)&xxx - are
generally bad news unless you are casting a generic 'buffer' to
a specific structure.
I've seen far to much code that ends up being depending on the
endianness and system word size.

For the above I'd actually suggest making 'doorbell_qpn' have the
correct endianness in order to avoid the (potential) swap every
time it is set.

You also need to treble-check the required endianness for the
'vlan_tag' in the tx descriptor. What would be needed is the
MAC PCI slave were on an x86 (LE) system.

	David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  8:20                               ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled David Laight
@ 2011-10-10  8:29                                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10  8:29 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, 2011-10-10 at 09:20 +0100, David Laight wrote:
> 
> For the above I'd actually suggest making 'doorbell_qpn' have the
> correct endianness in order to avoid the (potential) swap every
> time it is set.

Well, the problem is that either you'll end up swapping on x86 or you'll
end up swapping on ppc, there is no "native" MMIO accessor that allow
you to do a no-swap access whatever the arch you are on. Or rather,
there is the __raw_ one but you shouldn't use it for most things :-)
(Because it also doesn't have the right memory barriers).

So I'd rather they do it right using the simpler method, the cost of
swap is going to be negligible, probably not even measurable, and if and
only if they think they can improve on that in a second step, then
consider doing otherwise with appropriate measurements showing a
significant difference.

> You also need to treble-check the required endianness for the
> 'vlan_tag' in the tx descriptor. What would be needed is the
> MAC PCI slave were on an x86 (LE) system.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-09  7:25                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Benjamin Herrenschmidt
  2011-10-09  7:35                   ` Eli Cohen
@ 2011-10-10  8:40                   ` David Laight
  2011-10-10  8:47                     ` Eli Cohen
  2011-10-10  8:53                     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 36+ messages in thread
From: David Laight @ 2011-10-10  8:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Eli Cohen
  Cc: netdev, Eli Cohen, linuxppc-dev, Thadeu Lima de Souza Cascardo,
	Yevgeny Petrilin

 
> What is this __iowrite64_copy... oh I see
> 
> Nice, somebody _AGAIN_ added a bunch of "generic" IO 
> accessors that are utterly wrong on all archs except
> x86 (ok, -almost-).
> There isn't a single bloody memory barrier in there !

Actually memory barriers shouldn't really be added to
any of these 'accessor' functions.
(Or, at least, ones without barriers should be provided.)

The driver may want to to a series of writes, then a
single barrier, before a final write of a command (etc).

in_le32() from io.h is specially horrid!

	David

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  8:40                   ` David Laight
@ 2011-10-10  8:47                     ` Eli Cohen
  2011-10-10  9:01                       ` Benjamin Herrenschmidt
  2011-10-10  8:53                     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-10  8:47 UTC (permalink / raw)
  To: David Laight
  Cc: Yevgeny Petrilin, Eli Cohen, Thadeu Lima de Souza Cascardo,
	netdev, linuxppc-dev

On Mon, Oct 10, 2011 at 09:40:17AM +0100, David Laight wrote:
> 
> Actually memory barriers shouldn't really be added to
> any of these 'accessor' functions.
> (Or, at least, ones without barriers should be provided.)
> 
> The driver may want to to a series of writes, then a
> single barrier, before a final write of a command (etc).
> 
> in_le32() from io.h is specially horrid!
> 
> 	David
> 
The driver would like to control if and when we want to put a memory
barrier. We really don't want it to be done under the hood. In this
respect we prefer raw functions which are still available to all
platforms.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  8:40                   ` David Laight
  2011-10-10  8:47                     ` Eli Cohen
@ 2011-10-10  8:53                     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10  8:53 UTC (permalink / raw)
  To: David Laight
  Cc: Eli Cohen, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, 2011-10-10 at 09:40 +0100, David Laight wrote:
> > What is this __iowrite64_copy... oh I see
> > 
> > Nice, somebody _AGAIN_ added a bunch of "generic" IO 
> > accessors that are utterly wrong on all archs except
> > x86 (ok, -almost-).
> > There isn't a single bloody memory barrier in there !
> 
> Actually memory barriers shouldn't really be added to
> any of these 'accessor' functions.
> (Or, at least, ones without barriers should be provided.)

As long as they are documented to provide no guarantee of ordering
between the stores... And x86 driver writers have any clue that they
will not be ordered vs. surrounding accesses.

> The driver may want to to a series of writes, then a
> single barrier, before a final write of a command (etc).
> 
> in_le32() from io.h is specially horrid!

The reason for that is that drivers expect fully ordered writel() vs
everything (including DMA).

Unfortunately, this is how Linux defines those semantics. I would much
prefer to require barriers explicitely but the decision was made back
then simply because the vast majority of driver writers do not
understand weakly ordered memory models and "everything should be made
to look like x86".

It would be great to come up with a set of more relaxed accessors along
with the appropriate barrier to use for drivers who "know better" but so
far all attempts at doing so have failed due to the inability to agree
on their precise semantics. Tho that was a while ago, we should probably
give it a new shot.

Cheers,
Ben. 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  8:47                     ` Eli Cohen
@ 2011-10-10  9:01                       ` Benjamin Herrenschmidt
  2011-10-10  9:16                         ` Eli Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10  9:01 UTC (permalink / raw)
  To: Eli Cohen
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, 2011-10-10 at 10:47 +0200, Eli Cohen wrote:
> On Mon, Oct 10, 2011 at 09:40:17AM +0100, David Laight wrote:
> > 
> > Actually memory barriers shouldn't really be added to
> > any of these 'accessor' functions.
> > (Or, at least, ones without barriers should be provided.)
> > 
> > The driver may want to to a series of writes, then a
> > single barrier, before a final write of a command (etc).
> > 
> > in_le32() from io.h is specially horrid!
> > 
> > 	David
> > 
> The driver would like to control if and when we want to put a memory
> barrier. We really don't want it to be done under the hood. In this
> respect we prefer raw functions which are still available to all
> platforms.

 ... but not necessarily the corresponding barriers.

That's why on powerpc we had to make all rmb,wmb and mb the same, aka a
full sync, because our weaker barriers don't order cachable vs.
non-cachable.

In any case, the raw functions are a bit nasty to use because they both
don't have barriers -and- don't handle endianness. So you have to be
extra careful.

In 90% of the cases, the barriers are what you want anyway. For example
in the else case of the driver, the doorbell MMIO typically wants it, so
using writel() is fine (or iowrite32be) and will have the necessary
barriers.

The case where things get a bit more nasty is when you try to use MMIO
for low latency small-data type transfers instead of DMA, in which case
you do want the ability for the chipset to write-combine and control the
barriers more precisely.

However, this is hard and Linux doesn't provide very good accessors to
do so, thus you need to be extra careful (see my example about wmb()

In the case of the iomap "copy" operations, my problem is that they
don't properly advertise their lack of ordering since normal iomap does
have full ordering.

I believe they should provide ordering with a barrier before & a barrier
after, eventually with _relaxed variants or _raw variants for those who
"know what they are doing".

Maybe it's time for us to revive those discussions about providing a
good set of relaxed MMIO accessors with explicit barriers :-)

Cheers,
Ben.
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  9:01                       ` Benjamin Herrenschmidt
@ 2011-10-10  9:16                         ` Eli Cohen
  2011-10-10  9:24                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-10  9:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, Oct 10, 2011 at 11:01:24AM +0200, Benjamin Herrenschmidt wrote:
> 
> The case where things get a bit more nasty is when you try to use MMIO
> for low latency small-data type transfers instead of DMA, in which case
> you do want the ability for the chipset to write-combine and control the
> barriers more precisely.
> 
> However, this is hard and Linux doesn't provide very good accessors to
> do so, thus you need to be extra careful (see my example about wmb()
> 
> In the case of the iomap "copy" operations, my problem is that they
> don't properly advertise their lack of ordering since normal iomap does
> have full ordering.
> 
> I believe they should provide ordering with a barrier before & a barrier
> after, eventually with _relaxed variants or _raw variants for those who
> "know what they are doing".

Until then I think we need to have the logic working right on ppc and
measure if blue flame buys us any improvement in ppc. If that's not
the case (e.g because write combining is not working), then maybe we
should avoid using blueflame in ppc.
Could any of the guys from IBM check this and give us feedback?
> 
> Maybe it's time for us to revive those discussions about providing a
> good set of relaxed MMIO accessors with explicit barriers :-)
> 
> Cheers,
> Ben.
>  

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  9:16                         ` Eli Cohen
@ 2011-10-10  9:24                           ` Benjamin Herrenschmidt
  2011-10-10  9:29                             ` Eli Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10  9:24 UTC (permalink / raw)
  To: Eli Cohen
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:

> Until then I think we need to have the logic working right on ppc and
> measure if blue flame buys us any improvement in ppc. If that's not
> the case (e.g because write combining is not working), then maybe we
> should avoid using blueflame in ppc.
> Could any of the guys from IBM check this and give us feedback?

I don't have the necessary hardware myself to test that but maybe Thadeu
can.

Note that for WC to work, things must be mapped non-guarded. You can do
that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
ioremap_wc() (dunno how "generic" the later is).

>From there, you should get write combining provided that you don't have
barriers between every access (ie those copy operations in their current
form should do the trick).

Cheers,
Ben.

> > Maybe it's time for us to revive those discussions about providing a
> > good set of relaxed MMIO accessors with explicit barriers :-)
> > 
> > Cheers,
> > Ben.
> >  

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  9:24                           ` Benjamin Herrenschmidt
@ 2011-10-10  9:29                             ` Eli Cohen
  2011-10-10 10:18                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Cohen @ 2011-10-10  9:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, Oct 10, 2011 at 11:24:05AM +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:
> 
> > Until then I think we need to have the logic working right on ppc and
> > measure if blue flame buys us any improvement in ppc. If that's not
> > the case (e.g because write combining is not working), then maybe we
> > should avoid using blueflame in ppc.
> > Could any of the guys from IBM check this and give us feedback?
> 
> I don't have the necessary hardware myself to test that but maybe Thadeu
> can.
> 
> Note that for WC to work, things must be mapped non-guarded. You can do
> that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
> ioremap_wc() (dunno how "generic" the later is).

I use the io mapping API:

at driver statrt:
        priv->bf_mapping = io_mapping_create_wc(bf_start, bf_len);
        if (!priv->bf_mapping)
                err = -ENOMEM;

and then:
        uar->bf_map = io_mapping_map_wc(priv->bf_mapping, uar->index << PAGE_SHIFT);

        
Will this work on ppc?

> 
> >From there, you should get write combining provided that you don't have
> barriers between every access (ie those copy operations in their current
> form should do the trick).
> 
> Cheers,
> Ben.
> 
> > > Maybe it's time for us to revive those discussions about providing a
> > > good set of relaxed MMIO accessors with explicit barriers :-)
> > > 
> > > Cheers,
> > > Ben.
> > >  
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled
  2011-10-10  9:29                             ` Eli Cohen
@ 2011-10-10 10:18                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2011-10-10 10:18 UTC (permalink / raw)
  To: Eli Cohen
  Cc: David Laight, netdev, Eli Cohen, linuxppc-dev,
	Thadeu Lima de Souza Cascardo, Yevgeny Petrilin

On Mon, 2011-10-10 at 11:29 +0200, Eli Cohen wrote:
> On Mon, Oct 10, 2011 at 11:24:05AM +0200, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-10-10 at 11:16 +0200, Eli Cohen wrote:
> > 
> > > Until then I think we need to have the logic working right on ppc and
> > > measure if blue flame buys us any improvement in ppc. If that's not
> > > the case (e.g because write combining is not working), then maybe we
> > > should avoid using blueflame in ppc.
> > > Could any of the guys from IBM check this and give us feedback?
> > 
> > I don't have the necessary hardware myself to test that but maybe Thadeu
> > can.
> > 
> > Note that for WC to work, things must be mapped non-guarded. You can do
> > that by using ioremap_prot() with pgprot_noncached_wc(PAGE_KERNEL) or
> > ioremap_wc() (dunno how "generic" the later is).
> 
> I use the io mapping API:
> 
> at driver statrt:
>         priv->bf_mapping = io_mapping_create_wc(bf_start, bf_len);
>         if (!priv->bf_mapping)
>                 err = -ENOMEM;
> 
> and then:
>         uar->bf_map = io_mapping_map_wc(priv->bf_mapping, uar->index << PAGE_SHIFT);
> 
>         
> Will this work on ppc?

That API has never been tested on ppc I suspect. We don't have
CONFIG_HAVE_ATOMIC_IOMAP (mostly because we never needed it, it
was designed and only ever used for Intel graphics before), so
it will fallback to:

static inline struct io_mapping *
io_mapping_create_wc(resource_size_t base, unsigned long size)
{
	return (struct io_mapping __force *) ioremap_wc(base, size);
}

Which should work (hopefully :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] mlx4_en: fix endianness with blue frame support
  2011-10-10  7:32                                 ` Benjamin Herrenschmidt
@ 2011-10-10 16:42                                   ` Thadeu Lima de Souza Cascardo
  2011-10-10 16:46                                     ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-10 16:42 UTC (permalink / raw)
  To: netdev
  Cc: linuxppc-dev, Thadeu Lima de Souza Cascardo, Eli Cohen,
	Yevgeny Petrilin, Benjamin Herrenschmidt

The doorbell register was being unconditionally swapped. In x86, that
meant it was being swapped to BE and written to the descriptor and to
memory, depending on the case of blue frame support or writing to
doorbell register. On PPC, this meant it was being swapped to LE and
then swapped back to BE while writing to the register. But in the blue
frame case, it was being written as LE to the descriptor.

The fix is not to swap doorbell unconditionally, write it to the
register as BE and convert it to BE when writing it to the descriptor.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reported-by: Richard Hendrickson <richhend@us.ibm.com>
Cc: Eli Cohen <eli@dev.mellanox.co.il>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/net/mlx4/en_tx.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
index 6e03de0..f76ab6b 100644
--- a/drivers/net/mlx4/en_tx.c
+++ b/drivers/net/mlx4/en_tx.c
@@ -172,7 +172,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 	memset(ring->buf, 0, ring->buf_size);
 
 	ring->qp_state = MLX4_QP_STATE_RST;
-	ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
+	ring->doorbell_qpn = ring->qp.qpn << 8;
 
 	mlx4_en_fill_qp_context(priv, ring->size, ring->stride, 1, 0, ring->qpn,
 				ring->cqn, &ring->context);
@@ -791,7 +791,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		skb_orphan(skb);
 
 	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag) {
-		*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
+		*(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
 		op_own |= htonl((bf_index & 0xffff) << 8);
 		/* Ensure new descirptor hits memory
 		* before setting ownership of this descriptor to HW */
@@ -812,7 +812,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
 		wmb();
-		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+		iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
 	}
 
 	/* Poll CQ here */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix endianness with blue frame support
  2011-10-10 16:42                                   ` [PATCH] mlx4_en: fix endianness with blue frame support Thadeu Lima de Souza Cascardo
@ 2011-10-10 16:46                                     ` Thadeu Lima de Souza Cascardo
  2011-10-10 18:10                                       ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-10-10 16:46 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, Eli Cohen, Yevgeny Petrilin, Benjamin Herrenschmidt

On Mon, Oct 10, 2011 at 01:42:23PM -0300, Thadeu Lima de Souza Cascardo wrote:
> The doorbell register was being unconditionally swapped. In x86, that
> meant it was being swapped to BE and written to the descriptor and to
> memory, depending on the case of blue frame support or writing to
> doorbell register. On PPC, this meant it was being swapped to LE and
> then swapped back to BE while writing to the register. But in the blue
> frame case, it was being written as LE to the descriptor.
> 
> The fix is not to swap doorbell unconditionally, write it to the
> register as BE and convert it to BE when writing it to the descriptor.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> Reported-by: Richard Hendrickson <richhend@us.ibm.com>
> Cc: Eli Cohen <eli@dev.mellanox.co.il>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---

So I tested this patch and it works for me. Thanks Ben and Eli for
finding out the problem with doorbell in the descriptor.

Regards,
Cascardo.

>  drivers/net/mlx4/en_tx.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
> index 6e03de0..f76ab6b 100644
> --- a/drivers/net/mlx4/en_tx.c
> +++ b/drivers/net/mlx4/en_tx.c
> @@ -172,7 +172,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
>  	memset(ring->buf, 0, ring->buf_size);
> 
>  	ring->qp_state = MLX4_QP_STATE_RST;
> -	ring->doorbell_qpn = swab32(ring->qp.qpn << 8);
> +	ring->doorbell_qpn = ring->qp.qpn << 8;
> 
>  	mlx4_en_fill_qp_context(priv, ring->size, ring->stride, 1, 0, ring->qpn,
>  				ring->cqn, &ring->context);
> @@ -791,7 +791,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>  		skb_orphan(skb);
> 
>  	if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tag) {
> -		*(u32 *) (&tx_desc->ctrl.vlan_tag) |= ring->doorbell_qpn;
> +		*(__be32 *) (&tx_desc->ctrl.vlan_tag) |= cpu_to_be32(ring->doorbell_qpn);
>  		op_own |= htonl((bf_index & 0xffff) << 8);
>  		/* Ensure new descirptor hits memory
>  		* before setting ownership of this descriptor to HW */
> @@ -812,7 +812,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>  		wmb();
>  		tx_desc->ctrl.owner_opcode = op_own;
>  		wmb();
> -		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> +		iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
>  	}
> 
>  	/* Poll CQ here */
> -- 
> 1.7.4.4
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] mlx4_en: fix endianness with blue frame support
  2011-10-10 16:46                                     ` Thadeu Lima de Souza Cascardo
@ 2011-10-10 18:10                                       ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2011-10-10 18:10 UTC (permalink / raw)
  To: cascardo; +Cc: netdev, linuxppc-dev, eli, yevgenyp, benh

From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Date: Mon, 10 Oct 2011 13:46:54 -0300

> On Mon, Oct 10, 2011 at 01:42:23PM -0300, Thadeu Lima de Souza Cascardo wrote:
>> The doorbell register was being unconditionally swapped. In x86, that
>> meant it was being swapped to BE and written to the descriptor and to
>> memory, depending on the case of blue frame support or writing to
>> doorbell register. On PPC, this meant it was being swapped to LE and
>> then swapped back to BE while writing to the register. But in the blue
>> frame case, it was being written as LE to the descriptor.
>> 
>> The fix is not to swap doorbell unconditionally, write it to the
>> register as BE and convert it to BE when writing it to the descriptor.
>> 
>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
>> Reported-by: Richard Hendrickson <richhend@us.ibm.com>
>> Cc: Eli Cohen <eli@dev.mellanox.co.il>
>> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> ---
> 
> So I tested this patch and it works for me. Thanks Ben and Eli for
> finding out the problem with doorbell in the descriptor.

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-10-10 18:13 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-30 13:23 [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Thadeu Lima de Souza Cascardo
2011-10-02  8:58 ` Yevgeny Petrilin
2011-10-03 14:37   ` Thadeu Lima de Souza Cascardo
2011-10-03 14:56     ` Yevgeny Petrilin
2011-10-03 20:53       ` Thadeu Lima de Souza Cascardo
2011-10-04  6:02         ` Benjamin Herrenschmidt
2011-10-04 20:26           ` Thadeu Lima de Souza Cascardo
2011-10-05  8:15             ` Eli Cohen
2011-10-06 13:57               ` Eli Cohen
2011-10-06 14:10                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame isenabled David Laight
2011-10-07 22:29                   ` Thadeu Lima de Souza Cascardo
2011-10-09  7:28                   ` Benjamin Herrenschmidt
2011-10-09  7:25                 ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled Benjamin Herrenschmidt
2011-10-09  7:35                   ` Eli Cohen
2011-10-09  8:00                     ` Benjamin Herrenschmidt
2011-10-09  8:07                       ` Eli Cohen
2011-10-09  8:38                         ` Benjamin Herrenschmidt
2011-10-09  9:21                           ` Eli Cohen
2011-10-09  9:52                             ` Benjamin Herrenschmidt
2011-10-09 10:30                               ` Eli Cohen
2011-10-10  7:32                                 ` Benjamin Herrenschmidt
2011-10-10 16:42                                   ` [PATCH] mlx4_en: fix endianness with blue frame support Thadeu Lima de Souza Cascardo
2011-10-10 16:46                                     ` Thadeu Lima de Souza Cascardo
2011-10-10 18:10                                       ` David Miller
2011-10-10  8:20                               ` [PATCH] mlx4_en: fix transmit of packages when blue frame is enabled David Laight
2011-10-10  8:29                                 ` Benjamin Herrenschmidt
2011-10-10  8:40                   ` David Laight
2011-10-10  8:47                     ` Eli Cohen
2011-10-10  9:01                       ` Benjamin Herrenschmidt
2011-10-10  9:16                         ` Eli Cohen
2011-10-10  9:24                           ` Benjamin Herrenschmidt
2011-10-10  9:29                             ` Eli Cohen
2011-10-10 10:18                               ` Benjamin Herrenschmidt
2011-10-10  8:53                     ` Benjamin Herrenschmidt
2011-10-09  7:21               ` Benjamin Herrenschmidt
2011-10-09  7:19             ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).