* mlx5 error when the skb linear space is empty
@ 2021-01-04 10:59 Xuan Zhuo
  2021-01-05 20:51 ` Saeed Mahameed
  0 siblings, 1 reply; 5+ messages in thread
From: Xuan Zhuo @ 2021-01-04 10:59 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Magnus Karlsson, Leon Romanovsky, netdev, Saeed Mahameed

Hi,

While developing the XDP socket code, we tried to construct the skb directly
from the page, to avoid a data copy. The MAC information is also in the page,
which leaves the linear space of the skb empty. In this case, I encountered a
problem:

mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb


When I copy only the MAC address into the linear space of the skb and leave
the other parts in the page, the data is sent successfully, but the sending
performance is relatively poor.

Is there any way to solve this problem?

dev info:
    driver: mlx5_core
    version: 5.10.0+
    firmware-version: 14.21.2328 (MT_2470112034)
    expansion-rom-version:
    bus-info: 0000:3b:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes


* Re: mlx5 error when the skb linear space is empty
  2021-01-04 10:59 mlx5 error when the skb linear space is empty Xuan Zhuo
@ 2021-01-05 20:51 ` Saeed Mahameed
  2021-01-11  8:02   ` Magnus Karlsson
  0 siblings, 1 reply; 5+ messages in thread
From: Saeed Mahameed @ 2021-01-05 20:51 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Magnus Karlsson, Leon Romanovsky, netdev

On Mon, 2021-01-04 at 18:59 +0800, Xuan Zhuo wrote:
> hi
> 
> In the process of developing xdp socket, we tried to directly use
> page to
> construct skb directly, to avoid data copy. And the MAC information
> is also in
> the page, which caused the linear space of skb to be empty. In this
> case, I
> encountered a problem :
> 
> mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn
> 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> 
> 
> And when I try to copy only the mac address into the linear space of
> skb, the
> other parts are still placed in the page. When constructing skb in
> this way, I
> found that although the data can be sent successfully, the sending
> performance
> is relatively poor!!
> 

Hi,

This is expected behavior on ConnectX4-LX. ConnectX4-LX requires the
driver to copy at least the L2 headers into the linear part; in some
DCB/DSCP configurations it also requires the L3 headers.
To check the current configuration, look at the driver code:
mlx5e_calc_min_inline() // Calculates the minimum required headers to
copy to the linear part per packet

and sq->min_inline_mode, which stores the minimum required by the FW.

This "must copy" requirement doesn't exist for ConnectX5 and above.
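
To illustrate the minimum-inline idea, here is a small userspace C model.
It is only a sketch and not the driver's code: the enum, struct pkt_layout
and min_inline_bytes() are hypothetical names, and both the L3 rule
(network offset plus network header length) and the clamp to the packet
length are assumptions made for the example. The 18-byte L2 floor is
simply an Ethernet header (14 bytes) plus one VLAN tag (4 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inline modes discussed above; the real
 * driver has its own definitions and reads the mode from the send queue. */
enum inline_mode { INLINE_NONE, INLINE_L2, INLINE_L3 };

#define ETH_HDR_LEN   14  /* dst MAC + src MAC + EtherType */
#define VLAN_TAG_LEN   4  /* one 802.1Q tag */
#define MIN_L2_INLINE (ETH_HDR_LEN + VLAN_TAG_LEN)

/* Illustrative description of where the headers sit inside a frame. */
struct pkt_layout {
    size_t l3_offset;  /* offset of the network header */
    size_t l3_len;     /* length of the network header */
    size_t total_len;  /* total frame length */
};

/* Number of leading bytes that would have to be inlined for a mode. */
static size_t min_inline_bytes(enum inline_mode mode,
                               const struct pkt_layout *p)
{
    size_t need = 0;

    assert(p != NULL);
    switch (mode) {
    case INLINE_NONE:
        need = 0;
        break;
    case INLINE_L2:
        /* At least Ethernet + VLAN, or everything up to the L3 header. */
        need = p->l3_offset > MIN_L2_INLINE ? p->l3_offset : MIN_L2_INLINE;
        break;
    case INLINE_L3:
        /* Assumption: L2 headers plus the whole network header. */
        need = p->l3_offset + p->l3_len;
        break;
    }
    /* Never ask for more bytes than the frame contains. */
    return need < p->total_len ? need : p->total_len;
}
```

For an untagged IPv4 frame (l3_offset 14, l3_len 20) this model asks for
18 bytes in L2 mode and 34 bytes in L3 mode.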

> I would like to ask, is there any way to solve this problem?
> 
> dev info:
>     driver: mlx5_core
>     version: 5.10.0+
>     firmware-version: 14.21.2328 (MT_2470112034)
>     expansion-rom-version:
>     bus-info: 0000:3b:00.0
>     supports-statistics: yes
>     supports-test: yes
>     supports-eeprom-access: no
>     supports-register-dump: no
>     supports-priv-flags: yes
> 
> 
> 
> 


* Re: mlx5 error when the skb linear space is empty
  2021-01-05 20:51 ` Saeed Mahameed
@ 2021-01-11  8:02   ` Magnus Karlsson
  2021-01-12 20:35     ` Saeed Mahameed
  0 siblings, 1 reply; 5+ messages in thread
From: Magnus Karlsson @ 2021-01-11  8:02 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Xuan Zhuo, Leon Romanovsky, Network Development

On Tue, Jan 5, 2021 at 9:51 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On Mon, 2021-01-04 at 18:59 +0800, Xuan Zhuo wrote:
> > hi
> >
> > In the process of developing xdp socket, we tried to directly use
> > page to
> > construct skb directly, to avoid data copy. And the MAC information
> > is also in
> > the page, which caused the linear space of skb to be empty. In this
> > case, I
> > encountered a problem :
> >
> > mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn
> > 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> > WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> > 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> > 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> >
> >
> > And when I try to copy only the mac address into the linear space of
> > skb, the
> > other parts are still placed in the page. When constructing skb in
> > this way, I
> > found that although the data can be sent successfully, the sending
> > performance
> > is relatively poor!!
> >
>
> Hi,
>
> This is an expected behavior of ConnectX4-LX, ConnectX4-LX requires the
> driver to copy at least the L2 headers into the linear part, in some
> DCB/DSCP configuration it will require L3 headers.

Do I understand this correctly: whatever calls ndo_start_xmit has to
make sure that at least the L2 headers are in the linear part of the
skb? If Xuan does not do this, the ConnectX4 driver crashes, but if he
does, it works. So from an ndo_start_xmit interface perspective, what
are the requirements on an skb that is passed to it? Do all users of
ndo_start_xmit make sure the L2 header is in the linear part, or are
there users that do not? Judging from the ConnectX5 code, the latter
seems possible (since it has code to deal with this), but judging from
the ConnectX4 code, the former seems to be true (since it does not copy
the L2 headers into the linear part as far as I can see). Sorry for my
confusion, but I think it is important to get some clarity here, as it
will decide whether Xuan's patch is a good idea in its current form.

Thank you.

> to check what the current configuration, you can check from the driver
> code:
> mlx5e_calc_min_inline() // Calculates the minimum required headers to
> copy to linear part per packet
>
> and sq->min_inline_mode; stores the minimum required by the FW.
>
> This "must copy" requirement doesn't exist for ConnectX5 and above ..

What is the

> > I would like to ask, is there any way to solve this problem?
> >
> > dev info:
> >     driver: mlx5_core
> >     version: 5.10.0+
> >     firmware-version: 14.21.2328 (MT_2470112034)
> >     expansion-rom-version:
> >     bus-info: 0000:3b:00.0
> >     supports-statistics: yes
> >     supports-test: yes
> >     supports-eeprom-access: no
> >     supports-register-dump: no
> >     supports-priv-flags: yes
> >
> >
> >
> >
>


* Re: mlx5 error when the skb linear space is empty
  2021-01-11  8:02   ` Magnus Karlsson
@ 2021-01-12 20:35     ` Saeed Mahameed
  2021-01-13  7:29       ` Magnus Karlsson
  0 siblings, 1 reply; 5+ messages in thread
From: Saeed Mahameed @ 2021-01-12 20:35 UTC (permalink / raw)
  To: Magnus Karlsson; +Cc: Xuan Zhuo, Leon Romanovsky, Network Development

On Mon, 2021-01-11 at 09:02 +0100, Magnus Karlsson wrote:
> On Tue, Jan 5, 2021 at 9:51 PM Saeed Mahameed <saeed@kernel.org>
> wrote:
> > On Mon, 2021-01-04 at 18:59 +0800, Xuan Zhuo wrote:
> > > hi
> > > 
> > > In the process of developing xdp socket, we tried to directly use
> > > page to
> > > construct skb directly, to avoid data copy. And the MAC
> > > information
> > > is also in
> > > the page, which caused the linear space of skb to be empty. In
> > > this
> > > case, I
> > > encountered a problem :
> > > 
> > > mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn
> > > 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> > > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> > > WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> > > 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> > > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> > > 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> > > 
> > > 
> > > And when I try to copy only the mac address into the linear space
> > > of
> > > skb, the
> > > other parts are still placed in the page. When constructing skb
> > > in
> > > this way, I
> > > found that although the data can be sent successfully, the
> > > sending
> > > performance
> > > is relatively poor!!
> > > 
> > 
> > Hi,
> > 
> > This is an expected behavior of ConnectX4-LX, ConnectX4-LX requires
> > the
> > driver to copy at least the L2 headers into the linear part, in
> > some
> > DCB/DSCP configuration it will require L3 headers.
> 
> Do I understand this correctly if I say whatever is calling
> ndo_start_xmit has to make sure at least the L2 headers is in the
> linear part of the skb? If Xuan does not do this, the ConnectX4
> driver
> crashes, but if he does, it works. So from an ndo_start_xmit
> interface
> perspective, what is the requirement of an skb that is passed to it?
> Do all users of ndo_start_xmit make sure the L2 header is in the
> linear part, or are there users that do not make sure this is the
> case? Judging from the ConnectX5 code it seems that the latter is
> possible (since it has code to deal with this), but from the
> ConnectX4, it seems like the former is true (since it does not copy
> the L2 headers into the linear part as far as I can see). Sorry for
> my
> confusion, but I think it is important to get some clarity here as it
> will decide if Xuan's patch is a good idea or not in its current
> form.
> 

To clarify:
ConnectX4-LX doesn't really require data to be in the linear part. I
was referring to a HW limitation that requires the driver to copy the
L2/L3 headers (depending on the current HW config) to a special area in
the tx descriptor. Currently the driver copies the L2/L3 headers only
from the linear part of the SKB, but this can be changed by calling
pskb_may_pull in the mlx5 ConnectX4-LX tx path to make sure the linear
part has the needed data.

Something like:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 61ed671fe741..5939fd8eed2c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -159,6 +159,7 @@ static inline int mlx5e_skb_l2_header_offset(struct sk_buff *skb)
 {
 #define MLX5E_MIN_INLINE (ETH_HLEN + VLAN_HLEN)

+       /* need to check ret val */ 
+       pskb_may_pull(skb, MLX5E_MIN_INLINE);
        return max(skb_network_offset(skb), MLX5E_MIN_INLINE);
 }
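
The "need to check ret val" note can be illustrated with a userspace
toy model. This is only a sketch under stated assumptions: struct
toy_skb, toy_may_pull() and toy_l2_header_len() are hypothetical names
that imitate the idea of pulling header bytes from a page fragment into
the linear area, and they are not the kernel's sk_buff API. The point
is that the pull can fail (for example when the packet is shorter than
the requested length), so the caller has to handle that case instead of
ignoring the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINEAR_CAP 64   /* capacity of the toy linear area */
#define MIN_INLINE 18   /* Ethernet header (14) + one VLAN tag (4) */

/* Toy packet: a small linear area plus a single page fragment. */
struct toy_skb {
    unsigned char linear[LINEAR_CAP];
    size_t linear_len;          /* bytes currently in the linear area */
    const unsigned char *frag;  /* remaining data that lives in the page */
    size_t frag_len;
};

/* Make sure the first len bytes are linear, copying from the fragment
 * if needed. Returns false if the packet is too short or len does not
 * fit into the linear area. */
static bool toy_may_pull(struct toy_skb *skb, size_t len)
{
    assert(skb->linear_len <= LINEAR_CAP);
    if (len <= skb->linear_len)
        return true;

    size_t missing = len - skb->linear_len;
    if (len > LINEAR_CAP || missing > skb->frag_len)
        return false;

    memcpy(skb->linear + skb->linear_len, skb->frag, missing);
    skb->linear_len += missing;
    skb->frag += missing;
    skb->frag_len -= missing;
    return true;
}

/* Header length to inline, or -1 if the headers cannot be made linear. */
static int toy_l2_header_len(struct toy_skb *skb, size_t l3_offset)
{
    size_t want = l3_offset > MIN_INLINE ? l3_offset : MIN_INLINE;

    if (!toy_may_pull(skb, want))
        return -1;  /* the caller must drop or otherwise handle this */
    return (int)want;
}
```

With an empty linear area and a 32-byte fragment, toy_l2_header_len()
moves 18 bytes into the linear area; with only 10 bytes of data it
reports failure rather than inlining headers that are not there.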


* Re: mlx5 error when the skb linear space is empty
  2021-01-12 20:35     ` Saeed Mahameed
@ 2021-01-13  7:29       ` Magnus Karlsson
  0 siblings, 0 replies; 5+ messages in thread
From: Magnus Karlsson @ 2021-01-13  7:29 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Xuan Zhuo, Leon Romanovsky, Network Development

On Tue, Jan 12, 2021 at 9:35 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On Mon, 2021-01-11 at 09:02 +0100, Magnus Karlsson wrote:
> > On Tue, Jan 5, 2021 at 9:51 PM Saeed Mahameed <saeed@kernel.org>
> > wrote:
> > > On Mon, 2021-01-04 at 18:59 +0800, Xuan Zhuo wrote:
> > > > hi
> > > >
> > > > In the process of developing xdp socket, we tried to directly use
> > > > page to
> > > > construct skb directly, to avoid data copy. And the MAC
> > > > information
> > > > is also in
> > > > the page, which caused the linear space of skb to be empty. In
> > > > this
> > > > case, I
> > > > encountered a problem :
> > > >
> > > > mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn
> > > > 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> > > > 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> > > > WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> > > > 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> > > > 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> > > > 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> > > >
> > > >
> > > > And when I try to copy only the mac address into the linear space
> > > > of
> > > > skb, the
> > > > other parts are still placed in the page. When constructing skb
> > > > in
> > > > this way, I
> > > > found that although the data can be sent successfully, the
> > > > sending
> > > > performance
> > > > is relatively poor!!
> > > >
> > >
> > > Hi,
> > >
> > > This is an expected behavior of ConnectX4-LX, ConnectX4-LX requires
> > > the
> > > driver to copy at least the L2 headers into the linear part, in
> > > some
> > > DCB/DSCP configuration it will require L3 headers.
> >
> > Do I understand this correctly if I say whatever is calling
> > ndo_start_xmit has to make sure at least the L2 headers is in the
> > linear part of the skb? If Xuan does not do this, the ConnectX4
> > driver
> > crashes, but if he does, it works. So from an ndo_start_xmit
> > interface
> > perspective, what is the requirement of an skb that is passed to it?
> > Do all users of ndo_start_xmit make sure the L2 header is in the
> > linear part, or are there users that do not make sure this is the
> > case? Judging from the ConnectX5 code it seems that the latter is
> > possible (since it has code to deal with this), but from the
> > ConnectX4, it seems like the former is true (since it does not copy
> > the L2 headers into the linear part as far as I can see). Sorry for
> > my
> > confusion, but I think it is important to get some clarity here as it
> > will decide if Xuan's patch is a good idea or not in its current
> > form.
> >
>
> To clarify:
> Connectx4Lx, doesn't really require data to be in the linear part, I
> was refereing to a HW limitation that requires the driver to copy the
> L2/L3 headers (depending on current HW config) to a special area in the
> tx descriptor, currently the driver copy the L2/L3 headers only from
> the linear part of the SKB, but this can be changed via calling
> pskb_may_pull in mlx5 ConnectX4LX tx path to make sure the linear part
> has the needed data ..

That made it clear. Thank you.

> Something like:
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 61ed671fe741..5939fd8eed2c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -159,6 +159,7 @@ static inline int mlx5e_skb_l2_header_offset(struct
> sk_buff *skb)
>  {
>  #define MLX5E_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
>
> +       /* need to check ret val */
> +       pskb_may_pull(skb, MLX5E_MIN_INLINE);
>         return max(skb_network_offset(skb), MLX5E_MIN_INLINE);
>  }
>
>

